WorldWideScience

Sample records for regression method pls

  1. COMPARISON OF PARTIAL LEAST SQUARES REGRESSION METHOD ALGORITHMS: NIPALS AND PLS-KERNEL AND AN APPLICATION

    Directory of Open Access Journals (Sweden)

    ELİF BULUT

    2013-06-01

    Full Text Available Partial Least Squares Regression (PLSR is a multivariate statistical method that consists of partial least squares and multiple linear regression analysis. Explanatory variables, X, having multicollinearity are reduced to components which explain the great amount of covariance between explanatory and response variable. These components are few in number and they don’t have multicollinearity problem. Then multiple linear regression analysis is applied to those components to model the response variable Y. There are various PLSR algorithms. In this study NIPALS and PLS-Kernel algorithms will be studied and illustrated on a real data set.

  2. Variable and subset selection in PLS regression

    DEFF Research Database (Denmark)

    Høskuldsson, Agnar

    2001-01-01

    The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present here methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion...... is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than...

  3. Variable selection methods in PLS regression - a comparison study on metabolomics data

    DEFF Research Database (Denmark)

    Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach

    . The aim of the metabolomics study was to investigate the metabolic profile in pigs fed various cereal fractions with special attention to the metabolism of lignans using LC-MS based metabolomic approach. References 1. Lê Cao KA, Rossouw D, Robert-Granié C, Besse P: A Sparse PLS for Variable Selection when...... integrated approach. Due to the high number of variables in data sets (both raw data and after peak picking) the selection of important variables in an explorative analysis is difficult, especially when different data sets of metabolomics data need to be related. Variable selection (or removal of irrelevant...... different strategies for variable selection on PLSR method were considered and compared with respect to selected subset of variables and the possibility for biological validation. Sparse PLSR [1] as well as PLSR with Jack-knifing [2] was applied to data in order to achieve variable selection prior...

  4. Application of NIRS coupled with PLS regression as a rapid, non-destructive alternative method for quantification of KBA in Boswellia sacra

    Science.gov (United States)

    Al-Harrasi, Ahmed; Rehman, Najeeb Ur; Mabood, Fazal; Albroumi, Muhammaed; Ali, Liaqat; Hussain, Javid; Hussain, Hidayat; Csuk, René; Khan, Abdul Latif; Alam, Tanveer; Alameri, Saif

    2017-09-01

    In the present study, for the first time, NIR spectroscopy coupled with PLS regression as a rapid and alternative method was developed to quantify the amount of Keto-β-Boswellic Acid (KBA) in different plant parts of Boswellia sacra and the resin exudates of the trunk. NIR spectroscopy was used for the measurement of KBA standards and B. sacra samples in absorption mode in the wavelength range from 700-2500 nm. PLS regression model was built from the obtained spectral data using 70% of KBA standards (training set) in the range from 0.1 ppm to 100 ppm. The PLS regression model obtained was having R-square value of 98% with 0.99 corelationship value and having good prediction with RMSEP value 3.2 and correlation of 0.99. It was then used to quantify the amount of KBA in the samples of B. sacra. The results indicated that the MeOH extract of resin has the highest concentration of KBA (0.6%) followed by essential oil (0.1%). However, no KBA was found in the aqueous extract. The MeOH extract of the resin was subjected to column chromatography to get various sub-fractions at different polarity of organic solvents. The sub-fraction at 4% MeOH/CHCl3 (4.1% of KBA) was found to contain the highest percentage of KBA followed by another sub-fraction at 2% MeOH/CHCl3 (2.2% of KBA). The present results also indicated that KBA is only present in the gum-resin of the trunk and not in all parts of the plant. These results were further confirmed through HPLC analysis and therefore it is concluded that NIRS coupled with PLS regression is a rapid and alternate method for quantification of KBA in Boswellia sacra. It is non-destructive, rapid, sensitive and uses simple methods of sample preparation.

  5. The feasibility of using explicit method for linear correction of the particle size variation using NIR Spectroscopy combined with PLS2regression method

    Science.gov (United States)

    Yulia, M.; Suhandy, D.

    2018-03-01

    NIR spectra obtained from spectral data acquisition system contains both chemical information of samples as well as physical information of the samples, such as particle size and bulk density. Several methods have been established for developing calibration models that can compensate for sample physical information variations. One common approach is to include physical information variation in the calibration model both explicitly and implicitly. The objective of this study was to evaluate the feasibility of using explicit method to compensate the influence of different particle size of coffee powder in NIR calibration model performance. A number of 220 coffee powder samples with two different types of coffee (civet and non-civet) and two different particle sizes (212 and 500 µm) were prepared. Spectral data was acquired using NIR spectrometer equipped with an integrating sphere for diffuse reflectance measurement. A discrimination method based on PLS-DA was conducted and the influence of different particle size on the performance of PLS-DA was investigated. In explicit method, we add directly the particle size as predicted variable results in an X block containing only the NIR spectra and a Y block containing the particle size and type of coffee. The explicit inclusion of the particle size into the calibration model is expected to improve the accuracy of type of coffee determination. The result shows that using explicit method the quality of the developed calibration model for type of coffee determination is a little bit superior with coefficient of determination (R2) = 0.99 and root mean square error of cross-validation (RMSECV) = 0.041. The performance of the PLS2 calibration model for type of coffee determination with particle size compensation was quite good and able to predict the type of coffee in two different particle sizes with relatively high R2 pred values. The prediction also resulted in low bias and RMSEP values.

  6. Kinetic microplate bioassays for relative potency of antibiotics improved by partial Least Square (PLS) regression.

    Science.gov (United States)

    Francisco, Fabiane Lacerda; Saviano, Alessandro Morais; Almeida, Túlia de Souza Botelho; Lourenço, Felipe Rebello

    2016-05-01

    Microbiological assays are widely used to estimate the relative potencies of antibiotics in order to guarantee the efficacy, safety, and quality of drug products. Despite of the advantages of turbidimetric bioassays when compared to other methods, it has limitations concerning the linearity and range of the dose-response curve determination. Here, we proposed to use partial least squares (PLS) regression to solve these limitations and to improve the prediction of relative potencies of antibiotics. Kinetic-reading microplate turbidimetric bioassays for apramacyin and vancomycin were performed using Escherichia coli (ATCC 8739) and Bacillus subtilis (ATCC 6633), respectively. Microbial growths were measured as absorbance up to 180 and 300min for apramycin and vancomycin turbidimetric bioassays, respectively. Conventional dose-response curves (absorbances or area under the microbial growth curve vs. log of antibiotic concentration) showed significant regression, however there were significant deviation of linearity. Thus, they could not be used for relative potency estimations. PLS regression allowed us to construct a predictive model for estimating the relative potencies of apramycin and vancomycin without over-fitting and it improved the linear range of turbidimetric bioassay. In addition, PLS regression provided predictions of relative potencies equivalent to those obtained from agar diffusion official methods. Therefore, we conclude that PLS regression may be used to estimate the relative potencies of antibiotics with significant advantages when compared to conventional dose-response curve determination. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. The effect of PLS regression in PLS path model estimation when multicollinearity is present

    DEFF Research Database (Denmark)

    Nielsen, Rikke; Kristensen, Kai; Eskildsen, Jacob

    PLS path modelling has previously been found to be robust to multicollinearity both between latent variables and between manifest variables of a common latent variable (see e.g. Cassel et al. (1999), Kristensen, Eskildsen (2005), Westlund et al. (2008)). However, most of the studies investigate...... models with relatively few variables and very simple dependence structures compared to the models that are often estimated in practical settings. A recent study by Nielsen et al. (2009) found that when model structure is more complex, PLS path modelling is not as robust to multicollinearity between...... latent variables as previously assumed. A difference in the standard error of path coefficients of as much as 83% was found between moderate and severe levels of multicollinearity. Large differences were found not only for large path coefficients, but also for small path coefficients and in some cases...

  8. The Chaotic Prediction for Aero-Engine Performance Parameters Based on Nonlinear PLS Regression

    Directory of Open Access Journals (Sweden)

    Chunxiao Zhang

    2012-01-01

    Full Text Available The prediction of the aero-engine performance parameters is very important for aero-engine condition monitoring and fault diagnosis. In this paper, the chaotic phase space of engine exhaust temperature (EGT time series which come from actual air-borne ACARS data is reconstructed through selecting some suitable nearby points. The partial least square (PLS based on the cubic spline function or the kernel function transformation is adopted to obtain chaotic predictive function of EGT series. The experiment results indicate that the proposed PLS chaotic prediction algorithm based on biweight kernel function transformation has significant advantage in overcoming multicollinearity of the independent variables and solve the stability of regression model. Our predictive NMSE is 16.5 percent less than that of the traditional linear least squares (OLS method and 10.38 percent less than that of the linear PLS approach. At the same time, the forecast error is less than that of nonlinear PLS algorithm through bootstrap test screening.

  9. Analysis of designed experiments by stabilised PLS Regression and jack-knifing

    DEFF Research Database (Denmark)

    Martens, Harald; Høy, M.; Westad, F.

    2001-01-01

    Pragmatical, visually oriented methods for assessing and optimising bi-linear regression models are described, and applied to PLS Regression (PLSR) analysis of multi-response data from controlled experiments. The paper outlines some ways to stabilise the PLSR method to extend its range...... the reliability of the linear and bi-linear model parameter estimates. The paper illustrates how the obtained PLSR "significance" probabilities are similar to those from conventional factorial ANOVA, but the PLSR is shown to give important additional overview plots of the main relevant structures in the multi....... An Introduction, Wiley, Chichester, UK, 2001]....

  10. Enhanced Anomaly Detection Via PLS Regression Models and Information Entropy Theory

    KAUST Repository

    Harrou, Fouzi

    2015-12-07

    Accurate and effective fault detection and diagnosis of modern engineering systems is crucial for ensuring reliability, safety and maintaining the desired product quality. In this work, we propose an innovative method for detecting small faults in the highly correlated multivariate data. The developed method utilizes partial least square (PLS) method as a modelling framework, and the symmetrized Kullback-Leibler divergence (KLD) as a monitoring index, where it is used to quantify the dissimilarity between probability distributions of current PLS-based residual and reference one obtained using fault-free data. The performance of the PLS-based KLD fault detection algorithm is illustrated and compared to the conventional PLS-based fault detection methods. Using synthetic data, we have demonstrated the greater sensitivity and effectiveness of the developed method over the conventional methods, especially when data are highly correlated and small faults are of interest.

  11. Enhanced Anomaly Detection Via PLS Regression Models and Information Entropy Theory

    KAUST Repository

    Harrou, Fouzi; Sun, Ying

    2015-01-01

    Accurate and effective fault detection and diagnosis of modern engineering systems is crucial for ensuring reliability, safety and maintaining the desired product quality. In this work, we propose an innovative method for detecting small faults in the highly correlated multivariate data. The developed method utilizes partial least square (PLS) method as a modelling framework, and the symmetrized Kullback-Leibler divergence (KLD) as a monitoring index, where it is used to quantify the dissimilarity between probability distributions of current PLS-based residual and reference one obtained using fault-free data. The performance of the PLS-based KLD fault detection algorithm is illustrated and compared to the conventional PLS-based fault detection methods. Using synthetic data, we have demonstrated the greater sensitivity and effectiveness of the developed method over the conventional methods, especially when data are highly correlated and small faults are of interest.

  12. Determination of fat content in chicken hamburgers using NIR spectroscopy and the Successive Projections Algorithm for interval selection in PLS regression (iSPA-PLS)

    Science.gov (United States)

    Krepper, Gabriela; Romeo, Florencia; Fernandes, David Douglas de Sousa; Diniz, Paulo Henrique Gonçalves Dias; de Araújo, Mário César Ugulino; Di Nezio, María Susana; Pistonesi, Marcelo Fabián; Centurión, María Eugenia

    2018-01-01

    Determining fat content in hamburgers is very important to minimize or control the negative effects of fat on human health, effects such as cardiovascular diseases and obesity, which are caused by the high consumption of saturated fatty acids and cholesterol. This study proposed an alternative analytical method based on Near Infrared Spectroscopy (NIR) and Successive Projections Algorithm for interval selection in Partial Least Squares regression (iSPA-PLS) for fat content determination in commercial chicken hamburgers. For this, 70 hamburger samples with a fat content ranging from 14.27 to 32.12 mg kg- 1 were prepared based on the upper limit recommended by the Argentinean Food Codex, which is 20% (w w- 1). NIR spectra were then recorded and then preprocessed by applying different approaches: base line correction, SNV, MSC, and Savitzky-Golay smoothing. For comparison, full-spectrum PLS and the Interval PLS are also used. The best performance for the prediction set was obtained for the first derivative Savitzky-Golay smoothing with a second-order polynomial and window size of 19 points, achieving a coefficient of correlation of 0.94, RMSEP of 1.59 mg kg- 1, REP of 7.69% and RPD of 3.02. The proposed methodology represents an excellent alternative to the conventional Soxhlet extraction method, since waste generation is avoided, yet without the use of either chemical reagents or solvents, which follows the primary principles of Green Chemistry. The new method was successfully applied to chicken hamburger analysis, and the results agreed with those with reference values at a 95% confidence level, making it very attractive for routine analysis.

  13. Linear feature selection in texture analysis - A PLS based method

    DEFF Research Database (Denmark)

    Marques, Joselene; Igel, Christian; Lillholm, Martin

    2013-01-01

    We present a texture analysis methodology that combined uncommitted machine-learning techniques and partial least square (PLS) in a fully automatic framework. Our approach introduces a robust PLS-based dimensionality reduction (DR) step to specifically address outliers and high-dimensional feature...... and considering all CV groups, the methods selected 36 % of the original features available. The diagnosis evaluation reached a generalization area-under-the-ROC curve of 0.92, which was higher than established cartilage-based markers known to relate to OA diagnosis....

  14. Towards molecular design using 2D-molecular contour maps obtained from PLS regression coefficients

    Science.gov (United States)

    Borges, Cleber N.; Barigye, Stephen J.; Freitas, Matheus P.

    2017-12-01

    The multivariate image analysis descriptors used in quantitative structure-activity relationships are direct representations of chemical structures as they are simply numerical decodifications of pixels forming the 2D chemical images. These MDs have found great utility in the modeling of diverse properties of organic molecules. Given the multicollinearity and high dimensionality of the data matrices generated with the MIA-QSAR approach, modeling techniques that involve the projection of the data space onto orthogonal components e.g. Partial Least Squares (PLS) have been generally used. However, the chemical interpretation of the PLS-based MIA-QSAR models, in terms of the structural moieties affecting the modeled bioactivity has not been straightforward. This work describes the 2D-contour maps based on the PLS regression coefficients, as a means of assessing the relevance of single MIA predictors to the response variable, and thus allowing for the structural, electronic and physicochemical interpretation of the MIA-QSAR models. A sample study to demonstrate the utility of the 2D-contour maps to design novel drug-like molecules is performed using a dataset of some anti-HIV-1 2-amino-6-arylsulfonylbenzonitriles and derivatives, and the inferences obtained are consistent with other reports in the literature. In addition, the different schemes for encoding atomic properties in molecules are discussed and evaluated.

  15. Application of sequential and orthogonalised-partial least squares (SO-PLS) regression to predict sensory properties of Cabernet Sauvignon wines from grape chemical composition.

    Science.gov (United States)

    Niimi, Jun; Tomic, Oliver; Næs, Tormod; Jeffery, David W; Bastian, Susan E P; Boss, Paul K

    2018-08-01

    The current study determined the applicability of sequential and orthogonalised-partial least squares (SO-PLS) regression to relate Cabernet Sauvignon grape chemical composition to the sensory perception of the corresponding wines. Grape samples (n = 25) were harvested at a similar maturity and vinified identically in 2013. Twelve measures using various (bio)chemical methods were made on grapes. Wines were evaluated using descriptive analysis with a trained panel (n = 10) for sensory profiling. Data was analysed globally using SO-PLS for the entire sensory profiles (SO-PLS2), as well as for single sensory attributes (SO-PLS1). SO-PLS1 models were superior in validated explained variances than SO-PLS2. SO-PLS provided a structured approach in the selection of predictor chemical data sets that best contributed to the correlation of important sensory attributes. This new approach presents great potential for application in other explorative metabolomics studies of food and beverages to address factors such as quality and regional influences. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Assessment of bitter taste of pharmaceuticals with multisensor system employing 3 way PLS regression

    International Nuclear Information System (INIS)

    Rudnitskaya, Alisa; Kirsanov, Dmitry; Blinova, Yulia; Legin, Evgeny; Seleznev, Boris; Clapham, David; Ives, Robert S.; Saunders, Kenneth A.; Legin, Andrey

    2013-01-01

    Highlights: ► Chemically diverse APIs are studied with potentiometric “electronic tongue”. ► Bitter taste of APIs can be predicted with 3wayPLS regression from ET data. ► High correlation of ET assessment with human panel and rat in vivo model. -- Abstract: The application of the potentiometric multisensor system (electronic tongue, ET) for quantification of the bitter taste of structurally diverse active pharmaceutical ingredients (API) is reported. The measurements were performed using a set of bitter substances that had been assessed by a professional human sensory panel and the in vivo rat brief access taste aversion (BATA) model to produce bitterness intensity scores for each substance at different concentrations. The set consisted of eight substances, both inorganic and organic – azelastine, caffeine, chlorhexidine, potassium nitrate, naratriptan, paracetamol, quinine, and sumatriptan. With the aim of enhancing the response of the sensors to the studied APIs, measurements were carried out at different pH levels ranging from 2 to 10, thus promoting ionization of the compounds. This experiment yielded a 3 way data array (samples × sensors × pH levels) from which 3wayPLS regression models were constructed with both human panel and rat model reference data. These models revealed that artificial assessment of bitter taste with ET in the chosen set of API's is possible with average relative errors of 16% in terms of human panel bitterness score and 25% in terms of inhibition values from in vivo rat model data. Furthermore, these 3wayPLS models were applied for prediction of the bitterness in blind test samples of a further set of API's. The results of the prediction were compared with the inhibition values obtained from the in vivo rat model

  17. Assessment of bitter taste of pharmaceuticals with multisensor system employing 3 way PLS regression

    Energy Technology Data Exchange (ETDEWEB)

    Rudnitskaya, Alisa [CESAM and Chemistry Department, University of Aveiro, Aveiro (Portugal); Kirsanov, Dmitry, E-mail: d.kirsanov@gmail.com [Chemistry Department, St. Petersburg University, St. Petersburg (Russian Federation); Blinova, Yulia [Chemistry Department, St. Petersburg University, St. Petersburg (Russian Federation); Legin, Evgeny [Sensor Systems LLC, St. Petersburg (Russian Federation); Seleznev, Boris [Chemistry Department, St. Petersburg University, St. Petersburg (Russian Federation); Clapham, David; Ives, Robert S.; Saunders, Kenneth A. [GlaxoSmithKline Pharmaceuticals, Gunnels Wood Road, Stevenage (United Kingdom); Legin, Andrey [Chemistry Department, St. Petersburg University, St. Petersburg (Russian Federation)

    2013-04-03

    Highlights: ► Chemically diverse APIs are studied with potentiometric “electronic tongue”. ► Bitter taste of APIs can be predicted with 3wayPLS regression from ET data. ► High correlation of ET assessment with human panel and rat in vivo model. -- Abstract: The application of the potentiometric multisensor system (electronic tongue, ET) for quantification of the bitter taste of structurally diverse active pharmaceutical ingredients (API) is reported. The measurements were performed using a set of bitter substances that had been assessed by a professional human sensory panel and the in vivo rat brief access taste aversion (BATA) model to produce bitterness intensity scores for each substance at different concentrations. The set consisted of eight substances, both inorganic and organic – azelastine, caffeine, chlorhexidine, potassium nitrate, naratriptan, paracetamol, quinine, and sumatriptan. With the aim of enhancing the response of the sensors to the studied APIs, measurements were carried out at different pH levels ranging from 2 to 10, thus promoting ionization of the compounds. This experiment yielded a 3 way data array (samples × sensors × pH levels) from which 3wayPLS regression models were constructed with both human panel and rat model reference data. These models revealed that artificial assessment of bitter taste with ET in the chosen set of API's is possible with average relative errors of 16% in terms of human panel bitterness score and 25% in terms of inhibition values from in vivo rat model data. Furthermore, these 3wayPLS models were applied for prediction of the bitterness in blind test samples of a further set of API's. The results of the prediction were compared with the inhibition values obtained from the in vivo rat model.

  18. Senate Bill (PLS No. 200, de 2015, analysis versus the Principle of the Prohibition of Social Regression

    Directory of Open Access Journals (Sweden)

    Glaucia Ribeiro Lima

    2016-12-01

    Full Text Available The Senate Bill (PLS number 200, of 2015, proposes the edition of a law for the conduction of clinical trials involving human subjects. This study aimed to perform a critical analysis of the PLS 200/2015, based on the Principle of the Prohibition of Social Regression. Thus, a descriptive, documentary and normative research was conducted, with survey of the ethical and sanitary standards related to clinical research and findings related to the PL 200/2015. The PLS 200/2015 and the information regarding was also consulted on the website of the Senate. The regulation of the matter by law demonstrated not to be a problem in the research. The main conflicts were related to the creation of Independent Ethics Committee (IEC, that does not link the ethic review to an State Agency; the use of placebo, in which flexibility is contrary to all efforts to ensure that participants have the best treatment options; and post-study access, which restriction is contrary to the existing regulations that determine the free and unlimited access. The analysis of the main settings specified in the PLS 200/2015 did not identify social or scientific improvements. The Principle of the Prohibition of Social Regression can be used, thus, to ensure the constitutional provisions already undertake and accomplished, mainly the right to health, human dignity and the inviolability of the right to live.

  19. Comparison of FTIR-ATR and Raman spectroscopy in determination of VLDL triglycerides in blood serum with PLS regression

    Science.gov (United States)

    Oleszko, Adam; Hartwich, Jadwiga; Wójtowicz, Anna; Gąsior-Głogowska, Marlena; Huras, Hubert; Komorowska, Małgorzata

    2017-08-01

    Hypertriglyceridemia, related with triglyceride (TG) in plasma above 1.7 mmol/L is one of the cardiovascular risk factors. Very low density lipoproteins (VLDL) are the main TG carriers. Despite being time consuming, demanding well-qualified staff and expensive instrumentation, ultracentrifugation technique still remains the gold standard for the VLDL isolation. Therefore faster and simpler method of VLDL-TG determination is needed. Vibrational spectroscopy, including FT-IR and Raman, is widely used technique in lipid and protein research. The aim of this study was assessment of Raman and FT-IR spectroscopy in determination of VLDL-TG directly in serum with the isolation step omitted. TG concentration in serum and in ultracentrifugated VLDL fractions from 32 patients were measured with reference colorimetric method. FT-IR and Raman spectra of VLDL and serum samples were acquired. Partial least square (PLS) regression was used for calibration and leave-one-out cross validation. Our results confirmed possibility of reagent-free determination of VLDL-TG directly in serum with both Raman and FT-IR spectroscopy. Quantitative VLDL testing by FT-IR and/or Raman spectroscopy applied directly to maternal serum seems to be promising screening test to identify women with increased risk of adverse pregnancy outcomes and patient friendly method of choice based on ease of performance, accuracy and efficiency.

  20. PLS2 regression as a tool for selection of optimal analytical modality

    DEFF Research Database (Denmark)

    Madsen, Michael; Esbensen, Kim

    Intelligent use of modern process analysers allows process technicians and engineers to look deep into the dynamic behaviour of production systems. This opens up for a plurality of new possibilities with respect to process optimisation. Oftentimes, several instruments representing different...... technologies and price classes are able to decipher relevant process information simultaneously. The question then is: how to choose between available technologies without compromising the quality and usability of the data. We apply PLS2 modelling to quantify the relative merits of competing, or complementing......, analytical modalities. We here present results from a feasibility study, where Fourier Transform Near InfraRed (FT-NIR), Fourier Transform Mid InfraRed (FT-MIR), and Raman laser spectroscopy were applied on the same set of samples obtained from a pilot-scale beer brewing process. Quantitative PLS1 models...

  1. Alternative Methods of Regression

    CERN Document Server

    Birkes, David

    2011-01-01

    Of related interest. Nonlinear Regression Analysis and its Applications Douglas M. Bates and Donald G. Watts ".an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models.highly recommend[ed].for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." --Technometrics This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis--with all data s

  2. AO–MW–PLS method applied to rapid quantification of teicoplanin with near-infrared spectroscopy

    Directory of Open Access Journals (Sweden)

    Jiemei Chen

    2017-01-01

    Full Text Available Teicoplanin (TCP is an important lipoglycopeptide antibiotic produced by fermenting Actinoplanes teichomyceticus. The change in TCP concentration is important to measure in the fermentation process. In this study, a reagent-free and rapid quantification method for TCP in the TCP–Tris–HCl mixture samples was developed using near-infrared (NIR spectroscopy by focusing our attention on the fermentation process for TCP. The absorbance optimization (AO partial least squares (PLS was proposed and integrated with the moving window (MW PLS, which is called AO–MW–PLS method, to select appropriate wavebands. A model set that includes various wavebands that were equivalent to the optimal AO–MW–PLS waveband was proposed based on statistical considerations. The public region of all equivalent wavebands was just one of the equivalent wavebands. The obtained public regions were 1540–1868nm for TCP and 1114–1310nm for Tris. The root-mean-square error and correlation coefficient for leave-one-out cross validation were 0.046mg mL−1 and 0.9998mg mL−1 for TCP, and 0.235mg mL−1 and 0.9986mg mL−1 for Tris, respectively. All the models achieved highly accurate prediction effects, and the selected wavebands provided valuable references for designing specialized spectrometers. This study provided a valuable reference for further application of the proposed methods to TCP fermentation broth and to other spectroscopic analysis fields.

  3. A PLS-based extractive spectrophotometric method for simultaneous determination of carbamazepine and carbamazepine-10,11-epoxide in plasma and comparison with HPLC

    Science.gov (United States)

    Hemmateenejad, Bahram; Rezaei, Zahra; Khabnadideh, Soghra; Saffari, Maryam

    2007-11-01

    Carbamazepine (CBZ) undergoes enzyme biotransformation through epoxidation with the formation of its metabolite, carbamazepine-10,11-epoxide (CBZE). A simple chemometrics-assisted spectrophotometric method has been proposed for simultaneous determination of CBZ and CBZE in plasma. A liquid extraction procedure was operated to separate the analytes from plasma, and the UV absorbance spectra of the resultant solutions were subjected to partial least squares (PLS) regression. The optimum number of PLS latent variables was selected according to the PRESS values of leave-one-out cross-validation. A HPLC method was also employed for comparison. The respective mean recoveries for analysis of CBZ and CBZE in synthetic mixtures were 102.57 (±0.25)% and 103.00 (±0.09)% for PLS and 99.40 (±0.15)% and 102.20 (±0.02)%. The concentrations of CBZ and CBZE were also determined in five patients using the PLS and HPLC methods. The results showed that the data obtained by PLS were comparable with those obtained by HPLC method.

  4. Quantile Regression Methods

    DEFF Research Database (Denmark)

    Fitzenberger, Bernd; Wilke, Ralf Andreas

    2015-01-01

    if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based......Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights...... by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even...

  5. Comparing the analytical performances of Micro-NIR and FT-NIR spectrometers in the evaluation of acerola fruit quality, using PLS and SVM regression algorithms.

    Science.gov (United States)

    Malegori, Cristina; Nascimento Marques, Emanuel José; de Freitas, Sergio Tonetto; Pimentel, Maria Fernanda; Pasquini, Celio; Casiraghi, Ernestina

    2017-04-01

    The main goal of this study was to investigate the analytical performances of a state-of-the-art device, one of the smallest dispersion NIR spectrometers on the market (MicroNIR 1700), making a critical comparison with a benchtop FT-NIR spectrometer in the evaluation of the prediction accuracy. In particular, the aim of this study was to estimate in a non-destructive manner, titratable acidity and ascorbic acid content in acerola fruit during ripening, in a view of direct applicability in field of this new miniaturised handheld device. Acerola (Malpighia emarginata DC.) is a super-fruit characterised by a considerable amount of ascorbic acid, ranging from 1.0% to 4.5%. However, during ripening, acerola colour changes and the fruit may lose as much as half of its ascorbic acid content. Because the variability of chemical parameters followed a non-strictly linear profile, two different regression algorithms were compared: PLS and SVM. Regression models obtained with Micro-NIR spectra give better results using SVM algorithm, for both ascorbic acid and titratable acidity estimation. FT-NIR data give comparable results using both SVM and PLS algorithms, with lower errors for SVM regression. The prediction ability of the two instruments was statistically compared using the Passing-Bablok regression algorithm; the outcomes are critically discussed together with the regression models, showing the suitability of the portable Micro-NIR for in field monitoring of chemical parameters of interest in acerola fruits. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. Determinação simultânea dos teores de cinza e proteína em farinha de trigo empregando NIRR-PLS e DRIFT-PLS Simultaneous determination of ash content and protein in wheat flour using infrared reflection techniques and partial least-squares regression (PLS

    Directory of Open Access Journals (Sweden)

    Marco Flôres Ferrão

    2004-09-01

    Full Text Available As técnicas de espectroscopia por reflexão no infravermelho próximo (NIRRS e por reflexão difusa no infravermelho médio com transformada de Fourier (DRIFTS foram empregadas com o método de regressão multivariado por mínimos quadrados parciais (PLS para a determinação simultânea dos teores de proteína e cinza em amostras de farinha de trigo da variedade Triticum aestivum L. Foram coletados espectros no infravermelho em duplicata de 100 amostras, empregando-se acessórios de reflexão difusa. Os teores de proteína (8,85-13,23% e cinza (0,330-1,287%, empregados como referência, foram determinados pelo método Kjeldhal e método gravimétrico, respectivamente. Os dados espectrais foram utilizados no formato log(1/R, bem como suas derivadas de primeira e segunda ordem, sendo pré-processados usando-se os dados centrados na média (MC ou escalados pela variância (VS ou ambos. Cinqüenta e cinco amostras foram usadas para calibração e 45 para validação dos modelos, adotando-se como critério de construção os valores mínimos do erro padrão de calibração (SEC e do erro padrão de validação (SEV. Estes valores foram inferiores a 0,33% para proteína e a 0,07% para cinza. Os métodos desenvolvidos apresentam como vantagens a não agressão ao ambiente, bem como permitem uma determinação direta, simultânea, rápida e não destrutiva dos teores de proteína e cinza em amostras de farinha de trigo.Partial Least Square (PLS multivariate calibration associated to Near Infrared Reflection Spectroscopy (NIRRS or Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS were used to establish methods for simultaneous determination of protein and ash content on commercial wheat flour samples of Triticum aestivum L. Duplicate spectra of 100 samples with protein content between 8.85-13.23% (Kjeldahl method and ash content between 0.330-1.287% (gravimetric method were employed to build calibration methods. The spectra were used

  7. Regression methods for medical research

    CERN Document Server

    Tai, Bee Choo

    2013-01-01

    Regression Methods for Medical Research provides medical researchers with the skills they need to critically read and interpret research using more advanced statistical methods. The statistical requirements of interpreting and publishing in medical journals, together with rapid changes in science and technology, increasingly demands an understanding of more complex and sophisticated analytic procedures.The text explains the application of statistical models to a wide variety of practical medical investigative studies and clinical trials. Regression methods are used to appropriately answer the

  8. Reliable and relevant modelling of real world data: a personal account of the development of PLS Regression

    DEFF Research Database (Denmark)

    Martens, Harald

    2001-01-01

    Why and how the Partial Least Squares Regression (PLSR) was developed, is here described from the author's perspective. The paper outlines my frustrating experiences in the 70'ies with two conflicting and equally over-ambitious and oversimplified modelling cultures - in traditional chemistry...

  9. Evaluation of in-line Raman data for end-point determination of a coating process: Comparison of Science-Based Calibration, PLS-regression and univariate data analysis.

    Science.gov (United States)

    Barimani, Shirin; Kleinebudde, Peter

    2017-10-01

    A multivariate analysis method, Science-Based Calibration (SBC), was used for the first time for endpoint determination of a tablet coating process using Raman data. Two types of tablet cores, placebo and caffeine cores, received a coating suspension comprising a polyvinyl alcohol-polyethylene glycol graft-copolymer and titanium dioxide to a maximum coating thickness of 80µm. Raman spectroscopy was used as in-line PAT tool. The spectra were acquired every minute and correlated to the amount of applied aqueous coating suspension. SBC was compared to another well-known multivariate analysis method, Partial Least Squares-regression (PLS) and a simpler approach, Univariate Data Analysis (UVDA). All developed calibration models had coefficient of determination values (R 2 ) higher than 0.99. The coating endpoints could be predicted with root mean square errors (RMSEP) less than 3.1% of the applied coating suspensions. Compared to PLS and UVDA, SBC proved to be an alternative multivariate calibration method with high predictive power. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Data Mining of Chemogenomics Data Using Bi-Modal PLS Methods and Chemical Interpretation for Molecular Design.

    Science.gov (United States)

    Hasegawa, Kiyoshi; Funatsu, Kimito

    2014-12-01

    Chemogenomics is a new strategy in drug discovery for interrogating all molecules capable of interacting with all biological targets. Because of the almost infinite number of drug-like organic molecules, bench-based experimental chemogenomics methods are not generally feasible. Several in silico chemogenomics models have therefore been developed for high-throughput screening of large numbers of drug candidate compounds and target proteins. In previous studies, we described two novel bi-modal PLS approaches. These methods provide a significant advantage in that they enable direct connections to be made between biological activities and ligand and protein descriptors. In this special issue, we review these two PLS-based approaches using two different chemogenomics datasets for illustration. We then compare the predictive and interpretive performance of the two methods using the same congeneric data set. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Determining the Relationship between U.S. County-Level Adult Obesity Rate and Multiple Risk Factors by PLS Regression and SVM Modeling Approaches

    Directory of Open Access Journals (Sweden)

    Chau-Kuang Chen

    2015-02-01

    Full Text Available Data from the Center for Disease Control (CDC has shown that the obesity rate doubled among adults within the past two decades. This upsurge was the result of changes in human behavior and environment. Partial least squares (PLS regression and support vector machine (SVM models were conducted to determine the relationship between U.S. county-level adult obesity rate and multiple risk factors. The outcome variable was the adult obesity rate. The 23 risk factors were categorized into four domains of the social ecological model including biological/behavioral factor, socioeconomic status, food environment, and physical environment. Of the 23 risk factors related to adult obesity, the top eight significant risk factors with high normalized importance were identified including physical inactivity, natural amenity, percent of households receiving SNAP benefits, and percent of all restaurants being fast food. The study results were consistent with those in the literature. The study showed that adult obesity rate was influenced by biological/behavioral factor, socioeconomic status, food environment, and physical environment embedded in the social ecological theory. By analyzing multiple risk factors of obesity in the communities, may lead to the proposal of more comprehensive and integrated policies and intervention programs to solve the population-based problem.

  12. Prediction of Caffeine Content in Java Preanger Coffee Beans by NIR Spectroscopy Using PLS and MLR Method

    Science.gov (United States)

    Budiastra, I. W.; Sutrisno; Widyotomo, S.; Ayu, P. C.

    2018-05-01

    Caffeine is one of important components in coffee that contributes to the coffee beverages flavor. Caffeine concentration in coffee bean is usually determined by chemical method which is time consuming and destructive method. A nondestructive method using NIR spectroscopy was successfully applied to determine the caffeine concentration of Arabica gayo coffee bean. In this study, NIR Spectroscopy was assessed to determine the caffeine concentration of java preanger coffee bean. A hundred samples, each consist of 96 g coffee beans were prepared for reflectance and chemical measurement. Reflectance of the sample was measured by FT-NIR spectrometer in the wavelength of 1000-2500 nm (10000-4000 cm-1) followed by determination of caffeine content using LCMS method. Calibration of NIR spectra and the caffeine content was carried out using PLS and MLR methods. Several spectra data processing was conducted to increase the accuracy of prediction. The result of the study showed that caffeine content could be determined by PLS model using 7 factors and spectra data processing of combination of the first derivative and MSC of spectra absorbance (r = 0.946; CV = 1.54 %; RPD = 2.28). A lower accuracy was obtained by MLR model consisted of three caffeine and other four absorption wavelengths (r = 0.683; CV = 3.31%; RPD = 1.18).

  13. Screening method for rapid classification of psychoactive substances in illicit tablets using mid infrared spectroscopy and PLS-DA.

    Science.gov (United States)

    Pereira, Leandro S A; Lisboa, Fernanda L C; Coelho Neto, José; Valladão, Frederico N; Sena, Marcelo M

    2018-05-09

    Several new psychoactive substances (NPS) have reached the illegal drug market in recent years, and ecstasy-like tablets are one of the forms affected by this change. Cathinones and tryptamines have increasingly been found in ecstasy-like seized samples as well as other amphetamine type stimulants. A presumptive method for identifying different drugs in seized ecstasy tablets (n=92) using ATR-FTIR (attenuated total reflectance - Fourier transform infrared spectroscopy) and PLS-DA (partial least squares discriminant analysis) was developed. A hierarchical strategy of sequential modeling was performed with PLS-DA. The main model discriminated four classes: 5-MeO-MIPT, methylenedioxyamphetamines (MDMA and MDA), methamphetamine, and cathinones. Two submodels were built to identify drugs present in MDs and cathinones classes. Models were validated through the estimate of figures of merit. The average reliability rate (RLR) of the main model was 96.8% and accordance (ACC) was 100%. For the submodels, RLR and ACC were 100%. The reliability of the models was corroborated through their spectral interpretation. Thus, spectral assignments were performed by associating informative vectors of each specific modeled class to the respective drugs. The developed method is simple, fast, and can be applied to the forensic laboratory routine, leading to objective results reports useful for forensic scientists and law enforcement. Copyright © 2018 Elsevier B.V. All rights reserved.

  14. An improved partial least-squares regression method for Raman spectroscopy

    Science.gov (United States)

    Momenpour Tehran Monfared, Ali; Anis, Hanan

    2017-10-01

    It is known that the performance of partial least-squares (PLS) regression analysis can be improved using the backward variable selection method (BVSPLS). In this paper, we further improve the BVSPLS based on a novel selection mechanism. The proposed method is based on sorting the weighted regression coefficients, and then the importance of each variable of the sorted list is evaluated using root mean square errors of prediction (RMSEP) criterion in each iteration step. Our Improved BVSPLS (IBVSPLS) method has been applied to leukemia and heparin data sets and led to an improvement in limit of detection of Raman biosensing ranged from 10% to 43% compared to PLS. Our IBVSPLS was also compared to the jack-knifing (simpler) and Genetic Algorithm (more complex) methods. Our method was consistently better than the jack-knifing method and showed either a similar or a better performance compared to the genetic algorithm.

  15. PLS-based and regularization-based methods for the selection of relevant variables in non-targeted metabolomics data

    Directory of Open Access Journals (Sweden)

    Renata Bujak

    2016-07-01

    Full Text Available Non-targeted metabolomics constitutes a part of systems biology and aims to determine many metabolites in complex biological samples. Datasets obtained in non-targeted metabolomics studies are multivariate and high-dimensional due to the sensitivity of mass spectrometry-based detection methods as well as complexity of biological matrices. Proper selection of variables which contribute into group classification is a crucial step, especially in metabolomics studies which are focused on searching for disease biomarker candidates. In the present study, three different statistical approaches were tested using two metabolomics datasets (RH and PH study. Orthogonal projections to latent structures-discriminant analysis (OPLS-DA without and with multiple testing correction as well as least absolute shrinkage and selection operator (LASSO were tested and compared. For the RH study, OPLS-DA model built without multiple testing correction, selected 46 and 218 variables based on VIP criteria using Pareto and UV scaling, respectively. In the case of the PH study, 217 and 320 variables were selected based on VIP criteria using Pareto and UV scaling, respectively. In the RH study, OPLS-DA model built with multiple testing correction, selected 4 and 19 variables as statistically significant in terms of Pareto and UV scaling, respectively. For PH study, 14 and 18 variables were selected based on VIP criteria in terms of Pareto and UV scaling, respectively. Additionally, the concept and fundaments of the least absolute shrinkage and selection operator (LASSO with bootstrap procedure evaluating reproducibility of results, was demonstrated. In the RH and PH study, the LASSO selected 14 and 4 variables with reproducibility between 99.3% and 100%. However, apart from the popularity of PLS-DA and OPLS-DA methods in metabolomics, it should be highlighted that they do not control type I or type II error, but only arbitrarily establish a cut-off value for PLS-DA loadings

  16. Multivariate analysis of nystatin and metronidazole in a semi-solid matrix by means of diffuse reflectance NIR spectroscopy and PLS regression.

    Science.gov (United States)

    Baratieri, Sabrina C; Barbosa, Juliana M; Freitas, Matheus P; Martins, José A

    2006-01-23

    A multivariate method of analysis of nystatin and metronidazole in a semi-solid matrix, based on diffuse reflectance NIR measurements and partial least squares regression, is reported. The product, a vaginal cream used in the antifungal and antibacterial treatment, is usually, quantitatively analyzed through microbiological tests (nystatin) and HPLC technique (metronidazole), according to pharmacopeial procedures. However, near infrared spectroscopy has demonstrated to be a valuable tool for content determination, given the rapidity and scope of the method. In the present study, it was successfully applied in the prediction of nystatin (even in low concentrations, ca. 0.3-0.4%, w/w, which is around 100,000 IU/5g) and metronidazole contents, as demonstrated by some figures of merit, namely linearity, precision (mean and repeatability) and accuracy.

  17. Deflation in multiblock PLS

    NARCIS (Netherlands)

    Westerhuis, J. A.; Smilde, A. K.

    2001-01-01

    This paper describes some of the deflation problems in multiblock PLS. Deflation of X using block scores leads to inferior prediction of Y. Deflation of X using super scores gives the same predictions as standard PLS with all variables in one large X-block, but the information of the separate blocks

  18. Sensory and instrumental texture assessment of roasted pistachio nut/kernel by partial least square (PLS) regression analysis: effect of roasting conditions.

    Science.gov (United States)

    Mohammadi Moghaddam, Toktam; Razavi, Seyed M A; Taghizadeh, Masoud; Sazgarnia, Ameneh

    2016-01-01

    Roasting is an important step in the processing of pistachio nuts. The effect of hot air roasting temperature (90, 120 and 150 °C), time (20, 35 and 50 min) and air velocity (0.5, 1.5 and 2.5 m/s) on textural and sensory characteristics of pistachio nuts and kernels were investigated. The results showed that increasing the roasting temperature decreased the fracture force (82-25.54 N), instrumental hardness (82.76-37.59 N), apparent modulus of elasticity (47-21.22 N/s), compressive energy (280.73-101.18 N.s) and increased amount of bitterness (1-2.5) and the hardness score (6-8.40) of pistachio kernels. Higher roasting time improved the flavor of samples. The results of the consumer test showed that the roasted pistachio kernels have good acceptability for flavor (score 5.83-8.40), color (score 7.20-8.40) and hardness (score 6-8.40) acceptance. Moreover, Partial Least Square (PLS) analysis of instrumental and sensory data provided important information for the correlation of objective and subjective properties. The univariate analysis showed that over 93.87 % of the variation in sensory hardness and almost 87 % of the variation in sensory acceptability could be explained by instrumental texture properties.

  19. Using the partial least squares (PLS) method to establish critical success factor interdependence in ERP implementation projects

    OpenAIRE

    Esteves, José; Pastor Collado, Juan Antonio; Casanovas Garcia, Josep

    2002-01-01

    This technical research report proposes the usage of a statistical approach named Partial Least squares (PLS) to define the relationships between critical success factors for ERP implementation projects. In previous research work, we developed a unified model of critical success factors for ERP implementation projects. Some researchers have evidenced the relationships between these critical success factors, however no one has defined in a form...

  20. Simultaneous chemometric determination of pyridoxine hydrochloride and isoniazid in tablets by multivariate regression methods.

    Science.gov (United States)

    Dinç, Erdal; Ustündağ, Ozgür; Baleanu, Dumitru

    2010-08-01

    The sole use of pyridoxine hydrochloride during treatment of tuberculosis gives rise to pyridoxine deficiency. Therefore, a combination of pyridoxine hydrochloride and isoniazid is used in pharmaceutical dosage form in tuberculosis treatment to reduce this side effect. In this study, two chemometric methods, partial least squares (PLS) and principal component regression (PCR), were applied to the simultaneous determination of pyridoxine (PYR) and isoniazid (ISO) in their tablets. A concentration training set comprising binary mixtures of PYR and ISO consisting of 20 different combinations were randomly prepared in 0.1 M HCl. Both multivariate calibration models were constructed using the relationships between the concentration data set (concentration data matrix) and absorbance data matrix in the spectral region 200-330 nm. The accuracy and the precision of the proposed chemometric methods were validated by analyzing synthetic mixtures containing the investigated drugs. The recovery results obtained by applying PCR and PLS calibrations to the artificial mixtures were found between 100.0 and 100.7%. Satisfactory results obtained by applying the PLS and PCR methods to both artificial and commercial samples were obtained. The results obtained in this manuscript strongly encourage us to use them for the quality control and the routine analysis of the marketing tablets containing PYR and ISO drugs. Copyright © 2010 John Wiley & Sons, Ltd.

  1. PLS-based memory control scheme for enhanced process monitoring

    KAUST Repository

    Harrou, Fouzi; Sun, Ying

    2017-01-01

    Fault detection is important for safe operation of various modern engineering systems. Partial least square (PLS) has been widely used in monitoring highly correlated process variables. Conventional PLS-based methods, nevertheless, often fail

  2. Regression modeling methods, theory, and computation with SAS

    CERN Document Server

    Panik, Michael

    2009-01-01

    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,

  3. Stochastic development regression using method of moments

    DEFF Research Database (Denmark)

    Kühnel, Line; Sommer, Stefan Horst

    2017-01-01

    This paper considers the estimation problem arising when inferring parameters in the stochastic development regression model for manifold valued non-linear data. Stochastic development regression captures the relation between manifold-valued response and Euclidean covariate variables using...... the stochastic development construction. It is thereby able to incorporate several covariate variables and random effects. The model is intrinsically defined using the connection of the manifold, and the use of stochastic development avoids linearizing the geometry. We propose to infer parameters using...... the Method of Moments procedure that matches known constraints on moments of the observations conditional on the latent variables. The performance of the model is investigated in a simulation example using data on finite dimensional landmark manifolds....

  4. A graphical method to evaluate spectral preprocessing in multivariate regression calibrations: example with Savitzky-Golay filters and partial least squares regression.

    Science.gov (United States)

    Delwiche, Stephen R; Reeves, James B

    2010-01-01

    In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly smoothing operations or derivatives. While such operations are often useful in reducing the number of latent variables of the actual decomposition and lowering residual error, they also run the risk of misleading the practitioner into accepting calibration equations that are poorly adapted to samples outside of the calibration. The current study developed a graphical method to examine this effect on partial least squares (PLS) regression calibrations of near-infrared (NIR) reflection spectra of ground wheat meal with two analytes, protein content and sodium dodecyl sulfate sedimentation (SDS) volume (an indicator of the quantity of the gluten proteins that contribute to strong doughs). These two properties were chosen because of their differing abilities to be modeled by NIR spectroscopy: excellent for protein content, fair for SDS sedimentation volume. To further demonstrate the potential pitfalls of preprocessing, an artificial component, a randomly generated value, was included in PLS regression trials. Savitzky-Golay (digital filter) smoothing, first-derivative, and second-derivative preprocess functions (5 to 25 centrally symmetric convolution points, derived from quadratic polynomials) were applied to PLS calibrations of 1 to 15 factors. The results demonstrated the danger of an over reliance on preprocessing when (1) the number of samples used in a multivariate calibration is low (<50), (2) the spectral response of the analyte is weak, and (3) the goodness of the calibration is based on the coefficient of determination (R(2)) rather than a term based on residual error. The graphical method has application to the evaluation of other preprocess functions and various

  5. Control Point Generated PLS - lines

    Data.gov (United States)

    Minnesota Department of Natural Resources — The Control Point Generated PLS layer contains line and polygon features to the 1/4 of 1/4 PLS section (approximately 40 acres) and government lot level. The layer...

  6. Control Point Generated PLS - polygons

    Data.gov (United States)

    Minnesota Department of Natural Resources — The Control Point Generated PLS layer contains line and polygon features to the 1/4 of 1/4 PLS section (approximately 40 acres) and government lot level. The layer...

  7. PLS-based memory control scheme for enhanced process monitoring

    KAUST Repository

    Harrou, Fouzi

    2017-01-20

    Fault detection is important for safe operation of various modern engineering systems. Partial least square (PLS) has been widely used in monitoring highly correlated process variables. Conventional PLS-based methods, nevertheless, often fail to detect incipient faults. In this paper, we develop new PLS-based monitoring chart, combining PLS with multivariate memory control chart, the multivariate exponentially weighted moving average (MEWMA) monitoring chart. The MEWMA are sensitive to incipient faults in the process mean, which significantly improves the performance of PLS methods and widen their applicability in practice. Using simulated distillation column data, we demonstrate that the proposed PLS-based MEWMA control chart is more effective in detecting incipient fault in the mean of the multivariate process variables, and outperform the conventional PLS-based monitoring charts.

  8. Timing system for PLS

    International Nuclear Information System (INIS)

    Chang, S.S.; Kim, M.S.; Won, S.C.; Choi, S.J.

    1991-01-01

    The PLS timing system consists of a master oscillator, a repetition rate pulse generator, a storage ring rf synchronizing system, and a rf driver and kicker trigger system composed of a fixed delay module and variable delay modules. All the timing modules are installed in the VME crates and controlled by the 32 bit microprocessors, and communicating with the Host computer via Ethernet. This paper describes the architectural design of this system as well as the requirements of performance

  9. Efectivity of Additive Spline for Partial Least Square Method in Regression Model Estimation

    Directory of Open Access Journals (Sweden)

    Ahmad Bilfarsah

    2005-04-01

    Full Text Available Additive Spline of Partial Least Square method (ASPL as one generalization of Partial Least Square (PLS method. ASPLS method can be acommodation to non linear and multicollinearity case of predictor variables. As a principle, The ASPLS method approach is cahracterized by two idea. The first is to used parametric transformations of predictors by spline function; the second is to make ASPLS components mutually uncorrelated, to preserve properties of the linear PLS components. The performance of ASPLS compared with other PLS method is illustrated with the fisher economic application especially the tuna fish production.

  10. On-line monitoring the extract process of Fu-fang Shuanghua oral solution using near infrared spectroscopy and different PLS algorithms

    Science.gov (United States)

    Kang, Qian; Ru, Qingguo; Liu, Yan; Xu, Lingyan; Liu, Jia; Wang, Yifei; Zhang, Yewen; Li, Hui; Zhang, Qing; Wu, Qing

    2016-01-01

    An on-line near infrared (NIR) spectroscopy monitoring method with an appropriate multivariate calibration method was developed for the extraction process of Fu-fang Shuanghua oral solution (FSOS). On-line NIR spectra were collected through two fiber optic probes, which were designed to transmit NIR radiation by a 2 mm flange. Partial least squares (PLS), interval PLS (iPLS) and synergy interval PLS (siPLS) algorithms were used comparatively for building the calibration regression models. During the extraction process, the feasibility of NIR spectroscopy was employed to determine the concentrations of chlorogenic acid (CA) content, total phenolic acids contents (TPC), total flavonoids contents (TFC) and soluble solid contents (SSC). High performance liquid chromatography (HPLC), ultraviolet spectrophotometric method (UV) and loss on drying methods were employed as reference methods. Experiment results showed that the performance of siPLS model is the best compared with PLS and iPLS. The calibration models for AC, TPC, TFC and SSC had high values of determination coefficients of (R2) (0.9948, 0.9992, 0.9950 and 0.9832) and low root mean square error of cross validation (RMSECV) (0.0113, 0.0341, 0.1787 and 1.2158), which indicate a good correlation between reference values and NIR predicted values. The overall results show that the on line detection method could be feasible in real application and would be of great value for monitoring the mixed decoction process of FSOS and other Chinese patent medicines.

  11. Method for nonlinear exponential regression analysis

    Science.gov (United States)

    Junkin, B. G.

    1972-01-01

    Two computer programs developed according to two general types of exponential models for conducting nonlinear exponential regression analysis are described. Least squares procedure is used in which the nonlinear problem is linearized by expanding in a Taylor series. Program is written in FORTRAN 5 for the Univac 1108 computer.

  12. A method for nonlinear exponential regression analysis

    Science.gov (United States)

    Junkin, B. G.

    1971-01-01

    A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.

  13. LS-SVM: uma nova ferramenta quimiométrica para regressão multivariada. Comparação de modelos de regressão LS-SVM e PLS na quantificação de adulterantes em leite em pó empregando NIR LS-SVM: a new chemometric tool for multivariate regression. Comparison of LS-SVM and pls regression for determination of common adulterants in powdered milk by nir spectroscopy

    Directory of Open Access Journals (Sweden)

    Marco F. Ferrão

    2007-08-01

    Full Text Available Least-squares support vector machines (LS-SVM were used as an alternative multivariate calibration method for the simultaneous quantification of some common adulterants found in powdered milk samples, using near-infrared spectroscopy. Excellent models were built using LS-SVM for determining R², RMSECV and RMSEP values. LS-SVMs show superior performance for quantifying starch, whey and sucrose in powdered milk samples in relation to PLSR. This study shows that it is possible to determine precisely the amount of one and two common adulterants simultaneously in powdered milk samples using LS-SVM and NIR spectra.

  14. A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

    Science.gov (United States)

    Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.

    2015-05-01

    The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels

  15. New strategy for determination of anthocyanins, polyphenols and antioxidant capacity of Brassica oleracea liquid extract using infrared spectroscopies and multivariate regression

    Science.gov (United States)

    de Oliveira, Isadora R. N.; Roque, Jussara V.; Maia, Mariza P.; Stringheta, Paulo C.; Teófilo, Reinaldo F.

    2018-04-01

    A new method was developed to determine the antioxidant properties of red cabbage extract (Brassica oleracea) by mid (MID) and near (NIR) infrared spectroscopies and partial least squares (PLS) regression. A 70% (v/v) ethanolic extract of red cabbage was concentrated to 9° Brix and further diluted (12 to 100%) in water. The dilutions were used as external standards for the building of PLS models. For the first time, this strategy was applied for building multivariate regression models. Reference analyses and spectral data were obtained from diluted extracts. The determinate properties were total and monomeric anthocyanins, total polyphenols and antioxidant capacity by ABTS (2,2-azino-bis(3-ethyl-benzothiazoline-6-sulfonate)) and DPPH (2,2-diphenyl-1-picrylhydrazyl) methods. Ordered predictors selection (OPS) and genetic algorithm (GA) were used for feature selection before PLS regression (PLS-1). In addition, a PLS-2 regression was applied to all properties simultaneously. PLS-1 models provided more predictive models than did PLS-2 regression. PLS-OPS and PLS-GA models presented excellent prediction results with a correlation coefficient higher than 0.98. However, the best models were obtained using PLS and variable selection with the OPS algorithm and the models based on NIR spectra were considered more predictive for all properties. Then, these models provided a simple, rapid and accurate method for determination of red cabbage extract antioxidant properties and its suitability for use in the food industry.

  16. A consensus successive projections algorithm--multiple linear regression method for analyzing near infrared spectra.

    Science.gov (United States)

    Liu, Ke; Chen, Xiaojing; Li, Limin; Chen, Huiling; Ruan, Xiukai; Liu, Wenbin

    2015-02-09

    The successive projections algorithm (SPA) is widely used to select variables for multiple linear regression (MLR) modeling. However, SPA used only once may not obtain all the useful information of the full spectra, because the number of selected variables cannot exceed the number of calibration samples in the SPA algorithm. Therefore, the SPA-MLR method risks the loss of useful information. To make a full use of the useful information in the spectra, a new method named "consensus SPA-MLR" (C-SPA-MLR) is proposed herein. This method is the combination of consensus strategy and SPA-MLR method. In the C-SPA-MLR method, SPA-MLR is used to construct member models with different subsets of variables, which are selected from the remaining variables iteratively. A consensus prediction is obtained by combining the predictions of the member models. The proposed method is evaluated by analyzing the near infrared (NIR) spectra of corn and diesel. The results of C-SPA-MLR method showed a better prediction performance compared with the SPA-MLR and full-spectra PLS methods. Moreover, these results could serve as a reference for combination the consensus strategy and other variable selection methods when analyzing NIR spectra and other spectroscopic techniques. Copyright © 2014 Elsevier B.V. All rights reserved.

  17. Ridge regression estimator: combining unbiased and ordinary ridge regression methods of estimation

    Directory of Open Access Journals (Sweden)

    Sharad Damodar Gore

    2009-10-01

    Full Text Available Statistical literature has several methods for coping with multicollinearity. This paper introduces a new shrinkage estimator, called modified unbiased ridge (MUR. This estimator is obtained from unbiased ridge regression (URR in the same way that ordinary ridge regression (ORR is obtained from ordinary least squares (OLS. Properties of MUR are derived. Results on its matrix mean squared error (MMSE are obtained. MUR is compared with ORR and URR in terms of MMSE. These results are illustrated with an example based on data generated by Hoerl and Kennard (1975.

  18. A multiple regression method for genomewide association studies ...

    Indian Academy of Sciences (India)

    Bujun Mei

    2018-06-07

    Jun 7, 2018 ... Similar to the typical genomewide association tests using LD ... new approach performed validly when the multiple regression based on linkage method was employed. .... the model, two groups of scenarios were simulated.

  19. BOX-COX REGRESSION METHOD IN TIME SCALING

    Directory of Open Access Journals (Sweden)

    ATİLLA GÖKTAŞ

    2013-06-01

    Full Text Available Box-Cox regression method with λj, for j = 1, 2, ..., k, power transformation can be used when dependent variable and error term of the linear regression model do not satisfy the continuity and normality assumptions. The situation obtaining the smallest mean square error  when optimum power λj, transformation for j = 1, 2, ..., k, of Y has been discussed. Box-Cox regression method is especially appropriate to adjust existence skewness or heteroscedasticity of error terms for a nonlinear functional relationship between dependent and explanatory variables. In this study, the advantage and disadvantage use of Box-Cox regression method have been discussed in differentiation and differantial analysis of time scale concept.

  20. Statistical process control of cocrystallization processes: A comparison between OPLS and PLS.

    Science.gov (United States)

    Silva, Ana F T; Sarraguça, Mafalda Cruz; Ribeiro, Paulo R; Santos, Adenilson O; De Beer, Thomas; Lopes, João Almeida

    2017-03-30

    Orthogonal partial least squares regression (OPLS) is being increasingly adopted as an alternative to partial least squares (PLS) regression due to the better generalization that can be achieved. Particularly in multivariate batch statistical process control (BSPC), the use of OPLS for estimating nominal trajectories is advantageous. In OPLS, the nominal process trajectories are expected to be captured in a single predictive principal component while uncorrelated variations are filtered out to orthogonal principal components. In theory, OPLS will yield a better estimation of the Hotelling's T 2 statistic and corresponding control limits thus lowering the number of false positives and false negatives when assessing the process disturbances. Although OPLS advantages have been demonstrated in the context of regression, its use on BSPC was seldom reported. This study proposes an OPLS-based approach for BSPC of a cocrystallization process between hydrochlorothiazide and p-aminobenzoic acid monitored on-line with near infrared spectroscopy and compares the fault detection performance with the same approach based on PLS. A series of cocrystallization batches with imposed disturbances were used to test the ability to detect abnormal situations by OPLS and PLS-based BSPC methods. Results demonstrated that OPLS was generally superior in terms of sensibility and specificity in most situations. In some abnormal batches, it was found that the imposed disturbances were only detected with OPLS. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. On two flexible methods of 2-dimensional regression analysis

    Czech Academy of Sciences Publication Activity Database

    Volf, Petr

    2012-01-01

    Roč. 18, č. 4 (2012), s. 154-164 ISSN 1803-9782 Grant - others:GA ČR(CZ) GAP209/10/2045 Institutional support: RVO:67985556 Keywords : regression analysis * Gordon surface * prediction error * projection pursuit Subject RIV: BB - Applied Statistics, Operational Research http://library.utia.cas.cz/separaty/2013/SI/volf-on two flexible methods of 2-dimensional regression analysis.pdf

  2. semPLS: Structural Equation Modeling Using Partial Least Squares

    Directory of Open Access Journals (Sweden)

    Armin Monecke

    2012-05-01

    Full Text Available Structural equation models (SEM are very popular in many disciplines. The partial least squares (PLS approach to SEM offers an alternative to covariance-based SEM, which is especially suited for situations when data is not normally distributed. PLS path modelling is referred to as soft-modeling-technique with minimum demands regarding mea- surement scales, sample sizes and residual distributions. The semPLS package provides the capability to estimate PLS path models within the R programming environment. Different setups for the estimation of factor scores can be used. Furthermore it contains modular methods for computation of bootstrap confidence intervals, model parameters and several quality indices. Various plot functions help to evaluate the model. The well known mobile phone dataset from marketing research is used to demonstrate the features of the package.

  3. A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Boucher, Thomas F., E-mail: boucher@cs.umass.edu [School of Computer Science, University of Massachusetts Amherst, 140 Governor' s Drive, Amherst, MA 01003, United States. (United States); Ozanne, Marie V. [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States); Carmosino, Marco L. [School of Computer Science, University of Massachusetts Amherst, 140 Governor' s Drive, Amherst, MA 01003, United States. (United States); Dyar, M. Darby [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States); Mahadevan, Sridhar [School of Computer Science, University of Massachusetts Amherst, 140 Governor' s Drive, Amherst, MA 01003, United States. (United States); Breves, Elly A.; Lepore, Kate H. [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States); Clegg, Samuel M. [Los Alamos National Laboratory, P.O. Box 1663, MS J565, Los Alamos, NM 87545 (United States)

    2015-05-01

    The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO{sub 2}, Fe{sub 2}O{sub 3}, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na{sub 2}O, K{sub 2}O, TiO{sub 2}, and P{sub 2}O{sub 5}, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high

  4. A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

    International Nuclear Information System (INIS)

    Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.

    2015-01-01

    The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO 2 , Fe 2 O 3 , CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na 2 O, K 2 O, TiO 2 , and P 2 O 5 , the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144

  5. Thermal Efficiency Degradation Diagnosis Method Using Regression Model

    International Nuclear Information System (INIS)

    Jee, Chang Hyun; Heo, Gyun Young; Jang, Seok Won; Lee, In Cheol

    2011-01-01

    This paper proposes an idea for thermal efficiency degradation diagnosis in turbine cycles, which is based on turbine cycle simulation under abnormal conditions and a linear regression model. The correlation between the inputs for representing degradation conditions (normally unmeasured but intrinsic states) and the simulation outputs (normally measured but superficial states) was analyzed with the linear regression model. The regression models can inversely response an associated intrinsic state for a superficial state observed from a power plant. The diagnosis method proposed herein is classified into three processes, 1) simulations for degradation conditions to get measured states (referred as what-if method), 2) development of the linear model correlating intrinsic and superficial states, and 3) determination of an intrinsic state using the superficial states of current plant and the linear regression model (referred as inverse what-if method). The what-if method is to generate the outputs for the inputs including various root causes and/or boundary conditions whereas the inverse what-if method is the process of calculating the inverse matrix with the given superficial states, that is, component degradation modes. The method suggested in this paper was validated using the turbine cycle model for an operating power plant

  6. Interval ridge regression (iRR) as a fast and robust method for quantitative prediction and variable selection applied to edible oil adulteration.

    Science.gov (United States)

    Jović, Ozren; Smrečki, Neven; Popović, Zora

    2016-04-01

    A novel quantitative prediction and variable selection method called interval ridge regression (iRR) is studied in this work. The method is performed on six data sets of FTIR, two data sets of UV-vis and one data set of DSC. The obtained results show that models built with ridge regression on optimal variables selected with iRR significantly outperfom models built with ridge regression on all variables in both calibration (6 out of 9 cases) and validation (2 out of 9 cases). In this study, iRR is also compared with interval partial least squares regression (iPLS). iRR outperfomed iPLS in validation (insignificantly in 6 out of 9 cases and significantly in one out of 9 cases for poil, a well known health beneficial nutrient, is studied in this work by mixing it with cheap and widely used oils such as soybean (So) oil, rapeseed (R) oil and sunflower (Su) oil. Binary mixture sets of hempseed oil with these three oils (HSo, HR and HSu) and a ternary mixture set of H oil, R oil and Su oil (HRSu) were considered. The obtained accuracy indicates that using iRR on FTIR and UV-vis data, each particular oil can be very successfully quantified (in all 8 cases RMSEPoil (R(2)>0.99). Copyright © 2015 Elsevier B.V. All rights reserved.

  7. Linear regression methods a ccording to objective functions

    OpenAIRE

    Yasemin Sisman; Sebahattin Bektas

    2012-01-01

    The aim of the study is to explain the parameter estimation methods and the regression analysis. The simple linear regressionmethods grouped according to the objective function are introduced. The numerical solution is achieved for the simple linear regressionmethods according to objective function of Least Squares and theLeast Absolute Value adjustment methods. The success of the appliedmethods is analyzed using their objective function values.

  8. Fault detection in processes represented by PLS models using an EWMA control scheme

    KAUST Repository

    Harrou, Fouzi

    2016-10-20

    Fault detection is important for effective and safe process operation. Partial least squares (PLS) has been used successfully in fault detection for multivariate processes with highly correlated variables. However, the conventional PLS-based detection metrics, such as the Hotelling\\'s T and the Q statistics are not well suited to detect small faults because they only use information about the process in the most recent observation. Exponentially weighed moving average (EWMA), however, has been shown to be more sensitive to small shifts in the mean of process variables. In this paper, a PLS-based EWMA fault detection method is proposed for monitoring processes represented by PLS models. The performance of the proposed method is compared with that of the traditional PLS-based fault detection method through a simulated example involving various fault scenarios that could be encountered in real processes. The simulation results clearly show the effectiveness of the proposed method over the conventional PLS method.

  9. Comparing parametric and nonparametric regression methods for panel data

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb......-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs....... The practical applicability of the parametric and non-parametric regression methods is scrutinised and compared by an empirical example: we analyse the production technology and investigate the optimal size of Polish crop farms based on a firm-level balanced panel data set. A nonparametric specification test...

  10. [Influence of Spectral Pre-Processing on PLS Quantitative Model of Detecting Cu in Navel Orange by LIBS].

    Science.gov (United States)

    Li, Wen-bing; Yao, Lin-tao; Liu, Mu-hua; Huang, Lin; Yao, Ming-yin; Chen, Tian-bing; He, Xiu-wen; Yang, Ping; Hu, Hui-qin; Nie, Jiang-hui

    2015-05-01

    Cu in navel orange was detected rapidly by laser-induced breakdown spectroscopy (LIBS) combined with partial least squares (PLS) for quantitative analysis, then the effect on the detection accuracy of the model with different spectral data ptetreatment methods was explored. Spectral data for the 52 Gannan navel orange samples were pretreated by different data smoothing, mean centralized and standard normal variable transform. Then 319~338 nm wavelength section containing characteristic spectral lines of Cu was selected to build PLS models, the main evaluation indexes of models such as regression coefficient (r), root mean square error of cross validation (RMSECV) and the root mean square error of prediction (RMSEP) were compared and analyzed. Three indicators of PLS model after 13 points smoothing and processing of the mean center were found reaching 0. 992 8, 3. 43 and 3. 4 respectively, the average relative error of prediction model is only 5. 55%, and in one word, the quality of calibration and prediction of this model are the best results. The results show that selecting the appropriate data pre-processing method, the prediction accuracy of PLS quantitative model of fruits and vegetables detected by LIBS can be improved effectively, providing a new method for fast and accurate detection of fruits and vegetables by LIBS.

  11. Sparse kernel orthonormalized PLS for feature extraction in large datasets

    DEFF Research Database (Denmark)

    Arenas-García, Jerónimo; Petersen, Kaare Brandt; Hansen, Lars Kai

    2006-01-01

    In this paper we are presenting a novel multivariate analysis method for large scale problems. Our scheme is based on a novel kernel orthonormalized partial least squares (PLS) variant for feature extraction, imposing sparsity constrains in the solution to improve scalability. The algorithm...... is tested on a benchmark of UCI data sets, and on the analysis of integrated short-time music features for genre prediction. The upshot is that the method has strong expressive power even with rather few features, is clearly outperforming the ordinary kernel PLS, and therefore is an appealing method...

  12. Nuclear magnetic resonance metabonomic profiling using tO2PLS

    Energy Technology Data Exchange (ETDEWEB)

    Kirwan, Gemma M., E-mail: gemma.kirwan@gmail.com [Department of Chemistry, School of Applied Sciences, RMIT University, City Campus, Vic 3001 (Australia); Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho Uji, Kyoto (Japan); Hancock, Timothy [Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho Uji, Kyoto (Japan); Hassell, Kathryn [Biotechnology and Environmental Biology, School of Applied Sciences, RMIT University, PO Box 71, Bundoora, Vic 3083 (Australia); Niere, Julie O. [Department of Chemistry, School of Applied Sciences, RMIT University, City Campus, Vic 3001 (Australia); Nugegoda, Dayanthi [Biotechnology and Environmental Biology, School of Applied Sciences, RMIT University, PO Box 71, Bundoora, Vic 3083 (Australia); Goto, Susumu [Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho Uji, Kyoto (Japan); Adams, Michael J. [Department of Chemistry, School of Applied Sciences, RMIT University, City Campus, Vic 3001 (Australia)

    2013-06-05

    Graphical abstract: -- Highlights: •Transposition of O2PLS input matrix (tO2PLS) to analyze metabonomics data. •tO2PLS specific components describe features that separate and define sample groups. •Application of tO2PLS to a {sup 1}H NMR metabonomics study of black bream fish. -- Abstract: Blood plasma collected from adult fish (black bream, Sparidae) exposed to a dose of 5 mg kg{sup −1} 17β-estradiol underwent metabonomic profiling using nuclear magnetic resonance (NMR). An extension of the orthogonal 2 projection to latent structure (O2PLS) analysis, tO2PLS, was proposed and utilized to classify changes between the control and experimental metabolic profiles. As a bidirectional modeling tool, O2PLS examines the (variable) commonality between two different data blocks, and extracts the joint correlations as well as the unique variations present within each data block. tO2PLS is a proposed matrix transposition of O2PLS to allow for commonality between experiments (spectral profiles) to be observed, rather than between sample variables. tO2PLS analysis highlighted two potential biomarkers, trimethylamine-N-oxide (TMAO) and choline, that distinguish between control and 17β-estradiol exposed fish. This study presents an alternative way of examining spectroscopic (metabolite) data, providing a method for the visual assessment of similarities and differences between control and experimental spectral features in large data sets.

  13. Nuclear magnetic resonance metabonomic profiling using tO2PLS

    International Nuclear Information System (INIS)

    Kirwan, Gemma M.; Hancock, Timothy; Hassell, Kathryn; Niere, Julie O.; Nugegoda, Dayanthi; Goto, Susumu; Adams, Michael J.

    2013-01-01

    Graphical abstract: -- Highlights: •Transposition of O2PLS input matrix (tO2PLS) to analyze metabonomics data. •tO2PLS specific components describe features that separate and define sample groups. •Application of tO2PLS to a 1 H NMR metabonomics study of black bream fish. -- Abstract: Blood plasma collected from adult fish (black bream, Sparidae) exposed to a dose of 5 mg kg −1 17β-estradiol underwent metabonomic profiling using nuclear magnetic resonance (NMR). An extension of the orthogonal 2 projection to latent structure (O2PLS) analysis, tO2PLS, was proposed and utilized to classify changes between the control and experimental metabolic profiles. As a bidirectional modeling tool, O2PLS examines the (variable) commonality between two different data blocks, and extracts the joint correlations as well as the unique variations present within each data block. tO2PLS is a proposed matrix transposition of O2PLS to allow for commonality between experiments (spectral profiles) to be observed, rather than between sample variables. tO2PLS analysis highlighted two potential biomarkers, trimethylamine-N-oxide (TMAO) and choline, that distinguish between control and 17β-estradiol exposed fish. This study presents an alternative way of examining spectroscopic (metabolite) data, providing a method for the visual assessment of similarities and differences between control and experimental spectral features in large data sets

  14. Offset Free Tracking Predictive Control Based on Dynamic PLS Framework

    Directory of Open Access Journals (Sweden)

    Jin Xin

    2017-10-01

    Full Text Available This paper develops an offset free tracking model predictive control based on a dynamic partial least square (PLS framework. First, state space model is used as the inner model of PLS to describe the dynamic system, where subspace identification method is used to identify the inner model. Based on the obtained model, multiple independent model predictive control (MPC controllers are designed. Due to the decoupling character of PLS, these controllers are running separately, which is suitable for distributed control framework. In addition, the increment of inner model output is considered in the cost function of MPC, which involves integral action in the controller. Hence, the offset free tracking performance is guaranteed. The results of an industry background simulation demonstrate the effectiveness of proposed method.

  15. FATAL, General Experiment Fitting Program by Nonlinear Regression Method

    International Nuclear Information System (INIS)

    Salmon, L.; Budd, T.; Marshall, M.

    1982-01-01

    1 - Description of problem or function: A generalized fitting program with a free-format keyword interface to the user. It permits experimental data to be fitted by non-linear regression methods to any function describable by the user. The user requires the minimum of computer experience but needs to provide a subroutine to define his function. Some statistical output is included as well as 'best' estimates of the function's parameters. 2 - Method of solution: The regression method used is based on a minimization technique devised by Powell (Harwell Subroutine Library VA05A, 1972) which does not require the use of analytical derivatives. The method employs a quasi-Newton procedure balanced with a steepest descent correction. Experience shows this to be efficient for a very wide range of application. 3 - Restrictions on the complexity of the problem: The current version of the program permits functions to be defined with up to 20 parameters. The function may be fitted to a maximum of 400 points, preferably with estimated values of weight given

  16. Mapping urban environmental noise: a land use regression method.

    Science.gov (United States)

    Xie, Dan; Liu, Yi; Chen, Jining

    2011-09-01

    Forecasting and preventing urban noise pollution are major challenges in urban environmental management. Most existing efforts, including experiment-based models, statistical models, and noise mapping, however, have limited capacity to explain the association between urban growth and corresponding noise change. Therefore, these conventional methods can hardly forecast urban noise at a given outlook of development layout. This paper, for the first time, introduces a land use regression method, which has been applied for simulating urban air quality for a decade, to construct an urban noise model (LUNOS) in Dalian Municipality, Northwest China. The LUNOS model describes noise as a dependent variable of surrounding various land areas via a regressive function. The results suggest that a linear model performs better in fitting monitoring data, and there is no significant difference of the LUNOS's outputs when applied to different spatial scales. As the LUNOS facilitates a better understanding of the association between land use and urban environmental noise in comparison to conventional methods, it can be regarded as a promising tool for noise prediction for planning purposes and aid smart decision-making.

  17. Modeling soil organic matter (SOM) from satellite data using VISNIR-SWIR spectroscopy and PLS regression with step-down variable selection algorithm: case study of Campos Amazonicos National Park savanna enclave, Brazil

    Science.gov (United States)

    Rosero-Vlasova, O.; Borini Alves, D.; Vlassova, L.; Perez-Cabello, F.; Montorio Lloveria, R.

    2017-10-01

    Deforestation in Amazon basin due, among other factors, to frequent wildfires demands continuous post-fire monitoring of soil and vegetation. Thus, the study posed two objectives: (1) evaluate the capacity of Visible - Near InfraRed - ShortWave InfraRed (VIS-NIR-SWIR) spectroscopy to estimate soil organic matter (SOM) in fire-affected soils, and (2) assess the feasibility of SOM mapping from satellite images. For this purpose, 30 soil samples (surface layer) were collected in 2016 in areas of grass and riparian vegetation of Campos Amazonicos National Park, Brazil, repeatedly affected by wildfires. Standard laboratory procedures were applied to determine SOM. Reflectance spectra of soils were obtained in controlled laboratory conditions using Fieldspec4 spectroradiometer (spectral range 350nm- 2500nm). Measured spectra were resampled to simulate reflectances for Landsat-8, Sentinel-2 and EnMap spectral bands, used as predictors in SOM models developed using Partial Least Squares regression and step-down variable selection algorithm (PLSR-SD). The best fit was achieved with models based on reflectances simulated for EnMap bands (R2=0.93; R2cv=0.82 and NMSE=0.07; NMSEcv=0.19). The model uses only 8 out of 244 predictors (bands) chosen by the step-down variable selection algorithm. The least reliable estimates (R2=0.55 and R2cv=0.40 and NMSE=0.43; NMSEcv=0.60) resulted from Landsat model, while Sentinel-2 model showed R2=0.68 and R2cv=0.63; NMSE=0.31 and NMSEcv=0.38. The results confirm high potential of VIS-NIR-SWIR spectroscopy for SOM estimation. Application of step-down produces sparser and better-fit models. Finally, SOM can be estimated with an acceptable accuracy (NMSE 0.35) from EnMap and Sentinel-2 data enabling mapping and analysis of impacts of repeated wildfires on soils in the study area.

  18. Fault detection in processes represented by PLS models using an EWMA control scheme

    KAUST Repository

    Harrou, Fouzi; Nounou, Mohamed N.; Nounou, Hazem N.

    2016-01-01

    with that of the traditional PLS-based fault detection method through a simulated example involving various fault scenarios that could be encountered in real processes. The simulation results clearly show the effectiveness of the proposed method over the conventional PLS

  19. Prediction of the distillation temperatures of crude oils using ¹H NMR and support vector regression with estimated confidence intervals.

    Science.gov (United States)

    Filgueiras, Paulo R; Terra, Luciana A; Castro, Eustáquio V R; Oliveira, Lize M S L; Dias, Júlio C M; Poppi, Ronei J

    2015-09-01

    This paper aims to estimate the temperature equivalent to 10% (T10%), 50% (T50%) and 90% (T90%) of distilled volume in crude oils using (1)H NMR and support vector regression (SVR). Confidence intervals for the predicted values were calculated using a boosting-type ensemble method in a procedure called ensemble support vector regression (eSVR). The estimated confidence intervals obtained by eSVR were compared with previously accepted calculations from partial least squares (PLS) models and a boosting-type ensemble applied in the PLS method (ePLS). By using the proposed boosting strategy, it was possible to identify outliers in the T10% property dataset. The eSVR procedure improved the accuracy of the distillation temperature predictions in relation to standard PLS, ePLS and SVR. For T10%, a root mean square error of prediction (RMSEP) of 11.6°C was obtained in comparison with 15.6°C for PLS, 15.1°C for ePLS and 28.4°C for SVR. The RMSEPs for T50% were 24.2°C, 23.4°C, 22.8°C and 14.4°C for PLS, ePLS, SVR and eSVR, respectively. For T90%, the values of RMSEP were 39.0°C, 39.9°C and 39.9°C for PLS, ePLS, SVR and eSVR, respectively. The confidence intervals calculated by the proposed boosting methodology presented acceptable values for the three properties analyzed; however, they were lower than those calculated by the standard methodology for PLS. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Dimension Reduction and Discretization in Stochastic Problems by Regression Method

    DEFF Research Database (Denmark)

    Ditlevsen, Ove Dalager

    1996-01-01

    The chapter mainly deals with dimension reduction and field discretizations based directly on the concept of linear regression. Several examples of interesting applications in stochastic mechanics are also given.Keywords: Random fields discretization, Linear regression, Stochastic interpolation, ...

  1. Analyzing Big Data with the Hybrid Interval Regression Methods

    Directory of Open Access Journals (Sweden)

    Chia-Hui Huang

    2014-01-01

    Full Text Available Big data is a new trend at present, forcing the significant impacts on information technologies. In big data applications, one of the most concerned issues is dealing with large-scale data sets that often require computation resources provided by public cloud services. How to analyze big data efficiently becomes a big challenge. In this paper, we collaborate interval regression with the smooth support vector machine (SSVM to analyze big data. Recently, the smooth support vector machine (SSVM was proposed as an alternative of the standard SVM that has been proved more efficient than the traditional SVM in processing large-scale data. In addition the soft margin method is proposed to modify the excursion of separation margin and to be effective in the gray zone that the distribution of data becomes hard to be described and the separation margin between classes.

  2. Regression and Sparse Regression Methods for Viscosity Estimation of Acid Milk From it’s Sls Features

    DEFF Research Database (Denmark)

    Sharifzadeh, Sara; Skytte, Jacob Lercke; Nielsen, Otto Højager Attermann

    2012-01-01

    Statistical solutions find wide spread use in food and medicine quality control. We investigate the effect of different regression and sparse regression methods for a viscosity estimation problem using the spectro-temporal features from new Sub-Surface Laser Scattering (SLS) vision system. From...... with sparse LAR, lasso and Elastic Net (EN) sparse regression methods. Due to the inconsistent measurement condition, Locally Weighted Scatter plot Smoothing (Loess) has been employed to alleviate the undesired variation in the estimated viscosity. The experimental results of applying different methods show...

  3. Selecting minimum dataset soil variables using PLSR as a regressive multivariate method

    Science.gov (United States)

    Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.

    2017-04-01

    Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information deriving from multivariate datasets. Principal component analysis (PCA) has been mainly used, however supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least square regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean centered and variance scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP

  4. Support vector machine regression (SVR/LS-SVM)--an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data.

    Science.gov (United States)

    Balabin, Roman M; Lomakina, Ekaterina I

    2011-04-21

    In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.

  5. PLS beam position measurement and feedback system

    International Nuclear Information System (INIS)

    Huang, J.Y.; Lee, J.; Park, M.K.; Kim, J.H.; Won, S.C.

    1992-01-01

    A real-time orbit correction system is proposed for the stabilization of beam orbit and photon beam positions in Pohang Light Source. PLS beam position monitoring system is designed to be VMEbus compatible to fit the real-time digital orbit feedback system. A VMEbus based subsystem control computer, Mil-1553B communication network and 12 BPM/PS machine interface units constitute digital part of the feedback system. With the super-stable PLS correction magnet power supply, power line frequency noise is almost filtered out and the dominant spectra of beam obtit fluctuations are expected to appear below 15 Hz. Using DSP board in SCC for the computation and using an appropriate compensation circuit for the phase delay by the vacuum chamber, PLS real-time orbit correction system is realizable without changing the basic structure of PLS computer control system. (author)

  6. Partial Least Squares Strukturgleichungsmodellierung (PLS-SEM)

    DEFF Research Database (Denmark)

    Hair, Joseph F.; Hult, G. Tomas M.; Ringle, Christian M.

    (PLS-SEM) hat sich in der wirtschafts- und sozialwissenschaftlichen Forschung als geeignetes Verfahren zur Schätzung von Kausalmodellen behauptet. Dank der Anwenderfreundlichkeit des Verfahrens und der vorhandenen Software ist es inzwischen auch in der Praxis etabliert. Dieses Buch liefert eine...... anwendungsorientierte Einführung in die PLS-SEM. Der Fokus liegt auf den Grundlagen des Verfahrens und deren praktischer Umsetzung mit Hilfe der SmartPLS-Software. Das Konzept des Buches setzt dabei auf einfache Erläuterungen statistischer Ansätze und die anschauliche Darstellung zahlreicher Anwendungsbeispiele anhand...... einer einheitlichen Fallstudie. Viele Grafiken, Tabellen und Illustrationen erleichtern das Verständnis der PLS-SEM. Zudem werden dem Leser herunterladbare Datensätze, Aufgaben und weitere Fachartikel zur Vertiefung angeboten. Damit eignet sich das Buch hervorragend für Studierende, Forscher und...

  7. Determination of carbohydrates present in Saccharomyces cerevisiae using mid-infrared spectroscopy and partial least squares regression

    OpenAIRE

    Plata, Maria R.; Koch, Cosima; Wechselberger, Patrick; Herwig, Christoph; Lendl, Bernhard

    2013-01-01

    A fast and simple method to control variations in carbohydrate composition of Saccharomyces cerevisiae, baker's yeast, during fermentation was developed using mid-infrared (mid-IR) spectroscopy. The method allows for precise and accurate determinations with minimal or no sample preparation and reagent consumption based on mid-IR spectra and partial least squares (PLS) regression. The PLS models were developed employing the results from reference analysis of the yeast cells. The reference anal...

  8. Methods of Detecting Outliers in A Regression Analysis Model ...

    African Journals Online (AJOL)

    PROF. O. E. OSUAGWU

    2013-06-01

    Jun 1, 2013 ... especially true in observational studies .... Simple linear regression and multiple ... The simple linear ..... Grubbs,F.E (1950): Sample Criteria for Testing Outlying observations: Annals of ... In experimental design, the Relative.

  9. Handbook of Partial Least Squares Concepts, Methods and Applications

    CERN Document Server

    Vinzi, Vincenzo Esposito; Henseler, Jörg

    2010-01-01

    This handbook provides a comprehensive overview of Partial Least Squares (PLS) methods with specific reference to their use in marketing and with a discussion of the directions of current research and perspectives. It covers the broad area of PLS methods, from regression to structural equation modeling applications, software and interpretation of results. The handbook serves both as an introduction for those without prior knowledge of PLS and as a comprehensive reference for researchers and practitioners interested in the most recent advances in PLS methodology.

  10. Analysis of some methods for reduced rank Gaussian process regression

    DEFF Research Database (Denmark)

    Quinonero-Candela, J.; Rasmussen, Carl Edward

    2005-01-01

    While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent...... proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank...... Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning...

  11. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    Science.gov (United States)

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.

  12. Determination of Trace Amounts of Gold in Environmental Samples by Adsorptive Stripping Voltammetry of Its Complex with Rhodamine Using Osc-Pls

    Directory of Open Access Journals (Sweden)

    A. Akrami

    2012-11-01

    Full Text Available The multivariate calibration method was applied for the determination of trace amounts of gold based on a hanging mercury drop electrode (HMDE in the presence of rhodanine, followed by reduction of adsorbed gold by voltammetric scan using differential pulse modulation The optimum experimental conditions are: rhodanine concentration of 0.20 mg mL-1, pH 5.0, accumulation potential of -600 mV versus Ag/AgCl, accumulation time of 100 sec, scan rate of 30 mV s-1 and pulse height of 100 mV. The calibration matrix for partial least squares (PLS regression was designed with 9 samples. Orthogonal signal correction (OSC is a preprocessing technique used for removing the information unrelated to the target variables based on constrained principal component analysis. OSC is a suitable preprocessing method for PLS calibration without loss of prediction capacity using electrochemical method. The RMSEP for gold determination with PLS and OSC-PLS were 8.51 and 1.94, respectively. This procedure allows the determination of gold in synthetic and real samples with good reliability of the determination. 

  13. Methods for estimating disease transmission rates: Evaluating the precision of Poisson regression and two novel methods

    DEFF Research Database (Denmark)

    Kirkeby, Carsten Thure; Hisham Beshara Halasa, Tariq; Gussmann, Maya Katrin

    2017-01-01

    the transmission rate. We use data from the two simulation models and vary the sampling intervals and the size of the population sampled. We devise two new methods to determine transmission rate, and compare these to the frequently used Poisson regression method in both epidemic and endemic situations. For most...... tested scenarios these new methods perform similar or better than Poisson regression, especially in the case of long sampling intervals. We conclude that transmission rate estimates are easily biased, which is important to take into account when using these rates in simulation models....

  14. Ordinary Least Squares and Quantile Regression: An Inquiry-Based Learning Approach to a Comparison of Regression Methods

    Science.gov (United States)

    Helmreich, James E.; Krog, K. Peter

    2018-01-01

    We present a short, inquiry-based learning course on concepts and methods underlying ordinary least squares (OLS), least absolute deviation (LAD), and quantile regression (QR). Students investigate squared, absolute, and weighted absolute distance functions (metrics) as location measures. Using differential calculus and properties of convex…

  15. A Comparative Investigation of the Combined Effects of Pre-Processing, Wavelength Selection, and Regression Methods on Near-Infrared Calibration Model Performance.

    Science.gov (United States)

    Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N

    2017-07-01

    Near-infrared (NIR) spectroscopy is being widely used in various fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include available physical interpretation of spectral data, nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others, selection of those wavelengths that contribute useful information, and identification of suitable calibration models using linear/nonlinear regression . Several methods have been developed for each of these three aspects and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies for the interactions among these three aspects, which can shed light on what role each aspect plays in the calibration and how to combine various methods of each aspect together to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely, orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely, stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); four popular regression methods, namely, partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR). The comparative study indicates that, in general, pre-processing of spectral data can play a significant

  16. Helium Leak Test for the PLS Storage Ring Chamber

    International Nuclear Information System (INIS)

    Choi, M. H.; Kim, H. J.; Choi, W. C.

    1993-01-01

    The storage ring vacuum system for the Pohang Light Source (PLS) has been designed to maintain the vacuum pressure of 10 1 0 Torr which requires UHV welding to have helium leak rate less than 1x10 1 0 Torr·L/sec. In order to develop new technique (PLS) welding technique), a prototype vacuum chamber has been welded by using Tungsten Inert Gas welding method and all the welded joints have been tested with a non-destructive method, so called helium leak detection, to investigate the vacuum tightness of the weld joints. The test was performed with a detection limit of 1x10 1 0 Torr·L/sec for helium and no detectable leaks were found for all the welded joints. Thus the performance of welding technique is proven to meet the criteria of helium leak rate required in the PLS Storage Ring. Both the principle and the procedure for the helium leak detection are also discussed

  17. Finding-equal regression method and its application in predication of U resources

    International Nuclear Information System (INIS)

    Cao Huimo

    1995-03-01

    The commonly adopted deposit model method in mineral resources predication has two main part: one is model data that show up geological mineralization law for deposit, the other is statistics predication method that accords with characters of the data namely pretty regression method. This kind of regression method may be called finding-equal regression, which is made of the linear regression and distribution finding-equal method. Because distribution finding-equal method is a data pretreatment which accords with advanced mathematical precondition for the linear regression namely equal distribution theory, and this kind of data pretreatment is possible of realization. Therefore finding-equal regression not only can overcome nonlinear limitations, that are commonly occurred in traditional linear regression or other regression and always have no solution, but also can distinguish outliers and eliminate its weak influence, which would usually appeared when Robust regression possesses outlier in independent variables. Thus this newly finding-equal regression stands the best status in all kind of regression methods. Finally, two good examples of U resource quantitative predication are provided

  18. Hybrid robust model based on an improved functional link neural network integrating with partial least square (IFLNN-PLS) and its application to predicting key process variables.

    Science.gov (United States)

    He, Yan-Lin; Xu, Yuan; Geng, Zhi-Qiang; Zhu, Qun-Xiong

    2016-03-01

    In this paper, a hybrid robust model based on an improved functional link neural network integrating with partial least square (IFLNN-PLS) is proposed. Firstly, an improved functional link neural network with small norm of expanded weights and high input-output correlation (SNEWHIOC-FLNN) was proposed for enhancing the generalization performance of FLNN. Unlike the traditional FLNN, the expanded variables of the original inputs are not directly used as the inputs in the proposed SNEWHIOC-FLNN model. The original inputs are attached to some small norm of expanded weights. As a result, the correlation coefficient between some of the expanded variables and the outputs is enhanced. The larger the correlation coefficient is, the more relevant the expanded variables tend to be. In the end, the expanded variables with larger correlation coefficient are selected as the inputs to improve the performance of the traditional FLNN. In order to test the proposed SNEWHIOC-FLNN model, three UCI (University of California, Irvine) regression datasets named Housing, Concrete Compressive Strength (CCS), and Yacht Hydro Dynamics (YHD) are selected. Then a hybrid model based on the improved FLNN integrating with partial least square (IFLNN-PLS) was built. In IFLNN-PLS model, the connection weights are calculated using the partial least square method but not the error back propagation algorithm. Lastly, IFLNN-PLS was developed as an intelligent measurement model for accurately predicting the key variables in the Purified Terephthalic Acid (PTA) process and the High Density Polyethylene (HDPE) process. Simulation results illustrated that the IFLNN-PLS could significant improve the prediction performance. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.

  19. Robust PLS approach for KPI-related prediction and diagnosis against outliers and missing data

    Science.gov (United States)

    Yin, Shen; Wang, Guang; Yang, Xu

    2014-07-01

    In practical industrial applications, the key performance indicator (KPI)-related prediction and diagnosis are quite important for the product quality and economic benefits. To meet these requirements, many advanced prediction and monitoring approaches have been developed which can be classified into model-based or data-driven techniques. Among these approaches, partial least squares (PLS) is one of the most popular data-driven methods due to its simplicity and easy implementation in large-scale industrial process. As PLS is totally based on the measured process data, the characteristics of the process data are critical for the success of PLS. Outliers and missing values are two common characteristics of the measured data which can severely affect the effectiveness of PLS. To ensure the applicability of PLS in practical industrial applications, this paper introduces a robust version of PLS to deal with outliers and missing values, simultaneously. The effectiveness of the proposed method is finally demonstrated by the application results of the KPI-related prediction and diagnosis on an industrial benchmark of Tennessee Eastman process.

  20. Status report on control system development for PLS

    International Nuclear Information System (INIS)

    Won, S.C.; Chang, S.S.; Huang, J.; Lee, J.W.; Lee, J.; Kim, J.H.

    1992-01-01

    Emphasizing reliability and flexibility, hierarchical architecture with distributed computers have been designed into the Pohang Light Source (PLS) computer control system. The PLS control system has four layers of computer systems connected via multiple data communication networks. This paper presents an overview of the PLS control system. (author)

  1. Quantitative analysis of glycated albumin in serum based on ATR-FTIR spectrum combined with SiPLS and SVM.

    Science.gov (United States)

    Li, Yuanpeng; Li, Fucui; Yang, Xinhao; Guo, Liu; Huang, Furong; Chen, Zhenqiang; Chen, Xingdan; Zheng, Shifu

    2018-08-05

    A rapid quantitative analysis model for determining the glycated albumin (GA) content based on Attenuated total reflectance (ATR)-Fourier transform infrared spectroscopy (FTIR) combining with linear SiPLS and nonlinear SVM has been developed. Firstly, the real GA content in human serum was determined by GA enzymatic method, meanwhile, the ATR-FTIR spectra of serum samples from the population of health examination were obtained. The spectral data of the whole spectra mid-infrared region (4000-600 cm -1 ) and GA's characteristic region (1800-800 cm -1 ) were used as the research object of quantitative analysis. Secondly, several preprocessing steps including first derivative, second derivative, variable standardization and spectral normalization, were performed. Lastly, quantitative analysis regression models were established by using SiPLS and SVM respectively. The SiPLS modeling results are as follows: root mean square error of cross validation (RMSECV T ) = 0.523 g/L, calibration coefficient (R C ) = 0.937, Root Mean Square Error of Prediction (RMSEP T ) = 0.787 g/L, and prediction coefficient (R P ) = 0.938. The SVM modeling results are as follows: RMSECV T  = 0.0048 g/L, R C  = 0.998, RMSEP T  = 0.442 g/L, and R p  = 0.916. The results indicated that the model performance was improved significantly after preprocessing and optimization of characteristic regions. While modeling performance of nonlinear SVM was considerably better than that of linear SiPLS. Hence, the quantitative analysis model for GA in human serum based on ATR-FTIR combined with SiPLS and SVM is effective. And it does not need sample preprocessing while being characterized by simple operations and high time efficiency, providing a rapid and accurate method for GA content determination. Copyright © 2018 Elsevier B.V. All rights reserved.

  2. [MEG]PLS: A pipeline for MEG data analysis and partial least squares statistics.

    Science.gov (United States)

    Cheung, Michael J; Kovačević, Natasa; Fatima, Zainab; Mišić, Bratislav; McIntosh, Anthony R

    2016-01-01

    The emphasis of modern neurobiological theories has recently shifted from the independent function of brain areas to their interactions in the context of whole-brain networks. As a result, neuroimaging methods and analyses have also increasingly focused on network discovery. Magnetoencephalography (MEG) is a neuroimaging modality that captures neural activity with a high degree of temporal specificity, providing detailed, time varying maps of neural activity. Partial least squares (PLS) analysis is a multivariate framework that can be used to isolate distributed spatiotemporal patterns of neural activity that differentiate groups or cognitive tasks, to relate neural activity to behavior, and to capture large-scale network interactions. Here we introduce [MEG]PLS, a MATLAB-based platform that streamlines MEG data preprocessing, source reconstruction and PLS analysis in a single unified framework. [MEG]PLS facilitates MRI preprocessing, including segmentation and coregistration, MEG preprocessing, including filtering, epoching, and artifact correction, MEG sensor analysis, in both time and frequency domains, MEG source analysis, including multiple head models and beamforming algorithms, and combines these with a suite of PLS analyses. The pipeline is open-source and modular, utilizing functions from FieldTrip (Donders, NL), AFNI (NIMH, USA), SPM8 (UCL, UK) and PLScmd (Baycrest, CAN), which are extensively supported and continually developed by their respective communities. [MEG]PLS is flexible, providing both a graphical user interface and command-line options, depending on the needs of the user. A visualization suite allows multiple types of data and analyses to be displayed and includes 4-D montage functionality. [MEG]PLS is freely available under the GNU public license (http://meg-pls.weebly.com). Copyright © 2015 Elsevier Inc. All rights reserved.

  3. Parameter Selection Method for Support Vector Regression Based on Adaptive Fusion of the Mixed Kernel Function

    Directory of Open Access Journals (Sweden)

    Hailun Wang

    2017-01-01

    Full Text Available Support vector regression algorithm is widely used in fault diagnosis of rolling bearing. A new model parameter selection method for support vector regression based on adaptive fusion of the mixed kernel function is proposed in this paper. We choose the mixed kernel function as the kernel function of support vector regression. The mixed kernel function of the fusion coefficients, kernel function parameters, and regression parameters are combined together as the parameters of the state vector. Thus, the model selection problem is transformed into a nonlinear system state estimation problem. We use a 5th-degree cubature Kalman filter to estimate the parameters. In this way, we realize the adaptive selection of mixed kernel function weighted coefficients and the kernel parameters, the regression parameters. Compared with a single kernel function, unscented Kalman filter (UKF support vector regression algorithms, and genetic algorithms, the decision regression function obtained by the proposed method has better generalization ability and higher prediction accuracy.

  4. On-line monitoring of extraction process of Flos Lonicerae Japonicae using near infrared spectroscopy combined with synergy interval PLS and genetic algorithm

    Science.gov (United States)

    Yang, Yue; Wang, Lei; Wu, Yongjiang; Liu, Xuesong; Bi, Yuan; Xiao, Wei; Chen, Yong

    2017-07-01

    There is a growing need for the effective on-line process monitoring during the manufacture of traditional Chinese medicine to ensure quality consistency. In this study, the potential of near infrared (NIR) spectroscopy technique to monitor the extraction process of Flos Lonicerae Japonicae was investigated. A new algorithm of synergy interval PLS with genetic algorithm (Si-GA-PLS) was proposed for modeling. Four different PLS models, namely Full-PLS, Si-PLS, GA-PLS, and Si-GA-PLS, were established, and their performances in predicting two quality parameters (viz. total acid and soluble solid contents) were compared. In conclusion, Si-GA-PLS model got the best results due to the combination of superiority of Si-PLS and GA. For Si-GA-PLS, the determination coefficient (Rp2) and root-mean-square error for the prediction set (RMSEP) were 0.9561 and 147.6544 μg/ml for total acid, 0.9062 and 0.1078% for soluble solid contents, correspondingly. The overall results demonstrated that the NIR spectroscopy technique combined with Si-GA-PLS calibration is a reliable and non-destructive alternative method for on-line monitoring of the extraction process of TCM on the production scale.

  5. Robust Ultraviolet-Visible (UV-Vis) Partial Least-Squares (PLS) Models for Tannin Quantification in Red Wine.

    Science.gov (United States)

    Aleixandre-Tudo, José Luis; Nieuwoudt, Helené; Aleixandre, José Luis; Du Toit, Wessel J

    2015-02-04

    The validation of ultraviolet-visible (UV-vis) spectroscopy combined with partial least-squares (PLS) regression to quantify red wine tannins is reported. The methylcellulose precipitable (MCP) tannin assay and the bovine serum albumin (BSA) tannin assay were used as reference methods. To take the high variability of wine tannins into account when the calibration models were built, a diverse data set was collected from samples of South African red wines that consisted of 18 different cultivars, from regions spanning the wine grape-growing areas of South Africa with their various sites, climates, and soils, ranging in vintage from 2000 to 2012. A total of 240 wine samples were analyzed, and these were divided into a calibration set (n = 120) and a validation set (n = 120) to evaluate the predictive ability of the models. To test the robustness of the PLS calibration models, the predictive ability of the classifying variables cultivar, vintage year, and experimental versus commercial wines was also tested. In general, the statistics obtained when BSA was used as a reference method were slightly better than those obtained with MCP. Despite this, the MCP tannin assay should also be considered as a valid reference method for developing PLS calibrations. The best calibration statistics for the prediction of new samples were coefficient of correlation (R 2 val) = 0.89, root mean standard error of prediction (RMSEP) = 0.16, and residual predictive deviation (RPD) = 3.49 for MCP and R 2 val = 0.93, RMSEP = 0.08, and RPD = 4.07 for BSA, when only the UV region (260-310 nm) was selected, which also led to a faster analysis time. In addition, a difference in the results obtained when the predictive ability of the classifying variables vintage, cultivar, or commercial versus experimental wines was studied suggests that tannin composition is highly affected by many factors. This study also discusses the correlations in tannin values between the methylcellulose and protein

  6. Genetic algorithm-based wavelength selection in multicomponent spectrophotometric determination by PLS: Application on sulfamethoxazole and trimethoprim mixture in bovine milk

    Directory of Open Access Journals (Sweden)

    Givianrad Hadi Mohammad

    2013-01-01

    Full Text Available The simultaneous determination of sulfamethoxazole (SMX and trimethoprim (TMP mixtures in bovine milk by spectrophotometric method is a difficult problem in analytical chemistry, due to spectral interferences. By means of multivariate calibration methods, such as partial least square (PLS regression, it is possible to obtain a model adjusted to the concentration values of the mixtures used in the calibration range. Genetic algorithm (GA is a suitable method for selecting wavelengths for PLS calibration of mixtures with almost identical spectra without loss of prediction capacity using the spectrophotometric method. In this study, the calibration model based on absorption spectra in the 200-400 nm range for 25 different mixtures of SMX and TMP Calibration matrices were formed form samples containing 0.25-20 and 0.3-21 μg mL-1 for SMX and TMP, at pH=10, respectively. The root mean squared error of deviation (RMSED for SMX and TMP with PLS and genetic algorithm partial least square (GAPLS were 0.242, 0.066 μgmL-1 and 0.074, 0.027 μg mL-1, respectively. This procedure was allowed the simultaneous determination of SMX and TMP in synthetic and real samples and good reliability of the determination was proved.

  7. The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard

    and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...... within a nonparametric panel data regression framework. The fourth paper analyses the technical efficiency of dairy farms with environmental output using nonparametric kernel regression in a semiparametric stochastic frontier analysis. The results provided in this PhD thesis show that nonparametric......This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...

  8. Fusion of neural computing and PLS techniques for load estimation

    Energy Technology Data Exchange (ETDEWEB)

    Lu, M.; Xue, H.; Cheng, X. [Northwestern Polytechnical Univ., Xi' an (China); Zhang, W. [Xi' an Inst. of Post and Telecommunication, Xi' an (China)

    2007-07-01

    A method to predict the electric load of a power system in real time was presented. The method is based on neurocomputing and partial least squares (PLS). Short-term load forecasts for power systems are generally determined by conventional statistical methods and Computational Intelligence (CI) techniques such as neural computing. However, statistical modeling methods often require the input of questionable distributional assumptions, and neural computing is weak, particularly in determining topology. In order to overcome the problems associated with conventional techniques, the authors developed a CI hybrid model based on neural computation and PLS techniques. The theoretical foundation for the designed CI hybrid model was presented along with its application in a power system. The hybrid model is suitable for nonlinear modeling and latent structure extracting. It can automatically determine the optimal topology to maximize the generalization. The CI hybrid model provides faster convergence and better prediction results compared to the abductive networks model because it incorporates a load conversion technique as well as new transfer functions. In order to demonstrate the effectiveness of the hybrid model, load forecasting was performed on a data set obtained from the Puget Sound Power and Light Company. Compared with the abductive networks model, the CI hybrid model reduced the forecast error by 32.37 per cent on workday, and by an average of 27.18 per cent on the weekend. It was concluded that the CI hybrid model has a more powerful predictive ability. 7 refs., 1 tab., 3 figs.

  9. Sulfur Speciation of Crude Oils by Partial Least Squares Regression Modeling of Their Infrared Spectra

    NARCIS (Netherlands)

    de Peinder, P.; Visser, T.; Wagemans, R.W.P.; Blomberg, J.; Chaabani, H.; Soulimani, F.; Weckhuysen, B.M.

    2013-01-01

    Research has been carried out to determine the feasibility of partial least-squares regression (PLS) modeling of infrared (IR) spectra of crude oils as a tool for fast sulfur speciation. The study is a continuation of a previously developed method to predict long and short residue properties of

  10. Easy methods for extracting individual regression slopes: Comparing SPSS, R, and Excel

    Directory of Open Access Journals (Sweden)

    Roland Pfister

    2013-10-01

    Full Text Available Three different methods for extracting coefficientsof linear regression analyses are presented. The focus is on automatic and easy-to-use approaches for common statistical packages: SPSS, R, and MS Excel / LibreOffice Calc. Hands-on examples are included for each analysis, followed by a brief description of how a subsequent regression coefficient analysis is performed.

  11. Simultaneous measurement of two enzyme activities using infrared spectroscopy: A comparative evaluation of PARAFAC, TUCKER and N-PLS modeling.

    Science.gov (United States)

    Baum, Andreas; Hansen, Per Waaben; Meyer, Anne S; Mikkelsen, Jørn Dalgaard

    2013-08-06

    Enzymes are used in many processes to release fermentable sugars for green production of biofuel, or the refinery of biomass for extraction of functional food ingredients such as pectin or prebiotic oligosaccharides. The complex biomasses may, however, require a multitude of specific enzymes which are active on specific substrates generating a multitude of products. In this paper we use the plant polymer, pectin, to present a method to quantify enzyme activity of two pectolytic enzymes by monitoring their superimposed spectral evolutions simultaneously. The data is analyzed by three chemometric multiway methods, namely PARAFAC, TUCKER3 and N-PLS, to establish simultaneous enzyme activity assays for pectin lyase and pectin methyl esterase. Correlation coefficients Rpred(2) for prediction test sets are 0.48, 0.96 and 0.96 for pectin lyase and 0.70, 0.89 and 0.89 for pectin methyl esterase, respectively. The retrieved models are compared and prediction test sets show that especially TUCKER3 performs well, even in comparison to the supervised regression method N-PLS. Copyright © 2013 Elsevier B.V. All rights reserved.

  12. A simple linear regression method for quantitative trait loci linkage analysis with censored observations.

    Science.gov (United States)

    Anderson, Carl A; McRae, Allan F; Visscher, Peter M

    2006-07-01

    Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.

  13. Linear regression based on Minimum Covariance Determinant (MCD) and TELBS methods on the productivity of phytoplankton

    Science.gov (United States)

    Gusriani, N.; Firdaniza

    2018-03-01

    The existence of outliers on multiple linear regression analysis causes the Gaussian assumption to be unfulfilled. If the Least Square method is forcedly used on these data, it will produce a model that cannot represent most data. For that, we need a robust regression method against outliers. This paper will compare the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contains outliers. Based on the robust determinant coefficient value, MCD method produces a better model compared to TELBS method.

  14. A nonparametric approach to calculate critical micelle concentrations: the local polynomial regression method

    Energy Technology Data Exchange (ETDEWEB)

    Lopez Fontan, J.L.; Costa, J.; Ruso, J.M.; Prieto, G. [Dept. of Applied Physics, Univ. of Santiago de Compostela, Santiago de Compostela (Spain); Sarmiento, F. [Dept. of Mathematics, Faculty of Informatics, Univ. of A Coruna, A Coruna (Spain)

    2004-02-01

    The application of a statistical method, the local polynomial regression method, (LPRM), based on a nonparametric estimation of the regression function to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the subjacent structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and others methods were found. (orig.)

  15. Classification of cassava starch films by physicochemical properties and water vapor permeability quantification by FTIR and PLS.

    Science.gov (United States)

    Henrique, C M; Teófilo, R F; Sabino, L; Ferreira, M M C; Cereda, M P

    2007-05-01

    Cassava starches are widely used in the production of biodegradable films, but their resistance to humidity migration is very low. In this work, commercial cassava starch films were studied and classified according to their physicochemical properties. A nondestructive method for water vapor permeability determination, which combines with infrared spectroscopy and multivariate calibration, is also presented. The following commercial cassava starches were studied: pregelatinized (amidomax 3550), carboxymethylated starch (CMA) of low and high viscosities, and esterified starches. To make the films, 2 different starch concentrations were evaluated, consisting of water suspensions with 3% and 5% starch. The filmogenic solutions were dried and characterized for their thickness, grammage, water vapor permeability, water activity, tensile strength (deformation force), water solubility, and puncture strength (deformation). The minimum thicknesses were 0.5 to 0.6 mm in pregelatinized starch films. The results were treated by means of the following chemometric methods: principal component analysis (PCA) and partial least squares (PLS) regression. PCA analysis on the physicochemical properties of the films showed that the differences in concentration of the dried material (3% and 5% starch) and also in the type of starch modification were mainly related to the following properties: permeability, solubility, and thickness. IR spectra collected in the region of 4000 to 600 cm(-1) were used to build a PLS model with good predictive power for water vapor permeability determination, with mean relative errors of 10.0% for cross-validation and 7.8% for the prediction set.

  16. Fuzzy Linear Regression for the Time Series Data which is Fuzzified with SMRGT Method

    Directory of Open Access Journals (Sweden)

    Seçil YALAZ

    2016-10-01

    Full Text Available Our work on regression and classification provides a new contribution to the analysis of time series used in many areas for years. Owing to the fact that convergence could not obtained with the methods used in autocorrelation fixing process faced with time series regression application, success is not met or fall into obligation of changing the models’ degree. Changing the models’ degree may not be desirable in every situation. In our study, recommended for these situations, time series data was fuzzified by using the simple membership function and fuzzy rule generation technique (SMRGT and to estimate future an equation has created by applying fuzzy least square regression (FLSR method which is a simple linear regression method to this data. Although SMRGT has success in determining the flow discharge in open channels and can be used confidently for flow discharge modeling in open canals, as well as in pipe flow with some modifications, there is no clue about that this technique is successful in fuzzy linear regression modeling. Therefore, in order to address the luck of such a modeling, a new hybrid model has been described within this study. In conclusion, to demonstrate our methods’ efficiency, classical linear regression for time series data and linear regression for fuzzy time series data were applied to two different data sets, and these two approaches performances were compared by using different measures.

  17. An NCME Instructional Module on Data Mining Methods for Classification and Regression

    Science.gov (United States)

    Sinharay, Sandip

    2016-01-01

    Data mining methods for classification and regression are becoming increasingly popular in various scientific fields. However, these methods have not been explored much in educational measurement. This module first provides a review, which should be accessible to a wide audience in education measurement, of some of these methods. The module then…

  18. Further Insight and Additional Inference Methods for Polynomial Regression Applied to the Analysis of Congruence

    Science.gov (United States)

    Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti

    2010-01-01

    In their seminal paper, Edwards and Parry (1993) presented the polynomial regression as a better alternative to applying difference score in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…

  19. Statistical approach for selection of regression model during validation of bioanalytical method

    Directory of Open Access Journals (Sweden)

    Natalija Nakov

    2014-06-01

    Full Text Available The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during the bionalytical method validation. Given the wide concentration range, frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of the adequate curve fit using nonparametric statistical tests: One sample rank test and Wilcoxon signed rank test for two independent groups of samples. The results obtained with One sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models because slight differences between the error (presented through the relative residuals were obtained. Estimation of the significance of the differences in the RR was achieved using Wilcoxon signed rank test, where linear and quadratic regression models were treated as two independent groups. The application of this simple non-parametric statistical test provides statistical confirmation of the choice of an adequate regression model.

  20. The comparison of partial least squares and principal component regression in simultaneous spectrophotometric determination of ascorbic acid, dopamine and uric acid in real samples

    Directory of Open Access Journals (Sweden)

    Habiboallah Khajehsharifi

    2017-05-01

    Full Text Available Partial least squares (PLS1 and principal component regression (PCR are two multivariate calibration methods that allow simultaneous determination of several analytes in spite of their overlapping spectra. In this research, a spectrophotometric method using PLS1 is proposed for the simultaneous determination of ascorbic acid (AA, dopamine (DA and uric acid (UA. The linear concentration ranges for AA, DA and UA were 1.76–47.55, 0.57–22.76 and 1.68–28.58 (in μg mL−1, respectively. However, PLS1 and PCR were applied to design calibration set based on absorption spectra in the 250–320 nm range for 36 different mixtures of AA, DA and UA, in all cases, the PLS1 calibration method showed more quantitative prediction ability than PCR method. Cross validation method was used to select the optimum number of principal components (NPC. The NPC for AA, DA and UA was found to be 4 by PLS1 and 5, 12, 8 by PCR. Prediction error sum of squares (PRESS of AA, DA and UA were 1.2461, 1.1144, 2.3104 for PLS1 and 11.0563, 1.3819, 4.0956 for PCR, respectively. Satisfactory results were achieved for the simultaneous determination of AA, DA and UA in some real samples such as human urine, serum and pharmaceutical formulations.

  1. The Bland-Altman Method Should Not Be Used in Regression Cross-Validation Studies

    Science.gov (United States)

    O'Connor, Daniel P.; Mahar, Matthew T.; Laughlin, Mitzi S.; Jackson, Andrew S.

    2011-01-01

    The purpose of this study was to demonstrate the bias in the Bland-Altman (BA) limits of agreement method when it is used to validate regression models. Data from 1,158 men were used to develop three regression equations to estimate maximum oxygen uptake (R[superscript 2] = 0.40, 0.61, and 0.82, respectively). The equations were evaluated in a…

  2. An evaluation of regression methods to estimate nutritional condition of canvasbacks and other water birds

    Science.gov (United States)

    Sparling, D.W.; Barzen, J.A.; Lovvorn, J.R.; Serie, J.R.

    1992-01-01

    Regression equations that use mensural data to estimate body condition have been developed for several water birds. These equations often have been based on data that represent different sexes, age classes, or seasons, without being adequately tested for intergroup differences. We used proximate carcass analysis of 538 adult and juvenile canvasbacks (Aythya valisineria ) collected during fall migration, winter, and spring migrations in 1975-76 and 1982-85 to test regression methods for estimating body condition.

  3. Treating experimental data of inverse kinetic method by unitary linear regression analysis

    International Nuclear Information System (INIS)

    Zhao Yusen; Chen Xiaoliang

    2009-01-01

    The theory of treating experimental data of inverse kinetic method by unitary linear regression analysis was described. Not only the reactivity, but also the effective neutron source intensity could be calculated by this method. Computer code was compiled base on the inverse kinetic method and unitary linear regression analysis. The data of zero power facility BFS-1 in Russia were processed and the results were compared. The results show that the reactivity and the effective neutron source intensity can be obtained correctly by treating experimental data of inverse kinetic method using unitary linear regression analysis and the precision of reactivity measurement is improved. The central element efficiency can be calculated by using the reactivity. The result also shows that the effect to reactivity measurement caused by external neutron source should be considered when the reactor power is low and the intensity of external neutron source is strong. (authors)

  4. Regression Methods for Virtual Metrology of Layer Thickness in Chemical Vapor Deposition

    DEFF Research Database (Denmark)

    Purwins, Hendrik; Barak, Bernd; Nagi, Ahmed

    2014-01-01

    The quality of wafer production in semiconductor manufacturing cannot always be monitored by a costly physical measurement. Instead of measuring a quantity directly, it can be predicted by a regression method (Virtual Metrology). In this paper, a survey on regression methods is given to predict...... average Silicon Nitride cap layer thickness for the Plasma Enhanced Chemical Vapor Deposition (PECVD) dual-layer metal passivation stack process. Process and production equipment Fault Detection and Classification (FDC) data are used as predictor variables. Various variable sets are compared: one most...... algorithm, and Support Vector Regression (SVR). On a test set, SVR outperforms the other methods by a large margin, being more robust towards changes in the production conditions. The method performs better on high-dimensional multivariate input data than on the most predictive variables alone. Process...

  5. Statistical methods in regression and calibration analysis of chromosome aberration data

    International Nuclear Information System (INIS)

    Merkle, W.

    1983-01-01

    The method of iteratively reweighted least squares for the regression analysis of Poisson distributed chromosome aberration data is reviewed in the context of other fit procedures used in the cytogenetic literature. As an application of the resulting regression curves methods for calculating confidence intervals on dose from aberration yield are described and compared, and, for the linear quadratic model a confidence interval is given. Emphasis is placed on the rational interpretation and the limitations of various methods from a statistical point of view. (orig./MG)

  6. A comparative QSAR study on the estrogenic activities of persistent organic pollutants by PLS and SVM

    Directory of Open Access Journals (Sweden)

    Fei Li

    2015-11-01

    Full Text Available Quantitative structure-activity relationships (QSARs were determined using partial least square (PLS and support vector machine (SVM. The predicted values by the final QSAR models were in good agreement with the corresponding experimental values. Chemical estrogenic activities are related to atomic properties (atomic Sanderson electronegativities, van der Waals volumes and polarizabilities. Comparison of the results obtained from two models, the SVM method exhibited better overall performances. Besides, three PLS models were constructed for some specific families based on their chemical structures. These predictive models should be useful to rapidly identify potential estrogenic endocrine disrupting chemicals.

  7. An Introduction to Graphical and Mathematical Methods for Detecting Heteroscedasticity in Linear Regression.

    Science.gov (United States)

    Thompson, Russel L.

    Homoscedasticity is an important assumption of linear regression. This paper explains what it is and why it is important to the researcher. Graphical and mathematical methods for testing the homoscedasticity assumption are demonstrated. Sources of homoscedasticity and types of homoscedasticity are discussed, and methods for correction are…

  8. Calculation of U, Ra, Th and K contents in uranium ore by multiple linear regression method

    International Nuclear Information System (INIS)

    Lin Chao; Chen Yingqiang; Zhang Qingwen; Tan Fuwen; Peng Guanghui

    1991-01-01

    A multiple linear regression method was used to compute γ spectra of uranium ore samples and to calculate contents of U, Ra, Th, and K. In comparison with the inverse matrix method, its advantage is that no standard samples of pure U, Ra, Th and K are needed for obtaining response coefficients

  9. Comparing treatment effects after adjustment with multivariable Cox proportional hazards regression and propensity score methods

    NARCIS (Netherlands)

    Martens, Edwin P; de Boer, Anthonius; Pestman, Wiebe R; Belitser, Svetlana V; Stricker, Bruno H Ch; Klungel, Olaf H

    PURPOSE: To compare adjusted effects of drug treatment for hypertension on the risk of stroke from propensity score (PS) methods with a multivariable Cox proportional hazards (Cox PH) regression in an observational study with censored data. METHODS: From two prospective population-based cohort

  10. Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding

    Science.gov (United States)

    de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.

    2013-01-01

    Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228

  11. Combining Partial Least Squares and the Gradient-Boosting Method for Soil Property Retrieval Using Visible Near-Infrared Shortwave Infrared Spectra

    Directory of Open Access Journals (Sweden)

    Lanfa Liu

    2017-12-01

    Full Text Available Soil spectroscopy has experienced a tremendous increase in soil property characterisation, and can be used not only in the laboratory but also from the space (imaging spectroscopy. Partial least squares (PLS regression is one of the most common approaches for the calibration of soil properties using soil spectra. Besides functioning as a calibration method, PLS can also be used as a dimension reduction tool, which has scarcely been studied in soil spectroscopy. PLS components retained from high-dimensional spectral data can further be explored with the gradient-boosted decision tree (GBDT method. Three soil sample categories were extracted from the Land Use/Land Cover Area Frame Survey (LUCAS soil library according to the type of land cover (woodland, grassland, and cropland. First, PLS regression and GBDT were separately applied to build the spectroscopic models for soil organic carbon (OC, total nitrogen content (N, and clay for each soil category. Then, PLS-derived components were used as input variables for the GBDT model. The results demonstrate that the combined PLS-GBDT approach has better performance than PLS or GBDT alone. The relative important variables for soil property estimation revealed by the proposed method demonstrated that the PLS method is a useful dimension reduction tool for soil spectra to retain target-related information.

  12. An extension of PPLS-DA for classification and comparison to ordinary PLS-DA.

    Directory of Open Access Journals (Sweden)

    Anna Telaar

    Full Text Available Classification studies are widely applied, e.g. in biomedical research to classify objects/patients into predefined groups. The goal is to find a classification function/rule which assigns each object/patient to a unique group with the greatest possible accuracy (classification error. Especially in gene expression experiments often a lot of variables (genes are measured for only few objects/patients. A suitable approach is the well-known method PLS-DA, which searches for a transformation to a lower dimensional space. Resulting new components are linear combinations of the original variables. An advancement of PLS-DA leads to PPLS-DA, introducing a so called 'power parameter', which is maximized towards the correlation between the components and the group-membership. We introduce an extension of PPLS-DA for optimizing this power parameter towards the final aim, namely towards a minimal classification error. We compare this new extension with the original PPLS-DA and also with the ordinary PLS-DA using simulated and experimental datasets. For the investigated data sets with weak linear dependency between features/variables, no improvement is shown for PPLS-DA and for the extensions compared to PLS-DA. A very weak linear dependency, a low proportion of differentially expressed genes for simulated data, does not lead to an improvement of PPLS-DA over PLS-DA, but our extension shows a lower prediction error. On the contrary, for the data set with strong between-feature collinearity and a low proportion of differentially expressed genes and a large total number of genes, the prediction error of PPLS-DA and the extensions is clearly lower than for PLS-DA. Moreover we compare these prediction results with results of support vector machines with linear kernel and linear discriminant analysis.

  13. The Research of Regression Method for Forecasting Monthly Electricity Sales Considering Coupled Multi-factor

    Science.gov (United States)

    Wang, Jiangbo; Liu, Junhui; Li, Tiantian; Yin, Shuo; He, Xinhui

    2018-01-01

    The monthly electricity sales forecasting is a basic work to ensure the safety of the power system. This paper presented a monthly electricity sales forecasting method which comprehensively considers the coupled multi-factors of temperature, economic growth, electric power replacement and business expansion. The mathematical model is constructed by using regression method. The simulation results show that the proposed method is accurate and effective.

  14. Direct integral linear least square regression method for kinetic evaluation of hepatobiliary scintigraphy

    International Nuclear Information System (INIS)

    Shuke, Noriyuki

    1991-01-01

    In hepatobiliary scintigraphy, kinetic model analysis, which provides kinetic parameters like hepatic extraction or excretion rate, have been done for quantitative evaluation of liver function. In this analysis, unknown model parameters are usually determined using nonlinear least square regression method (NLS method) where iterative calculation and initial estimate for unknown parameters are required. As a simple alternative to NLS method, direct integral linear least square regression method (DILS method), which can determine model parameters by a simple calculation without initial estimate, is proposed, and tested the applicability to analysis of hepatobiliary scintigraphy. In order to see whether DILS method could determine model parameters as good as NLS method, or to determine appropriate weight for DILS method, simulated theoretical data based on prefixed parameters were fitted to 1 compartment model using both DILS method with various weightings and NLS method. The parameter values obtained were then compared with prefixed values which were used for data generation. The effect of various weights on the error of parameter estimate was examined, and inverse of time was found to be the best weight to make the error minimum. When using this weight, DILS method could give parameter values close to those obtained by NLS method and both parameter values were very close to prefixed values. With appropriate weighting, the DILS method could provide reliable parameter estimate which is relatively insensitive to the data noise. In conclusion, the DILS method could be used as a simple alternative to NLS method, providing reliable parameter estimate. (author)

  15. 8th International Conference on Partial Least Squares and Related Methods

    CERN Document Server

    Vinzi, Vincenzo; Russolillo, Giorgio; Saporta, Gilbert; Trinchera, Laura

    2016-01-01

    This volume presents state of the art theories, new developments, and important applications of Partial Least Square (PLS) methods. The text begins with the invited communications of current leaders in the field who cover the history of PLS, an overview of methodological issues, and recent advances in regression and multi-block approaches. The rest of the volume comprises selected, reviewed contributions from the 8th International Conference on Partial Least Squares and Related Methods held in Paris, France, on 26-28 May, 2014. They are organized in four coherent sections: 1) new developments in genomics and brain imaging, 2) new and alternative methods for multi-table and path analysis, 3) advances in partial least square regression (PLSR), and 4) partial least square path modeling (PLS-PM) breakthroughs and applications. PLS methods are very versatile methods that are now used in areas as diverse as engineering, life science, sociology, psychology, brain imaging, genomics, and business among both academics ...

  16. A different approach to estimate nonlinear regression model using numerical methods

    Science.gov (United States)

    Mahaboob, B.; Venkateswarlu, B.; Mokeshrayalu, G.; Balasiddamuni, P.

    2017-11-01

    This research paper concerns with the computational methods namely the Gauss-Newton method, Gradient algorithm methods (Newton-Raphson method, Steepest Descent or Steepest Ascent algorithm method, the Method of Scoring, the Method of Quadratic Hill-Climbing) based on numerical analysis to estimate parameters of nonlinear regression model in a very different way. Principles of matrix calculus have been used to discuss the Gradient-Algorithm methods. Yonathan Bard [1] discussed a comparison of gradient methods for the solution of nonlinear parameter estimation problems. However this article discusses an analytical approach to the gradient algorithm methods in a different way. This paper describes a new iterative technique namely Gauss-Newton method which differs from the iterative technique proposed by Gorden K. Smyth [2]. Hans Georg Bock et.al [10] proposed numerical methods for parameter estimation in DAE’s (Differential algebraic equation). Isabel Reis Dos Santos et al [11], Introduced weighted least squares procedure for estimating the unknown parameters of a nonlinear regression metamodel. For large-scale non smooth convex minimization the Hager and Zhang (HZ) conjugate gradient Method and the modified HZ (MHZ) method were presented by Gonglin Yuan et al [12].

  17. Regression dilution bias: tools for correction methods and sample size calculation.

    Science.gov (United States)

    Berglund, Lars

    2012-08-01

    Random errors in measurement of a risk factor will introduce downward bias of an estimated association to a disease or a disease marker. This phenomenon is called regression dilution bias. A bias correction may be made with data from a validity study or a reliability study. In this article we give a non-technical description of designs of reliability studies with emphasis on selection of individuals for a repeated measurement, assumptions of measurement error models, and correction methods for the slope in a simple linear regression model where the dependent variable is a continuous variable. Also, we describe situations where correction for regression dilution bias is not appropriate. The methods are illustrated with the association between insulin sensitivity measured with the euglycaemic insulin clamp technique and fasting insulin, where measurement of the latter variable carries noticeable random error. We provide software tools for estimation of a corrected slope in a simple linear regression model assuming data for a continuous dependent variable and a continuous risk factor from a main study and an additional measurement of the risk factor in a reliability study. Also, we supply programs for estimation of the number of individuals needed in the reliability study and for choice of its design. Our conclusion is that correction for regression dilution bias is seldom applied in epidemiological studies. This may cause important effects of risk factors with large measurement errors to be neglected.

  18. Estimation of Fine Particulate Matter in Taipei Using Landuse Regression and Bayesian Maximum Entropy Methods

    Directory of Open Access Journals (Sweden)

    Yi-Ming Kuo

    2011-06-01

    Full Text Available Fine airborne particulate matter (PM2.5 has adverse effects on human health. Assessing the long-term effects of PM2.5 exposure on human health and ecology is often limited by a lack of reliable PM2.5 measurements. In Taipei, PM2.5 levels were not systematically measured until August, 2005. Due to the popularity of geographic information systems (GIS, the landuse regression method has been widely used in the spatial estimation of PM concentrations. This method accounts for the potential contributing factors of the local environment, such as traffic volume. Geostatistical methods, on other hand, account for the spatiotemporal dependence among the observations of ambient pollutants. This study assesses the performance of the landuse regression model for the spatiotemporal estimation of PM2.5 in the Taipei area. Specifically, this study integrates the landuse regression model with the geostatistical approach within the framework of the Bayesian maximum entropy (BME method. The resulting epistemic framework can assimilate knowledge bases including: (a empirical-based spatial trends of PM concentration based on landuse regression, (b the spatio-temporal dependence among PM observation information, and (c site-specific PM observations. The proposed approach performs the spatiotemporal estimation of PM2.5 levels in the Taipei area (Taiwan from 2005–2007.

  19. Estimation of fine particulate matter in Taipei using landuse regression and bayesian maximum entropy methods.

    Science.gov (United States)

    Yu, Hwa-Lung; Wang, Chih-Hsih; Liu, Ming-Che; Kuo, Yi-Ming

    2011-06-01

    Fine airborne particulate matter (PM2.5) has adverse effects on human health. Assessing the long-term effects of PM2.5 exposure on human health and ecology is often limited by a lack of reliable PM2.5 measurements. In Taipei, PM2.5 levels were not systematically measured until August, 2005. Due to the popularity of geographic information systems (GIS), the landuse regression method has been widely used in the spatial estimation of PM concentrations. This method accounts for the potential contributing factors of the local environment, such as traffic volume. Geostatistical methods, on other hand, account for the spatiotemporal dependence among the observations of ambient pollutants. This study assesses the performance of the landuse regression model for the spatiotemporal estimation of PM2.5 in the Taipei area. Specifically, this study integrates the landuse regression model with the geostatistical approach within the framework of the Bayesian maximum entropy (BME) method. The resulting epistemic framework can assimilate knowledge bases including: (a) empirical-based spatial trends of PM concentration based on landuse regression, (b) the spatio-temporal dependence among PM observation information, and (c) site-specific PM observations. The proposed approach performs the spatiotemporal estimation of PM2.5 levels in the Taipei area (Taiwan) from 2005-2007.

  20. A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants

    Science.gov (United States)

    Cooper, Paul D.

    2010-01-01

    A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…

  1. Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

    Science.gov (United States)

    Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

    2009-01-01

    Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....

  2. Cox regression with missing covariate data using a modified partial likelihood method

    DEFF Research Database (Denmark)

    Martinussen, Torben; Holst, Klaus K.; Scheike, Thomas H.

    2016-01-01

    Missing covariate values is a common problem in survival analysis. In this paper we propose a novel method for the Cox regression model that is close to maximum likelihood but avoids the use of the EM-algorithm. It exploits that the observed hazard function is multiplicative in the baseline hazard...

  3. Combining pharmacophore fingerprints and PLS-discriminant analysis for virtual screening and SAR elucidation

    DEFF Research Database (Denmark)

    Askjær, Sune; Langgård, Morten

    2008-01-01

    The criterion of success for the initial stages of a ligand-based drug-discovery project is dual. First, a set of suitable lead compounds has to be identified. Second, a level of a preliminary structure-activity relationship (SAR) of the identified ligands has to be established in order to guide ...... by the protein-binding site known from X-ray complexes. The result of this analysis assists in explaining the efficiency of 2D pharmacophore fingerprints as descriptors in virtual screening....... the lead optimization toward a final drug candidate. This paper presents a combined approach to solving these two problems of ligand-based virtual screening and elucidation of SAR based on interplay between pharmacophore fingerprints and interpretation of PLS-discriminant analysis (PLS-DA) models....... The virtual screening capability of the PLS-DA method is compared to group fusion maximum similarity searching in a test using four graph-based pharmacophore fingerprints over a range of 10 diverse targets. The PLS-DA method was generally found to do better than the Smax method. The GpiDAPH3 and PCH...

  4. Improving the robustness of a partial least squares (PLS) model based on pure component selectivity analysis and range optimization: Case study for the analysis of an etching solution containing hydrogen peroxide

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Youngbok [Department of Chemistry, College of Natural Sciences, Hanyang University Haengdang-Dong, Seoul 133-791 (Korea, Republic of); Chung, Hoeil [Department of Chemistry, College of Natural Sciences, Hanyang University Haengdang-Dong, Seoul 133-791 (Korea, Republic of)]. E-mail: hoeil@hanyang.ac.kr; Arnold, Mark A. [Optical Science and Technology Center and Department of Chemistry, University of Iowa, Iowa City, IA 52242 (United States)

    2006-07-14

    Pure component selectivity analysis (PCSA) was successfully utilized to enhance the robustness of a partial least squares (PLS) model by examining the selectivity of a given component to other components. The samples used in this study were composed of NH{sub 4}OH, H{sub 2}O{sub 2} and H{sub 2}O, a popular etchant solution in the electronic industry. Corresponding near-infrared (NIR) spectra (9000-7500 cm{sup -1}) were used to build PLS models. The selective determination of H{sub 2}O{sub 2} without influences from NH{sub 4}OH and H{sub 2}O was a key issue since its molecular structure is similar to that of H{sub 2}O and NH{sub 4}OH also has a hydroxyl functional group. The best spectral ranges for the determination of NH{sub 4}OH and H{sub 2}O{sub 2} were found with the use of moving window PLS (MW-PLS) and corresponding selectivity was examined by pure component selectivity analysis. The PLS calibration for NH{sub 4}OH was free from interferences from the other components due to the presence of its unique NH absorption bands. Since the spectral variation from H{sub 2}O{sub 2} was broadly overlapping and much less distinct than that from NH{sub 4}OH, the selectivity and prediction performance for the H{sub 2}O{sub 2} calibration were sensitively varied depending on the spectral ranges and number of factors used. PCSA, based on the comparison between regression vectors from PLS and the net analyte signal (NAS), was an effective method to prevent over-fitting of the H{sub 2}O{sub 2} calibration. A robust H{sub 2}O{sub 2} calibration model with minimal interferences from other components was developed. PCSA should be included as a standard method in PLS calibrations where prediction error only is the usual measure of performance.

  5. Convert a low-cost sensor to a colorimeter using an improved regression method

    Science.gov (United States)

    Wu, Yifeng

    2008-01-01

    Closed loop color calibration is a process to maintain consistent color reproduction for color printers. To perform closed loop color calibration, a pre-designed color target should be printed, and automatically measured by a color measuring instrument. A low cost sensor has been embedded to the printer to perform the color measurement. A series of sensor calibration and color conversion methods have been developed. The purpose is to get accurate colorimetric measurement from the data measured by the low cost sensor. In order to get high accuracy colorimetric measurement, we need carefully calibrate the sensor, and minimize all possible errors during the color conversion. After comparing several classical color conversion methods, a regression based color conversion method has been selected. The regression is a powerful method to estimate the color conversion functions. But the main difficulty to use this method is to find an appropriate function to describe the relationship between the input and the output data. In this paper, we propose to use 1D pre-linearization tables to improve the linearity between the input sensor measuring data and the output colorimetric data. Using this method, we can increase the accuracy of the regression method, so as to improve the accuracy of the color conversion.

  6. Comparison of some biased estimation methods (including ordinary subset regression) in the linear model

    Science.gov (United States)

    Sidik, S. M.

    1975-01-01

    Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.

  7. A Simple Linear Regression Method for Quantitative Trait Loci Linkage Analysis With Censored Observations

    OpenAIRE

    Anderson, Carl A.; McRae, Allan F.; Visscher, Peter M.

    2006-01-01

    Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using...

  8. PLS and multicollinearity under conditions common in satisfaction studies

    DEFF Research Database (Denmark)

    Nielsen, Rikke; Kristensen, Kai; Eskildsen, Jacob Kjær

    A number of studies have investigated the performance of the PLS path modelling algorithm in the presence of common empirical problems, such as model misspecification, skewness of manifest variables, missing values, and multicollinearity, and they have shown PLS to be quite robust (see e.g. Cassel...... et al., 1999; Kristensen, Eskildsen, 2005). However, most of the studies, including our own, have focused on somewhat simple models with very simple correlation structures. This paper extends the existing knowledge by investigating the effect of varying degrees of multicollinearity on the PLS model...

  9. Pink line syndrome (PLS) in the scleractinian coral Porites lutea

    Digital Repository Service at National Institute of Oceanography (India)

    Ravindran, J.; Raghukumar, C.

    Reef sites Pink line syndrome (PLS) in the scleractinian coral Porites lutea Accepted: 10 May 2002 / Published online: 5 July 2002 C211 Springer-Verlag 2002 We describe here an unreport- ed diseased state of Porites lutea (Milne-Edwards and Haime...)ontheKavarattireefof the Lakshadweep group of is- lands (11C176 N; 71C176E). Pink line syndrome (PLS) causes partial mortality of the coral P. lutea around Kavaratti Island (Fig. 1), and about 10% of colonies were found to be af- fected by PLS. The dead patches were colonized by a...

  10. A Comparative Study of Pairwise Learning Methods Based on Kernel Ridge Regression.

    Science.gov (United States)

    Stock, Michiel; Pahikkala, Tapio; Airola, Antti; De Baets, Bernard; Waegeman, Willem

    2018-06-12

    Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction, or network inference problems. During the past decade, kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression, and a linear matrix filter arise naturally as a special case of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency, and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.

  11. Global classification of human facial healthy skin using PLS discriminant analysis and clustering analysis.

    Science.gov (United States)

    Guinot, C; Latreille, J; Tenenhaus, M; Malvy, D J

    2001-04-01

    Today's classifications of healthy skin are predominantly based on a very limited number of skin characteristics, such as skin oiliness or susceptibility to sun exposure. The aim of the present analysis was to set up a global classification of healthy facial skin, using mathematical models. This classification is based on clinical, biophysical skin characteristics and self-reported information related to the skin, as well as the results of a theoretical skin classification assessed separately for the frontal and the malar zones of the face. In order to maximize the predictive power of the models with a minimum of variables, the Partial Least Square (PLS) discriminant analysis method was used. The resulting PLS components were subjected to clustering analyses to identify the plausible number of clusters and to group the individuals according to their proximities. Using this approach, four PLS components could be constructed and six clusters were found relevant. So, from the 36 hypothetical combinations of the theoretical skin types classification, we tended to a strengthened six classes proposal. Our data suggest that the association of the PLS discriminant analysis and the clustering methods leads to a valid and simple way to classify healthy human skin and represents a potentially useful tool for cosmetic and dermatological research.

  12. Estimation Methods for Non-Homogeneous Regression - Minimum CRPS vs Maximum Likelihood

    Science.gov (United States)

    Gebetsberger, Manuel; Messner, Jakob W.; Mayr, Georg J.; Zeileis, Achim

    2017-04-01

    Non-homogeneous regression models are widely used to statistically post-process numerical weather prediction models. Such regression models correct for errors in mean and variance and are capable to forecast a full probability distribution. In order to estimate the corresponding regression coefficients, CRPS minimization is performed in many meteorological post-processing studies since the last decade. In contrast to maximum likelihood estimation, CRPS minimization is claimed to yield more calibrated forecasts. Theoretically, both scoring rules used as an optimization score should be able to locate a similar and unknown optimum. Discrepancies might result from a wrong distributional assumption of the observed quantity. To address this theoretical concept, this study compares maximum likelihood and minimum CRPS estimation for different distributional assumptions. First, a synthetic case study shows that, for an appropriate distributional assumption, both estimation methods yield to similar regression coefficients. The log-likelihood estimator is slightly more efficient. A real world case study for surface temperature forecasts at different sites in Europe confirms these results but shows that surface temperature does not always follow the classical assumption of a Gaussian distribution. KEYWORDS: ensemble post-processing, maximum likelihood estimation, CRPS minimization, probabilistic temperature forecasting, distributional regression models

  13. A Fast Gradient Method for Nonnegative Sparse Regression With Self-Dictionary

    Science.gov (United States)

    Gillis, Nicolas; Luce, Robert

    2018-01-01

    A nonnegative matrix factorization (NMF) can be computed efficiently under the separability assumption, which asserts that all the columns of the given input data matrix belong to the cone generated by a (small) subset of them. The provably most robust methods to identify these conic basis columns are based on nonnegative sparse regression and self dictionaries, and require the solution of large-scale convex optimization problems. In this paper we study a particular nonnegative sparse regression model with self dictionary. As opposed to previously proposed models, this model yields a smooth optimization problem where the sparsity is enforced through linear constraints. We show that the Euclidean projection on the polyhedron defined by these constraints can be computed efficiently, and propose a fast gradient method to solve our model. We compare our algorithm with several state-of-the-art methods on synthetic data sets and real-world hyperspectral images.

  14. Prediction of gas chromatography/electron capture detector retention times of chlorinated pesticides, herbicides, and organohalides by multivariate chemometrics methods

    International Nuclear Information System (INIS)

    Ghasemi, Jahanbakhsh; Asadpour, Saeid; Abdolmaleki, Azizeh

    2007-01-01

    A quantitative structure-retention relationship (QSRR) study, has been carried out on the gas chromatograph/electron capture detector (GC/ECD) system retention times (t R s) of 38 diverse chlorinated pesticides, herbicides, and organohalides by using molecular structural descriptors. Modeling of retention times of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR) and partial least squares (PLS) regression. The stepwise regression using SPSS was used for the selection of the variables that resulted in the best-fitted models. Appropriate models with low standard errors and high correlation coefficients were obtained. Three types of molecular descriptors including electronic, steric and thermodynamic were used to develop a quantitative relationship between the retention times and structural properties. MLR and PLS analysis has been carried out to derive the best QSRR models. After variables selection, MLR and PLS methods used with leave-one-out cross validation for building the regression models. The predictive quality of the QSRR models were tested for an external prediction set of 12 compounds randomly chosen from 38 compounds. The PLS regression method was used to model the structure-retention relationships, more accurately. However, the results surprisingly showed more or less the same quality for MLR and PLS modeling according to squared regression coefficients R 2 which were 0.951 and 0.948 for MLR and PLS, respectively

  15. Support vector methods for survival analysis: a comparison between ranking and regression approaches.

    Science.gov (United States)

    Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K

    2011-10-01

    To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In a second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparison of the three techniques on 6 clinical and 3 high-dimensional datasets and discussing the relevance of these techniques over classical approaches fur survival data. We compare svm-based survival models based on ranking constraints, based on regression constraints and models based on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have a significant different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate a significantly better performance for models including regression constraints above models only based on ranking constraints. This work gives empirical evidence that svm-based models using regression constraints perform significantly better than svm-based models based on ranking constraints. Our experiments show a comparable performance for methods

  16. Alpins and thibos vectorial astigmatism analyses: proposal of a linear regression model between methods

    Directory of Open Access Journals (Sweden)

    Giuliano de Oliveira Freitas

    2013-10-01

    Full Text Available PURPOSE: To determine linear regression models between Alpins descriptive indices and Thibos astigmatic power vectors (APV, assessing the validity and strength of such correlations. METHODS: This case series prospectively assessed 62 eyes of 31 consecutive cataract patients with preoperative corneal astigmatism between 0.75 and 2.50 diopters in both eyes. Patients were randomly assorted among two phacoemulsification groups: one assigned to receive AcrySof®Toric intraocular lens (IOL in both eyes and another assigned to have AcrySof Natural IOL associated with limbal relaxing incisions, also in both eyes. All patients were reevaluated postoperatively at 6 months, when refractive astigmatism analysis was performed using both Alpins and Thibos methods. The ratio between Thibos postoperative APV and preoperative APV (APVratio and its linear regression to Alpins percentage of success of astigmatic surgery, percentage of astigmatism corrected and percentage of astigmatism reduction at the intended axis were assessed. RESULTS: Significant negative correlation between the ratio of post- and preoperative Thibos APVratio and Alpins percentage of success (%Success was found (Spearman's ρ=-0.93; linear regression is given by the following equation: %Success = (-APVratio + 1.00x100. CONCLUSION: The linear regression we found between APVratio and %Success permits a validated mathematical inference concerning the overall success of astigmatic surgery.

  17. Using the fuzzy linear regression method to benchmark the energy efficiency of commercial buildings

    International Nuclear Information System (INIS)

    Chung, William

    2012-01-01

    Highlights: ► Fuzzy linear regression method is used for developing benchmarking systems. ► The systems can be used to benchmark energy efficiency of commercial buildings. ► The resulting benchmarking model can be used by public users. ► The resulting benchmarking model can capture the fuzzy nature of input–output data. -- Abstract: Benchmarking systems from a sample of reference buildings need to be developed to conduct benchmarking processes for the energy efficiency of commercial buildings. However, not all benchmarking systems can be adopted by public users (i.e., other non-reference building owners) because of the different methods in developing such systems. An approach for benchmarking the energy efficiency of commercial buildings using statistical regression analysis to normalize other factors, such as management performance, was developed in a previous work. However, the field data given by experts can be regarded as a distribution of possibility. Thus, the previous work may not be adequate to handle such fuzzy input–output data. Consequently, a number of fuzzy structures cannot be fully captured by statistical regression analysis. This present paper proposes the use of fuzzy linear regression analysis to develop a benchmarking process, the resulting model of which can be used by public users. An illustrative example is given as well.

  18. Study (Prediction of Main Pipes Break Rates in Water Distribution Systems Using Intelligent and Regression Methods

    Directory of Open Access Journals (Sweden)

    Massoud Tabesh

    2011-07-01

    Full Text Available Optimum operation of water distribution networks is one of the priorities of sustainable development of water resources, considering the issues of increasing efficiency and decreasing the water losses. One of the key subjects in optimum operational management of water distribution systems is preparing rehabilitation and replacement schemes, prediction of pipes break rate and evaluation of their reliability. Several approaches have been presented in recent years regarding prediction of pipe failure rates which each one requires especial data sets. Deterministic models based on age and deterministic multi variables and stochastic group modeling are examples of the solutions which relate pipe break rates to parameters like age, material and diameters. In this paper besides the mentioned parameters, more factors such as pipe depth and hydraulic pressures are considered as well. Then using multi variable regression method, intelligent approaches (Artificial neural network and neuro fuzzy models and Evolutionary polynomial Regression method (EPR pipe burst rate are predicted. To evaluate the results of different approaches, a case study is carried out in a part ofMashhadwater distribution network. The results show the capability and advantages of ANN and EPR methods to predict pipe break rates, in comparison with neuro fuzzy and multi-variable regression methods.

  19. A heuristic approach using multiple criteria for environmentally benign 3PLs selection

    Science.gov (United States)

    Kongar, Elif

    2005-11-01

    Maintaining competitiveness in an environment where price and quality differences between competing products are disappearing depends on the company's ability to reduce costs and supply time. Timely responses to rapidly changing market conditions require an efficient Supply Chain Management (SCM). Outsourcing logistics to third-party logistics service providers (3PLs) is one commonly used way of increasing the efficiency of logistics operations, while creating a more "core competency focused" business environment. However, this alone may not be sufficient. Due to recent environmental regulations and growing public awareness regarding environmental issues, 3PLs need to be not only efficient but also environmentally benign to maintain companies' competitiveness. Even though an efficient and environmentally benign combination of 3PLs can theoretically be obtained using exhaustive search algorithms, heuristics approaches to the selection process may be superior in terms of the computational complexity. In this paper, a hybrid approach that combines a multiple criteria Genetic Algorithm (GA) with Linear Physical Weighting Algorithm (LPPW) to be used in efficient and environmentally benign 3PLs is proposed. A numerical example is also provided to illustrate the method and the analyses.

  20. Discussion on Regression Methods Based on Ensemble Learning and Applicability Domains of Linear Submodels.

    Science.gov (United States)

    Kaneko, Hiromasa

    2018-02-26

    To develop a new ensemble learning method and construct highly predictive regression models in chemoinformatics and chemometrics, applicability domains (ADs) are introduced into the ensemble learning process of prediction. When estimating values of an objective variable using subregression models, only the submodels with ADs that cover a query sample, i.e., the sample is inside the model's AD, are used. By constructing submodels and changing a list of selected explanatory variables, the union of the submodels' ADs, which defines the overall AD, becomes large, and the prediction performance is enhanced for diverse compounds. By analyzing a quantitative structure-activity relationship data set and a quantitative structure-property relationship data set, it is confirmed that the ADs can be enlarged and the estimation performance of regression models is improved compared with traditional methods.

  1. Development of Compressive Failure Strength for Composite Laminate Using Regression Analysis Method

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Myoung Keon [Agency for Defense Development, Daejeon (Korea, Republic of); Lee, Jeong Won; Yoon, Dong Hyun; Kim, Jae Hoon [Chungnam Nat’l Univ., Daejeon (Korea, Republic of)

    2016-10-15

    This paper provides the compressive failure strength value of composite laminate developed by using regression analysis method. Composite material in this document is a Carbon/Epoxy unidirection(UD) tape prepreg(Cycom G40-800/5276-1) cured at 350°F(177°C). The operating temperature is –60°F~+200°F(-55°C - +95°C). A total of 56 compression tests were conducted on specimens from eight (8) distinct laminates that were laid up by standard angle layers (0°, +45°, –45° and 90°). The ASTM-D-6484 standard was used for test method. The regression analysis was performed with the response variable being the laminate ultimate fracture strength and the regressor variables being two ply orientations (0° and ±45°)

  2. Development of Compressive Failure Strength for Composite Laminate Using Regression Analysis Method

    International Nuclear Information System (INIS)

    Lee, Myoung Keon; Lee, Jeong Won; Yoon, Dong Hyun; Kim, Jae Hoon

    2016-01-01

    This paper provides the compressive failure strength value of composite laminate developed by using regression analysis method. Composite material in this document is a Carbon/Epoxy unidirection(UD) tape prepreg(Cycom G40-800/5276-1) cured at 350°F(177°C). The operating temperature is –60°F~+200°F(-55°C - +95°C). A total of 56 compression tests were conducted on specimens from eight (8) distinct laminates that were laid up by standard angle layers (0°, +45°, –45° and 90°). The ASTM-D-6484 standard was used for test method. The regression analysis was performed with the response variable being the laminate ultimate fracture strength and the regressor variables being two ply orientations (0° and ±45°)

  3. Partial least squares methods for spectrally estimating lunar soil FeO abundance: A stratified approach to revealing nonlinear effect and qualitative interpretation

    Science.gov (United States)

    Li, Lin

    2008-12-01

    Partial least squares (PLS) regressions were applied to lunar highland and mare soil data characterized by the Lunar Soil Characterization Consortium (LSCC) for spectral estimation of the abundance of lunar soil chemical constituents FeO and Al2O3. The LSCC data set was split into a number of subsets including the total highland, Apollo 16, Apollo 14, and total mare soils, and then PLS was applied to each to investigate the effect of nonlinearity on the performance of the PLS method. The weight-loading vectors resulting from PLS were analyzed to identify mineral species responsible for spectral estimation of the soil chemicals. The results from PLS modeling indicate that the PLS performance depends on the correlation of constituents of interest to their major mineral carriers, and the Apollo 16 soils are responsible for the large errors of FeO and Al2O3 estimates when the soils were modeled along with other types of soils. These large errors are primarily attributed to the degraded correlation FeO to pyroxene for the relatively mature Apollo 16 soils as a result of space weathering and secondary to the interference of olivine. PLS consistently yields very accurate fits to the two soil chemicals when applied to mare soils. Although Al2O3 has no spectrally diagnostic characteristics, this chemical can be predicted for all subset data by PLS modeling at high accuracies because of its correlation to FeO. This correlation is reflected in the symmetry of the PLS weight-loading vectors for FeO and Al2O3, which prove to be very useful for qualitative interpretation of the PLS results. However, this qualitative interpretation of PLS modeling cannot be achieved using principal component regression loading vectors.

  4. The regression-calibration method for fitting generalized linear models with additive measurement error

    OpenAIRE

    James W. Hardin; Henrik Schmeidiche; Raymond J. Carroll

    2003-01-01

    This paper discusses and illustrates the method of regression calibration. This is a straightforward technique for fitting models with additive measurement error. We present this discussion in terms of generalized linear models (GLMs) following the notation defined in Hardin and Carroll (2003). Discussion will include specified measurement error, measurement error estimated by replicate error-prone proxies, and measurement error estimated by instrumental variables. The discussion focuses on s...

  5. Assessing the performance of variational methods for mixed logistic regression models

    Czech Academy of Sciences Publication Activity Database

    Rijmen, F.; Vomlel, Jiří

    2008-01-01

    Roč. 78, č. 8 (2008), s. 765-779 ISSN 0094-9655 R&D Projects: GA MŠk 1M0572 Grant - others:GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords : Mixed models * Logistic regression * Variational methods * Lower bound approximation Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.353, year: 2008

  6. Comparison of Adaline and Multiple Linear Regression Methods for Rainfall Forecasting

    Science.gov (United States)

    Sutawinaya, IP; Astawa, INGA; Hariyanti, NKD

    2018-01-01

    Heavy rainfall can cause disaster, therefore need a forecast to predict rainfall intensity. Main factor that cause flooding is there is a high rainfall intensity and it makes the river become overcapacity. This will cause flooding around the area. Rainfall factor is a dynamic factor, so rainfall is very interesting to be studied. In order to support the rainfall forecasting, there are methods that can be used from Artificial Intelligence (AI) to statistic. In this research, we used Adaline for AI method and Regression for statistic method. The more accurate forecast result shows the method that used is good for forecasting the rainfall. Through those methods, we expected which is the best method for rainfall forecasting here.

  7. Statistical methods and regression analysis of stratospheric ozone and meteorological variables in Isfahan

    Science.gov (United States)

    Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.

    2008-04-01

    Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.

  8. Correcting for cryptic relatedness by a regression-based genomic control method

    Directory of Open Access Journals (Sweden)

    Yang Yaning

    2009-12-01

    Full Text Available Abstract Background Genomic control (GC method is a useful tool to correct for the cryptic relatedness in population-based association studies. It was originally proposed for correcting for the variance inflation of Cochran-Armitage's additive trend test by using information from unlinked null markers, and was later generalized to be applicable to other tests with the additional requirement that the null markers are matched with the candidate marker in allele frequencies. However, matching allele frequencies limits the number of available null markers and thus limits the applicability of the GC method. On the other hand, errors in genotype/allele frequencies may cause further bias and variance inflation and thereby aggravate the effect of GC correction. Results In this paper, we propose a regression-based GC method using null markers that are not necessarily matched in allele frequencies with the candidate marker. Variation of allele frequencies of the null markers is adjusted by a regression method. Conclusion The proposed method can be readily applied to the Cochran-Armitage's trend tests other than the additive trend test, the Pearson's chi-square test and other robust efficiency tests. Simulation results show that the proposed method is effective in controlling type I error in the presence of population substructure.

  9. A subagging regression method for estimating the qualitative and quantitative state of groundwater

    Science.gov (United States)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young

    2017-08-01

    A subsample aggregating (subagging) regression (SBR) method for the analysis of groundwater data pertaining to trend-estimation-associated uncertainty is proposed. The SBR method is validated against synthetic data competitively with other conventional robust and non-robust methods. From the results, it is verified that the estimation accuracies of the SBR method are consistent and superior to those of other methods, and the uncertainties are reasonably estimated; the others have no uncertainty analysis option. To validate further, actual groundwater data are employed and analyzed comparatively with Gaussian process regression (GPR). For all cases, the trend and the associated uncertainties are reasonably estimated by both SBR and GPR regardless of Gaussian or non-Gaussian skewed data. However, it is expected that GPR has a limitation in applications to severely corrupted data by outliers owing to its non-robustness. From the implementations, it is determined that the SBR method has the potential to be further developed as an effective tool of anomaly detection or outlier identification in groundwater state data such as the groundwater level and contaminant concentration.

  10. Impact of regression methods on improved effects of soil structure on soil water retention estimates

    Science.gov (United States)

    Nguyen, Phuong Minh; De Pue, Jan; Le, Khoa Van; Cornelis, Wim

    2015-06-01

    Increasing the accuracy of pedotransfer functions (PTFs), an indirect method for predicting non-readily available soil features such as soil water retention characteristics (SWRC), is of crucial importance for large scale agro-hydrological modeling. Adding significant predictors (i.e., soil structure), and implementing more flexible regression algorithms are among the main strategies of PTFs improvement. The aim of this study was to investigate whether the improved effect of categorical soil structure information on estimating soil-water content at various matric potentials, which has been reported in literature, could be enduringly captured by regression techniques other than the usually applied linear regression. Two data mining techniques, i.e., Support Vector Machines (SVM), and k-Nearest Neighbors (kNN), which have been recently introduced as promising tools for PTF development, were utilized to test if the incorporation of soil structure will improve PTF's accuracy under a context of rather limited training data. The results show that incorporating descriptive soil structure information, i.e., massive, structured and structureless, as grouping criterion can improve the accuracy of PTFs derived by SVM approach in the range of matric potential of -6 to -33 kPa (average RMSE decreased up to 0.005 m3 m-3 after grouping, depending on matric potentials). The improvement was primarily attributed to the outperformance of SVM-PTFs calibrated on structureless soils. No improvement was obtained with kNN technique, at least not in our study in which the data set became limited in size after grouping. Since there is an impact of regression techniques on the improved effect of incorporating qualitative soil structure information, selecting a proper technique will help to maximize the combined influence of flexible regression algorithms and soil structure information on PTF accuracy.

  11. A method for fitting regression splines with varying polynomial order in the linear mixed model.

    Science.gov (United States)

    Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W

    2006-02-15

    The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.

  12. Hybrid ARIMAX quantile regression method for forecasting short term electricity consumption in east java

    Science.gov (United States)

    Prastuti, M.; Suhartono; Salehah, NA

    2018-04-01

    The need for energy supply, especially for electricity in Indonesia has been increasing in the last past years. Furthermore, the high electricity usage by people at different times leads to the occurrence of heteroscedasticity issue. Estimate the electricity supply that could fulfilled the community’s need is very important, but the heteroscedasticity issue often made electricity forecasting hard to be done. An accurate forecast of electricity consumptions is one of the key challenges for energy provider to make better resources and service planning and also take control actions in order to balance the electricity supply and demand for community. In this paper, hybrid ARIMAX Quantile Regression (ARIMAX-QR) approach was proposed to predict the short-term electricity consumption in East Java. This method will also be compared to time series regression using RMSE, MAPE, and MdAPE criteria. The data used in this research was the electricity consumption per half-an-hour data during the period of September 2015 to April 2016. The results show that the proposed approach can be a competitive alternative to forecast short-term electricity in East Java. ARIMAX-QR using lag values and dummy variables as predictors yield more accurate prediction in both in-sample and out-sample data. Moreover, both time series regression and ARIMAX-QR methods with addition of lag values as predictor could capture accurately the patterns in the data. Hence, it produces better predictions compared to the models that not use additional lag variables.

  13. A robust and efficient stepwise regression method for building sparse polynomial chaos expansions

    Energy Technology Data Exchange (ETDEWEB)

    Abraham, Simon, E-mail: Simon.Abraham@ulb.ac.be [Vrije Universiteit Brussel (VUB), Department of Mechanical Engineering, Research Group Fluid Mechanics and Thermodynamics, Pleinlaan 2, 1050 Brussels (Belgium); Raisee, Mehrdad [School of Mechanical Engineering, College of Engineering, University of Tehran, P.O. Box: 11155-4563, Tehran (Iran, Islamic Republic of); Ghorbaniasl, Ghader; Contino, Francesco; Lacor, Chris [Vrije Universiteit Brussel (VUB), Department of Mechanical Engineering, Research Group Fluid Mechanics and Thermodynamics, Pleinlaan 2, 1050 Brussels (Belgium)

    2017-03-01

    Polynomial Chaos (PC) expansions are widely used in various engineering fields for quantifying uncertainties arising from uncertain parameters. The computational cost of classical PC solution schemes is unaffordable as the number of deterministic simulations to be calculated grows dramatically with the number of stochastic dimension. This considerably restricts the practical use of PC at the industrial level. A common approach to address such problems is to make use of sparse PC expansions. This paper presents a non-intrusive regression-based method for building sparse PC expansions. The most important PC contributions are detected sequentially through an automatic search procedure. The variable selection criterion is based on efficient tools relevant to probabilistic method. Two benchmark analytical functions are used to validate the proposed algorithm. The computational efficiency of the method is then illustrated by a more realistic CFD application, consisting of the non-deterministic flow around a transonic airfoil subject to geometrical uncertainties. To assess the performance of the developed methodology, a detailed comparison is made with the well established LAR-based selection technique. The results show that the developed sparse regression technique is able to identify the most significant PC contributions describing the problem. Moreover, the most important stochastic features are captured at a reduced computational cost compared to the LAR method. The results also demonstrate the superior robustness of the method by repeating the analyses using random experimental designs.

  14. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

    Science.gov (United States)

    Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

    2013-08-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.

  15. Parafac and PLS Applied to Determination of Captopril in Pharmaceutical Preparation and Biological Fluids by Ultraviolet Spectrophotometry

    International Nuclear Information System (INIS)

    Niazi, A.; Ghasemi, N.

    2007-01-01

    A new ultraviolet spectrophotometric method has been developed for the direct qualitative determination of captopril in pharmaceutical preparation and biological fluids such as human plasma and urine samples. The method was accomplished based on parallel factor analysis (PARAFAC) and partial least squares (PLS). The study was carried out in the pH range from 2.0 to 12.8 and with a concentration from 0.70 to 61.50 μg ml -1 of captopril. Multivariate calibration models PLS at various pH and PARAFAC were elaborated from ultraviolet spectra deconvolution and captopril determination. The best models for this system were obtained with PARAFAC and PLS at pH = 2.04 (PLS-PH2). The applications of the method for the determination of real samples were evaluated by analysis of captopril in pharmaceutical preparations and biological (human plasma and urine) fluids with satisfactory results. The accuracy of the method, evaluated through the root mean square error of prediction (RMSEP), was 0.58 for captopril with PARAFAC and 0.67 for captopril with PLS-PH2 model. Acidity constant of captopril at 25 0 C and ionic strength of 0.1 M have also been determined spectrophotometrically. The obtained pK a values of captopril are 3.90 ± 0.05 and 10.03 ± 0.08 for pK a1 and pK a2 , respectively

  16. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

    Science.gov (United States)

    Kim, Yoonsang; Emery, Sherry

    2013-01-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415

  17. Predicting Taxi-Out Time at Congested Airports with Optimization-Based Support Vector Regression Methods

    Directory of Open Access Journals (Sweden)

    Guan Lian

    2018-01-01

    Full Text Available Accurate prediction of taxi-out time is significant precondition for improving the operationality of the departure process at an airport, as well as reducing the long taxi-out time, congestion, and excessive emission of greenhouse gases. Unfortunately, several of the traditional methods of predicting taxi-out time perform unsatisfactorily at congested airports. This paper describes and tests three of those conventional methods which include Generalized Linear Model, Softmax Regression Model, and Artificial Neural Network method and two improved Support Vector Regression (SVR approaches based on swarm intelligence algorithm optimization, which include Particle Swarm Optimization (PSO and Firefly Algorithm. In order to improve the global searching ability of Firefly Algorithm, adaptive step factor and Lévy flight are implemented simultaneously when updating the location function. Six factors are analysed, of which delay is identified as one significant factor in congested airports. Through a series of specific dynamic analyses, a case study of Beijing International Airport (PEK is tested with historical data. The performance measures show that the proposed two SVR approaches, especially the Improved Firefly Algorithm (IFA optimization-based SVR method, not only perform as the best modelling measures and accuracy rate compared with the representative forecast models, but also can achieve a better predictive performance when dealing with abnormal taxi-out time states.

  18. Method validation using weighted linear regression models for quantification of UV filters in water samples.

    Science.gov (United States)

    da Silva, Claudia Pereira; Emídio, Elissandro Soares; de Marchi, Mary Rosa Rodrigues

    2015-01-01

    This paper describes the validation of a method consisting of solid-phase extraction followed by gas chromatography-tandem mass spectrometry for the analysis of the ultraviolet (UV) filters benzophenone-3, ethylhexyl salicylate, ethylhexyl methoxycinnamate and octocrylene. The method validation criteria included evaluation of selectivity, analytical curve, trueness, precision, limits of detection and limits of quantification. The non-weighted linear regression model has traditionally been used for calibration, but it is not necessarily the optimal model in all cases. Because the assumption of homoscedasticity was not met for the analytical data in this work, a weighted least squares linear regression was used for the calibration method. The evaluated analytical parameters were satisfactory for the analytes and showed recoveries at four fortification levels between 62% and 107%, with relative standard deviations less than 14%. The detection limits ranged from 7.6 to 24.1 ng L(-1). The proposed method was used to determine the amount of UV filters in water samples from water treatment plants in Araraquara and Jau in São Paulo, Brazil. Copyright © 2014 Elsevier B.V. All rights reserved.

  19. A novel relational regularization feature selection method for joint regression and classification in AD diagnosis.

    Science.gov (United States)

    Zhu, Xiaofeng; Suk, Heung-Il; Wang, Li; Lee, Seong-Whan; Shen, Dinggang

    2017-05-01

    In this paper, we focus on joint regression and classification for Alzheimer's disease diagnosis and propose a new feature selection method by embedding the relational information inherent in the observations into a sparse multi-task learning framework. Specifically, the relational information includes three kinds of relationships (such as feature-feature relation, response-response relation, and sample-sample relation), for preserving three kinds of the similarity, such as for the features, the response variables, and the samples, respectively. To conduct feature selection, we first formulate the objective function by imposing these three relational characteristics along with an ℓ 2,1 -norm regularization term, and further propose a computationally efficient algorithm to optimize the proposed objective function. With the dimension-reduced data, we train two support vector regression models to predict the clinical scores of ADAS-Cog and MMSE, respectively, and also a support vector classification model to determine the clinical label. We conducted extensive experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset to validate the effectiveness of the proposed method. Our experimental results showed the efficacy of the proposed method in enhancing the performances of both clinical scores prediction and disease status identification, compared to the state-of-the-art methods. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Estimating Penetration Resistance in Agricultural Soils of Ardabil Plain Using Artificial Neural Network and Regression Methods

    Directory of Open Access Journals (Sweden)

    Gholam Reza Sheykhzadeh

    2017-02-01

    Full Text Available Introduction: Penetration resistance is one of the criteria for evaluating soil compaction. It correlates with several soil properties such as vehicle trafficability, resistance to root penetration, seedling emergence, and soil compaction by farm machinery. Direct measurement of penetration resistance is time consuming and difficult because of high temporal and spatial variability. Therefore, many different regressions and artificial neural network pedotransfer functions have been proposed to estimate penetration resistance from readily available soil variables such as particle size distribution, bulk density (Db and gravimetric water content (θm. The lands of Ardabil Province are one of the main production regions of potato in Iran, thus, obtaining the soil penetration resistance in these regions help with the management of potato production. The objective of this research was to derive pedotransfer functions by using regression and artificial neural network to predict penetration resistance from some soil variations in the agricultural soils of Ardabil plain and to compare the performance of artificial neural network with regression models. Materials and methods: Disturbed and undisturbed soil samples (n= 105 were systematically taken from 0-10 cm soil depth with nearly 3000 m distance in the agricultural lands of the Ardabil plain ((lat 38°15' to 38°40' N, long 48°16' to 48°61' E. The contents of sand, silt and clay (hydrometer method, CaCO3 (titration method, bulk density (cylinder method, particle density (Dp (pychnometer method, organic carbon (wet oxidation method, total porosity(calculating from Db and Dp, saturated (θs and field soil water (θf using the gravimetric method were measured in the laboratory. Mean geometric diameter (dg and standard deviation (σg of soil particles were computed using the percentages of sand, silt and clay. Penetration resistance was measured in situ using cone penetrometer (analog model at 10

  1. Landslide susceptibility mapping on a global scale using the method of logistic regression

    Directory of Open Access Journals (Sweden)

    L. Lin

    2017-08-01

    Full Text Available This paper proposes a statistical model for mapping global landslide susceptibility based on logistic regression. After investigating explanatory factors for landslides in the existing literature, five factors were selected for model landslide susceptibility: relative relief, extreme precipitation, lithology, ground motion and soil moisture. When building the model, 70 % of landslide and nonlandslide points were randomly selected for logistic regression, and the others were used for model validation. To evaluate the accuracy of predictive models, this paper adopts several criteria including a receiver operating characteristic (ROC curve method. Logistic regression experiments found all five factors to be significant in explaining landslide occurrence on a global scale. During the modeling process, percentage correct in confusion matrix of landslide classification was approximately 80 % and the area under the curve (AUC was nearly 0.87. During the validation process, the above statistics were about 81 % and 0.88, respectively. Such a result indicates that the model has strong robustness and stable performance. This model found that at a global scale, soil moisture can be dominant in the occurrence of landslides and topographic factor may be secondary.

  2. Development of K-Nearest Neighbour Regression Method in Forecasting River Stream Flow

    Directory of Open Access Journals (Sweden)

    Mohammad Azmi

    2012-07-01

    Full Text Available Different statistical, non-statistical and black-box methods have been used in forecasting processes. Among statistical methods, K-nearest neighbour non-parametric regression method (K-NN due to its natural simplicity and mathematical base is one of the recommended methods for forecasting processes. In this study, K-NN method is explained completely. Besides, development and improvement approaches such as best neighbour estimation, data transformation functions, distance functions and proposed extrapolation method are described. K-NN method in company with its development approaches is used in streamflow forecasting of Zayandeh-Rud Dam upper basin. Comparing between final results of classic K-NN method and modified K-NN (number of neighbour 5, transformation function of Range Scaling, distance function of Mahanalobis and proposed extrapolation method shows that modified K-NN in criteria of goodness of fit, root mean square error, percentage of volume of error and correlation has had performance improvement 45% , 59% and 17% respectively. These results approve necessity of applying mentioned approaches to derive more accurate forecasts.

  3. Comparing the index-flood and multiple-regression methods using L-moments

    Science.gov (United States)

    Malekinezhad, H.; Nachtnebel, H. P.; Klik, A.

    In arid and semi-arid regions, the length of records is usually too short to ensure reliable quantile estimates. Comparing index-flood and multiple-regression analyses based on L-moments was the main objective of this study. Factor analysis was applied to determine main influencing variables on flood magnitude. Ward’s cluster and L-moments approaches were applied to several sites in the Namak-Lake basin in central Iran to delineate homogeneous regions based on site characteristics. Homogeneity test was done using L-moments-based measures. Several distributions were fitted to the regional flood data and index-flood and multiple-regression methods as two regional flood frequency methods were compared. The results of factor analysis showed that length of main waterway, compactness coefficient, mean annual precipitation, and mean annual temperature were the main variables affecting flood magnitude. The study area was divided into three regions based on the Ward’s method of clustering approach. The homogeneity test based on L-moments showed that all three regions were acceptably homogeneous. Five distributions were fitted to the annual peak flood data of three homogeneous regions. Using the L-moment ratios and the Z-statistic criteria, GEV distribution was identified as the most robust distribution among five candidate distributions for all the proposed sub-regions of the study area, and in general, it was concluded that the generalised extreme value distribution was the best-fit distribution for every three regions. The relative root mean square error (RRMSE) measure was applied for evaluating the performance of the index-flood and multiple-regression methods in comparison with the curve fitting (plotting position) method. In general, index-flood method gives more reliable estimations for various flood magnitudes of different recurrence intervals. Therefore, this method should be adopted as regional flood frequency method for the study area and the Namak-Lake basin

  4. Short term load forecasting technique based on the seasonal exponential adjustment method and the regression model

    International Nuclear Information System (INIS)

    Wu, Jie; Wang, Jianzhou; Lu, Haiyan; Dong, Yao; Lu, Xiaoxiao

    2013-01-01

    Highlights: ► The seasonal and trend items of the data series are forecasted separately. ► Seasonal item in the data series is verified by the Kendall τ correlation testing. ► Different regression models are applied to the trend item forecasting. ► We examine the superiority of the combined models by the quartile value comparison. ► Paired-sample T test is utilized to confirm the superiority of the combined models. - Abstract: For an energy-limited economy system, it is crucial to forecast load demand accurately. This paper devotes to 1-week-ahead daily load forecasting approach in which load demand series are predicted by employing the information of days before being similar to that of the forecast day. As well as in many nonlinear systems, seasonal item and trend item are coexisting in load demand datasets. In this paper, the existing of the seasonal item in the load demand data series is firstly verified according to the Kendall τ correlation testing method. Then in the belief of the separate forecasting to the seasonal item and the trend item would improve the forecasting accuracy, hybrid models by combining seasonal exponential adjustment method (SEAM) with the regression methods are proposed in this paper, where SEAM and the regression models are employed to seasonal and trend items forecasting respectively. Comparisons of the quartile values as well as the mean absolute percentage error values demonstrate this forecasting technique can significantly improve the accuracy though models applied to the trend item forecasting are eleven different ones. This superior performance of this separate forecasting technique is further confirmed by the paired-sample T tests

  5. Evaluating the Performance of Polynomial Regression Method with Different Parameters during Color Characterization

    Directory of Open Access Journals (Sweden)

    Bangyong Sun

    2014-01-01

    Full Text Available The polynomial regression method is employed to calculate the relationship of device color space and CIE color space for color characterization, and the performance of different expressions with specific parameters is evaluated. Firstly, the polynomial equation for color conversion is established and the computation of polynomial coefficients is analysed. And then different forms of polynomial equations are used to calculate the RGB and CMYK’s CIE color values, while the corresponding color errors are compared. At last, an optimal polynomial expression is obtained by analysing several related parameters during color conversion, including polynomial numbers, the degree of polynomial terms, the selection of CIE visual spaces, and the linearization.

  6. Face Hallucination with Linear Regression Model in Semi-Orthogonal Multilinear PCA Method

    Science.gov (United States)

    Asavaskulkiet, Krissada

    2018-04-01

    In this paper, we propose a new face hallucination technique, face images reconstruction in HSV color space with a semi-orthogonal multilinear principal component analysis method. This novel hallucination technique can perform directly from tensors via tensor-to-vector projection by imposing the orthogonality constraint in only one mode. In our experiments, we use facial images from FERET database to test our hallucination approach which is demonstrated by extensive experiments with high-quality hallucinated color faces. The experimental results assure clearly demonstrated that we can generate photorealistic color face images by using the SO-MPCA subspace with a linear regression model.

  7. Real-time prediction of respiratory motion based on local regression methods

    International Nuclear Information System (INIS)

    Ruan, D; Fessler, J A; Balter, J M

    2007-01-01

    Recent developments in modulation techniques enable conformal delivery of radiation doses to small, localized target volumes. One of the challenges in using these techniques is real-time tracking and predicting target motion, which is necessary to accommodate system latencies. For image-guided-radiotherapy systems, it is also desirable to minimize sampling rates to reduce imaging dose. This study focuses on predicting respiratory motion, which can significantly affect lung tumours. Predicting respiratory motion in real-time is challenging, due to the complexity of breathing patterns and the many sources of variability. We propose a prediction method based on local regression. There are three major ingredients of this approach: (1) forming an augmented state space to capture system dynamics, (2) local regression in the augmented space to train the predictor from previous observation data using semi-periodicity of respiratory motion, (3) local weighting adjustment to incorporate fading temporal correlations. To evaluate prediction accuracy, we computed the root mean square error between predicted tumor motion and its observed location for ten patients. For comparison, we also investigated commonly used predictive methods, namely linear prediction, neural networks and Kalman filtering to the same data. The proposed method reduced the prediction error for all imaging rates and latency lengths, particularly for long prediction lengths

  8. Local regression type methods applied to the study of geophysics and high frequency financial data

    Science.gov (United States)

    Mariani, M. C.; Basu, K.

    2014-09-01

    In this work we applied locally weighted scatterplot smoothing techniques (Lowess/Loess) to Geophysical and high frequency financial data. We first analyze and apply this technique to the California earthquake geological data. A spatial analysis was performed to show that the estimation of the earthquake magnitude at a fixed location is very accurate up to the relative error of 0.01%. We also applied the same method to a high frequency data set arising in the financial sector and obtained similar satisfactory results. The application of this approach to the two different data sets demonstrates that the overall method is accurate and efficient, and the Lowess approach is much more desirable than the Loess method. The previous works studied the time series analysis; in this paper our local regression models perform a spatial analysis for the geophysics data providing different information. For the high frequency data, our models estimate the curve of best fit where data are dependent on time.

  9. Geographically weighted regression based methods for merging satellite and gauge precipitation

    Science.gov (United States)

    Chao, Lijun; Zhang, Ke; Li, Zhijia; Zhu, Yuelong; Wang, Jingfeng; Yu, Zhongbo

    2018-03-01

    Real-time precipitation data with high spatiotemporal resolutions are crucial for accurate hydrological forecasting. To improve the spatial resolution and quality of satellite precipitation, a three-step satellite and gauge precipitation merging method was formulated in this study: (1) bilinear interpolation is first applied to downscale coarser satellite precipitation to a finer resolution (PS); (2) the (mixed) geographically weighted regression methods coupled with a weighting function are then used to estimate biases of PS as functions of gauge observations (PO) and PS; and (3) biases of PS are finally corrected to produce a merged precipitation product. Based on the above framework, eight algorithms, a combination of two geographically weighted regression methods and four weighting functions, are developed to merge CMORPH (CPC MORPHing technique) precipitation with station observations on a daily scale in the Ziwuhe Basin of China. The geographical variables (elevation, slope, aspect, surface roughness, and distance to the coastline) and a meteorological variable (wind speed) were used for merging precipitation to avoid the artificial spatial autocorrelation resulting from traditional interpolation methods. The results show that the combination of the MGWR and BI-square function (MGWR-BI) has the best performance (R = 0.863 and RMSE = 7.273 mm/day) among the eight algorithms. The MGWR-BI algorithm was then applied to produce hourly merged precipitation product. Compared to the original CMORPH product (R = 0.208 and RMSE = 1.208 mm/hr), the quality of the merged data is significantly higher (R = 0.724 and RMSE = 0.706 mm/hr). The developed merging method not only improves the spatial resolution and quality of the satellite product but also is easy to implement, which is valuable for hydrological modeling and other applications.

  10. Prediction Methods in Science and Technology

    DEFF Research Database (Denmark)

    Høskuldsson, Agnar

    Presents the H-principle, the Heisenberg modelling principle. General properties of the Heisenberg modelling procedure is developed. The theory is applied to principal component analysis and linear regression analysis. It is shown that the H-principle leads to PLS regression in case the task...... is linear regression analysis. The book contains different methods to find the dimensions of linear models, to carry out sensitivity analysis in latent structure models, variable selection methods and presentation of results from analysis....

  11. Cellulose I crystallinity determination using FT-Raman spectroscopy : univariate and multivariate methods

    Science.gov (United States)

    Umesh P. Agarwal; Richard S. Reiner; Sally A. Ralph

    2010-01-01

    Two new methods based on FT–Raman spectroscopy, one simple, based on band intensity ratio, and the other using a partial least squares (PLS) regression model, are proposed to determine cellulose I crystallinity. In the simple method, crystallinity in cellulose I samples was determined based on univariate regression that was first developed using the Raman band...

  12. QSAR Study of Insecticides of Phthalamide Derivatives Using Multiple Linear Regression and Artificial Neural Network Methods

    Directory of Open Access Journals (Sweden)

    Adi Syahputra

    2014-03-01

    Full Text Available Quantitative structure activity relationship (QSAR for 21 insecticides of phthalamides containing hydrazone (PCH was studied using multiple linear regression (MLR, principle component regression (PCR and artificial neural network (ANN. Five descriptors were included in the model for MLR and ANN analysis, and five latent variables obtained from principle component analysis (PCA were used in PCR analysis. Calculation of descriptors was performed using semi-empirical PM6 method. ANN analysis was found to be superior statistical technique compared to the other methods and gave a good correlation between descriptors and activity (r2 = 0.84. Based on the obtained model, we have successfully designed some new insecticides with higher predicted activity than those of previously synthesized compounds, e.g.2-(decalinecarbamoyl-5-chloro-N’-((5-methylthiophen-2-ylmethylene benzohydrazide, 2-(decalinecarbamoyl-5-chloro-N’-((thiophen-2-yl-methylene benzohydrazide and 2-(decaline carbamoyl-N’-(4-fluorobenzylidene-5-chlorobenzohydrazide with predicted log LC50 of 1.640, 1.672, and 1.769 respectively.

  13. Nonparametric Methods in Astronomy: Think, Regress, Observe—Pick Any Three

    Science.gov (United States)

    Steinhardt, Charles L.; Jermyn, Adam S.

    2018-02-01

    Telescopes are much more expensive than astronomers, so it is essential to minimize required sample sizes by using the most data-efficient statistical methods possible. However, the most commonly used model-independent techniques for finding the relationship between two variables in astronomy are flawed. In the worst case they can lead without warning to subtly yet catastrophically wrong results, and even in the best case they require more data than necessary. Unfortunately, there is no single best technique for nonparametric regression. Instead, we provide a guide for how astronomers can choose the best method for their specific problem and provide a python library with both wrappers for the most useful existing algorithms and implementations of two new algorithms developed here.

  14. An Application of Robust Method in Multiple Linear Regression Model toward Credit Card Debt

    Science.gov (United States)

    Amira Azmi, Nur; Saifullah Rusiman, Mohd; Khalid, Kamil; Roslan, Rozaini; Sufahani, Suliadi; Mohamad, Mahathir; Salleh, Rohayu Mohd; Hamzah, Nur Shamsidah Amir

    2018-04-01

    Credit card is a convenient alternative replaced cash or cheque, and it is essential component for electronic and internet commerce. In this study, the researchers attempt to determine the relationship and significance variables between credit card debt and demographic variables such as age, household income, education level, years with current employer, years at current address, debt to income ratio and other debt. The provided data covers 850 customers information. There are three methods that applied to the credit card debt data which are multiple linear regression (MLR) models, MLR models with least quartile difference (LQD) method and MLR models with mean absolute deviation method. After comparing among three methods, it is found that MLR model with LQD method became the best model with the lowest value of mean square error (MSE). According to the final model, it shows that the years with current employer, years at current address, household income in thousands and debt to income ratio are positively associated with the amount of credit debt. Meanwhile variables for age, level of education and other debt are negatively associated with amount of credit debt. This study may serve as a reference for the bank company by using robust methods, so that they could better understand their options and choice that is best aligned with their goals for inference regarding to the credit card debt.

  15. Estimating HIES Data through Ratio and Regression Methods for Different Sampling Designs

    Directory of Open Access Journals (Sweden)

    Faqir Muhammad

    2007-01-01

    Full Text Available In this study, comparison has been made for different sampling designs, using the HIES data of North West Frontier Province (NWFP for 2001-02 and 1998-99 collected from the Federal Bureau of Statistics, Statistical Division, Government of Pakistan, Islamabad. The performance of the estimators has also been considered using bootstrap and Jacknife. A two-stage stratified random sample design is adopted by HIES. In the first stage, enumeration blocks and villages are treated as the first stage Primary Sampling Units (PSU. The sample PSU’s are selected with probability proportional to size. Secondary Sampling Units (SSU i.e., households are selected by systematic sampling with a random start. They have used a single study variable. We have compared the HIES technique with some other designs, which are: Stratified Simple Random Sampling. Stratified Systematic Sampling. Stratified Ranked Set Sampling. Stratified Two Phase Sampling. Ratio and Regression methods were applied with two study variables, which are: Income (y and Household sizes (x. Jacknife and Bootstrap are used for variance replication. Simple Random Sampling with sample size (462 to 561 gave moderate variances both by Jacknife and Bootstrap. By applying Systematic Sampling, we received moderate variance with sample size (467. In Jacknife with Systematic Sampling, we obtained variance of regression estimator greater than that of ratio estimator for a sample size (467 to 631. At a sample size (952 variance of ratio estimator gets greater than that of regression estimator. The most efficient design comes out to be Ranked set sampling compared with other designs. The Ranked set sampling with jackknife and bootstrap, gives minimum variance even with the smallest sample size (467. Two Phase sampling gave poor performance. Multi-stage sampling applied by HIES gave large variances especially if used with a single study variable.

  16. Robust Methods for Moderation Analysis with a Two-Level Regression Model.

    Science.gov (United States)

    Yang, Miao; Yuan, Ke-Hai

    2016-01-01

    Moderation analysis has many applications in social sciences. Most widely used estimation methods for moderation analysis assume that errors are normally distributed and homoscedastic. When these assumptions are not met, the results from a classical moderation analysis can be misleading. For more reliable moderation analysis, this article proposes two robust methods with a two-level regression model when the predictors do not contain measurement error. One method is based on maximum likelihood with Student's t distribution and the other is based on M-estimators with Huber-type weights. An algorithm for obtaining the robust estimators is developed. Consistent estimates of standard errors of the robust estimators are provided. The robust approaches are compared against normal-distribution-based maximum likelihood (NML) with respect to power and accuracy of parameter estimates through a simulation study. Results show that the robust approaches outperform NML under various distributional conditions. Application of the robust methods is illustrated through a real data example. An R program is developed and documented to facilitate the application of the robust methods.

  17. Applications of Monte Carlo method to nonlinear regression of rheological data

    Science.gov (United States)

    Kim, Sangmo; Lee, Junghaeng; Kim, Sihyun; Cho, Kwang Soo

    2018-02-01

    In rheological study, it is often to determine the parameters of rheological models from experimental data. Since both rheological data and values of the parameters vary in logarithmic scale and the number of the parameters is quite large, conventional method of nonlinear regression such as Levenberg-Marquardt (LM) method is usually ineffective. The gradient-based method such as LM is apt to be caught in local minima which give unphysical values of the parameters whenever the initial guess of the parameters is far from the global optimum. Although this problem could be solved by simulated annealing (SA), the Monte Carlo (MC) method needs adjustable parameter which could be determined in ad hoc manner. We suggest a simplified version of SA, a kind of MC methods which results in effective values of the parameters of most complicated rheological models such as the Carreau-Yasuda model of steady shear viscosity, discrete relaxation spectrum and zero-shear viscosity as a function of concentration and molecular weight.

  18. Logistic Regression and Path Analysis Method to Analyze Factors influencing Students’ Achievement

    Science.gov (United States)

    Noeryanti, N.; Suryowati, K.; Setyawan, Y.; Aulia, R. R.

    2018-04-01

    Students' academic achievement cannot be separated from the influence of two factors namely internal and external factors. The first factors of the student (internal factors) consist of intelligence (X1), health (X2), interest (X3), and motivation of students (X4). The external factors consist of family environment (X5), school environment (X6), and society environment (X7). The objects of this research are eighth grade students of the school year 2016/2017 at SMPN 1 Jiwan Madiun sampled by using simple random sampling. Primary data are obtained by distributing questionnaires. The method used in this study is binary logistic regression analysis that aims to identify internal and external factors that affect student’s achievement and how the trends of them. Path Analysis was used to determine the factors that influence directly, indirectly or totally on student’s achievement. Based on the results of binary logistic regression, variables that affect student’s achievement are interest and motivation. And based on the results obtained by path analysis, factors that have a direct impact on student’s achievement are students’ interest (59%) and students’ motivation (27%). While the factors that have indirect influences on students’ achievement, are family environment (97%) and school environment (37).

  19. Modelling Status Food Security Households Disease Sufferers Pulmonary Tuberculosis Uses the Method Regression Logistics Binary

    Science.gov (United States)

    Wulandari, S. P.; Salamah, M.; Rositawati, A. F. D.

    2018-04-01

    Food security is the condition where the food fulfilment is managed well for the country till the individual. Indonesia is one of the country which has the commitment to create the food security becomes main priority. However, the food necessity becomes common thing means that it doesn’t care about nutrient standard and the health condition of family member, so in the fulfilment of food necessity also has to consider the disease suffered by the family member, one of them is pulmonary tuberculosa. From that reasons, this research is conducted to know the factors which influence on household food security status which suffered from pulmonary tuberculosis in the coastal area of Surabaya by using binary logistic regression method. The analysis result by using binary logistic regression shows that the variables wife latest education, house density and spacious house ventilation significantly affect on household food security status which suffered from pulmonary tuberculosis in the coastal area of Surabaya, where the wife education level is University/equivalent, the house density is eligible or 8 m2/person and spacious house ventilation 10% of the floor area has the opportunity to become food secure households amounted to 0.911089. While the chance of becoming food insecure households amounted to 0.088911. The model household food security status which suffered from pulmonary tuberculosis in the coastal area of Surabaya has been conformable, and the overall percentages of those classifications are at 71.8%.

  20. TEMPERATURE PREDICTION IN 3013 CONTAINERS IN K AREA MATERIAL STORAGE (KAMS) FACILITY USING REGRESSION METHODS

    International Nuclear Information System (INIS)

    Gupta, N

    2008-01-01

    3013 containers are designed in accordance with the DOE-STD-3013-2004. These containers are qualified to store plutonium (Pu) bearing materials such as PuO2 for 50 years. DOT shipping packages such as the 9975 are used to store the 3013 containers in the K-Area Material Storage (KAMS) facility at Savannah River Site (SRS). DOE-STD-3013-2004 requires that a comprehensive surveillance program be set up to ensure that the 3013 container design parameters are not violated during the long term storage. To ensure structural integrity of the 3013 containers, thermal analyses using finite element models were performed to predict the contents and component temperatures for different but well defined parameters such as storage ambient temperature, PuO 2 density, fill heights, weights, and thermal loading. Interpolation is normally used to calculate temperatures if the actual parameter values are different from the analyzed values. A statistical analysis technique using regression methods is proposed to develop simple polynomial relations to predict temperatures for the actual parameter values found in the containers. The analysis shows that regression analysis is a powerful tool to develop simple relations to assess component temperatures

  1. Improved anomaly detection using multi-scale PLS and generalized likelihood ratio test

    KAUST Repository

    Madakyaru, Muddu

    2017-02-16

    Process monitoring has a central role in the process industry to enhance productivity, efficiency, and safety, and to avoid expensive maintenance. In this paper, a statistical approach that exploit the advantages of multiscale PLS models (MSPLS) and those of a generalized likelihood ratio (GLR) test to better detect anomalies is proposed. Specifically, to consider the multivariate and multi-scale nature of process dynamics, a MSPLS algorithm combining PLS and wavelet analysis is used as modeling framework. Then, GLR hypothesis testing is applied using the uncorrelated residuals obtained from MSPLS model to improve the anomaly detection abilities of these latent variable based fault detection methods even further. Applications to a simulated distillation column data are used to evaluate the proposed MSPLS-GLR algorithm.

  2. Improved anomaly detection using multi-scale PLS and generalized likelihood ratio test

    KAUST Repository

    Madakyaru, Muddu; Harrou, Fouzi; Sun, Ying

    2017-01-01

    Process monitoring has a central role in the process industry to enhance productivity, efficiency, and safety, and to avoid expensive maintenance. In this paper, a statistical approach that exploit the advantages of multiscale PLS models (MSPLS) and those of a generalized likelihood ratio (GLR) test to better detect anomalies is proposed. Specifically, to consider the multivariate and multi-scale nature of process dynamics, a MSPLS algorithm combining PLS and wavelet analysis is used as modeling framework. Then, GLR hypothesis testing is applied using the uncorrelated residuals obtained from MSPLS model to improve the anomaly detection abilities of these latent variable based fault detection methods even further. Applications to a simulated distillation column data are used to evaluate the proposed MSPLS-GLR algorithm.

  3. Multi-step polynomial regression method to model and forecast malaria incidence.

    Directory of Open Access Journals (Sweden)

    Chandrajit Chatterjee

    Full Text Available Malaria is one of the most severe problems faced by the world even today. Understanding the causative factors such as age, sex, social factors, environmental variability etc. as well as underlying transmission dynamics of the disease is important for epidemiological research on malaria and its eradication. Thus, development of suitable modeling approach and methodology, based on the available data on the incidence of the disease and other related factors is of utmost importance. In this study, we developed a simple non-linear regression methodology in modeling and forecasting malaria incidence in Chennai city, India, and predicted future disease incidence with high confidence level. We considered three types of data to develop the regression methodology: a longer time series data of Slide Positivity Rates (SPR of malaria; a smaller time series data (deaths due to Plasmodium vivax of one year; and spatial data (zonal distribution of P. vivax deaths for the city along with the climatic factors, population and previous incidence of the disease. We performed variable selection by simple correlation study, identification of the initial relationship between variables through non-linear curve fitting and used multi-step methods for induction of variables in the non-linear regression analysis along with applied Gauss-Markov models, and ANOVA for testing the prediction, validity and constructing the confidence intervals. The results execute the applicability of our method for different types of data, the autoregressive nature of forecasting, and show high prediction power for both SPR and P. vivax deaths, where the one-lag SPR values plays an influential role and proves useful for better prediction. Different climatic factors are identified as playing crucial role on shaping the disease curve. Further, disease incidence at zonal level and the effect of causative factors on different zonal clusters indicate the pattern of malaria prevalence in the city

  4. A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections

    Science.gov (United States)

    Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.

    2014-01-01

    A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separate from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind- off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.

  5. Reducing false-positive incidental findings with ensemble genotyping and logistic regression based variant filtering methods.

    Science.gov (United States)

    Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choe, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B; Gupta, Neha; Kohane, Isaac S; Green, Robert C; Kong, Sek Won

    2014-08-01

    As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and an ensemble genotyping would be essential to minimize false-positive DNM candidates. © 2014 WILEY PERIODICALS, INC.

  6. A dynamic particle filter-support vector regression method for reliability prediction

    International Nuclear Information System (INIS)

    Wei, Zhao; Tao, Tao; ZhuoShu, Ding; Zio, Enrico

    2013-01-01

    Support vector regression (SVR) has been applied to time series prediction and some works have demonstrated the feasibility of its use to forecast system reliability. For accuracy of reliability forecasting, the selection of SVR's parameters is important. The existing research works on SVR's parameters selection divide the example dataset into training and test subsets, and tune the parameters on the training data. However, these fixed parameters can lead to poor prediction capabilities if the data of the test subset differ significantly from those of training. Differently, the novel method proposed in this paper uses particle filtering to estimate the SVR model parameters according to the whole measurement sequence up to the last observation instance. By treating the SVR training model as the observation equation of a particle filter, our method allows updating the SVR model parameters dynamically when a new observation comes. Because of the adaptability of the parameters to dynamic data pattern, the new PF–SVR method has superior prediction performance over that of standard SVR. Four application results show that PF–SVR is more robust than SVR to the decrease of the number of training data and the change of initial SVR parameter values. Also, even if there are trends in the test data different from those in the training data, the method can capture the changes, correct the SVR parameters and obtain good predictions. -- Highlights: •A dynamic PF–SVR method is proposed to predict the system reliability. •The method can adjust the SVR parameters according to the change of data. •The method is robust to the size of training data and initial parameter values. •Some cases based on both artificial and real data are studied. •PF–SVR shows superior prediction performance over standard SVR

  7. Statistical learning method in regression analysis of simulated positron spectral data

    International Nuclear Information System (INIS)

    Avdic, S. Dz.

    2005-01-01

    Positron lifetime spectroscopy is a non-destructive tool for detection of radiation induced defects in nuclear reactor materials. This work concerns the applicability of the support vector machines method for the input data compression in the neural network analysis of positron lifetime spectra. It has been demonstrated that the SVM technique can be successfully applied to regression analysis of positron spectra. A substantial data compression of about 50 % and 8 % of the whole training set with two and three spectral components respectively has been achieved including a high accuracy of the spectra approximation. However, some parameters in the SVM approach such as the insensitivity zone e and the penalty parameter C have to be chosen carefully to obtain a good performance. (author)

  8. Measurement of process variables in solid-state fermentation of wheat straw using FT-NIR spectroscopy and synergy interval PLS algorithm

    Science.gov (United States)

    Jiang, Hui; Liu, Guohai; Mei, Congli; Yu, Shuang; Xiao, Xiahong; Ding, Yuhan

    2012-11-01

    The feasibility of rapid determination of the process variables (i.e. pH and moisture content) in solid-state fermentation (SSF) of wheat straw using Fourier transform near infrared (FT-NIR) spectroscopy was studied. Synergy interval partial least squares (siPLS) algorithm was implemented to calibrate regression model. The number of PLS factors and the number of subintervals were optimized simultaneously by cross-validation. The performance of the prediction model was evaluated according to the root mean square error of cross-validation (RMSECV), the root mean square error of prediction (RMSEP) and the correlation coefficient (R). The measurement results of the optimal model were obtained as follows: RMSECV = 0.0776, Rc = 0.9777, RMSEP = 0.0963, and Rp = 0.9686 for pH model; RMSECV = 1.3544% w/w, Rc = 0.8871, RMSEP = 1.4946% w/w, and Rp = 0.8684 for moisture content model. Finally, compared with classic PLS and iPLS models, the siPLS model revealed its superior performance. The overall results demonstrate that FT-NIR spectroscopy combined with siPLS algorithm can be used to measure process variables in solid-state fermentation of wheat straw, and NIR spectroscopy technique has a potential to be utilized in SSF industry.

  9. The crux of the method: assumptions in ordinary least squares and logistic regression.

    Science.gov (United States)

    Long, Rebecca G

    2008-10-01

    Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.

  10. Calibration sets selection strategy for the construction of robust PLS models for prediction of biodiesel/diesel blends physico-chemical properties using NIR spectroscopy

    Science.gov (United States)

    Palou, Anna; Miró, Aira; Blanco, Marcelo; Larraz, Rafael; Gómez, José Francisco; Martínez, Teresa; González, Josep Maria; Alcalà, Manel

    2017-06-01

    Even when the feasibility of using near infrared (NIR) spectroscopy combined with partial least squares (PLS) regression for prediction of physico-chemical properties of biodiesel/diesel blends has been widely demonstrated, inclusion in the calibration sets of the whole variability of diesel samples from diverse production origins still remains as an important challenge when constructing the models. This work presents a useful strategy for the systematic selection of calibration sets of samples of biodiesel/diesel blends from diverse origins, based on a binary code, principal components analysis (PCA) and the Kennard-Stones algorithm. Results show that using this methodology the models can keep their robustness over time. PLS calculations have been done using a specialized chemometric software as well as the software of the NIR instrument installed in plant, and both produced RMSEP under reproducibility values of the reference methods. The models have been proved for on-line simultaneous determination of seven properties: density, cetane index, fatty acid methyl esters (FAME) content, cloud point, boiling point at 95% of recovery, flash point and sulphur.

  11. Mixture quantification using PLS in plastic scintillation measurements

    Energy Technology Data Exchange (ETDEWEB)

    Bagan, H.; Tarancon, A.; Rauret, G. [Departament de Quimica Analitica, Universitat de Barcelona, Diagonal 647, E-08028 Barcelona (Spain); Garcia, J.F., E-mail: jfgarcia@ub.ed [Departament de Quimica Analitica, Universitat de Barcelona, Diagonal 647, E-08028 Barcelona (Spain)

    2011-06-15

    This article reports the capability of plastic scintillation (PS) combined with multivariate calibration (Partial least squares; PLS) to detect and quantify alpha and beta emitters in mixtures. While several attempts have been made with this purpose in mind using liquid scintillation (LS), no attempt was done using PS that has the great advantage of not producing mixed waste after the measurements are performed. Following this objective, ternary mixtures of alpha and beta emitters ({sup 241}Am, {sup 137}Cs and {sup 90}Sr/{sup 90}Y) have been quantified. Procedure optimisation has evaluated the use of the net spectra or the sample spectra, the inclusion of different spectra obtained at different values of the Pulse Shape Analysis parameter and the application of the PLS1 or PLS2 algorithms. The conclusions show that the use of PS+PLS2 applied to the sample spectra, without the use of any pulse shape discrimination, allows quantification of the activities with relative errors less than 10% in most of the cases. This procedure not only allows quantification of mixtures but also reduces measurement time (no blanks are required) and the application of this procedure does not require detectors that include the pulse shape analysis parameter.

  12. Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

    Science.gov (United States)

    Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V

    2012-01-01

    In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999

  13. Non-invasive diagnostic methods for atherosclerosis and use in assessing progression and regression in hypercholesterolemia

    International Nuclear Information System (INIS)

    Tsushima, Motoo; Fujii, Shigeki; Yutani, Chikao; Yamamoto, Akira; Naitoh, Hiroaki.

    1990-01-01

    We evaluated the wall thickening and stenosis rate (ASI), the calcification rate (ACI), and the wall thickening and calcification stenosis rate (SCI) of the lower abdominal aorta calculated by the 12 sector method from simple or enhanced computed tomography. The intra-observer variation of the calculation of ASI was 5.7% and that of ACI was 2.4%. In 9 patients who underwent an autopsy examination, ACI was significantly correlated with the rate of the calcification dimension to the whole objective area of the abdominal aorta (r=0.856, p<0.01). However, there were no correlations between ASI and the surface involvement or the atherosclerotic index obtained by the point-counting method of the autopsy materials. In the analysis of 40 patients with atherosclerotic vascular diseases, ASI and ACI were also highly correlated with the percentage volume of the arterial wall in relation to the whole volume of the observed artery (r=0.852, p<0.0001) and also the percentage calcification volume (r=0.913, p<0.0001) calculated by the computed method, respectively. The percentage of atherosclerotic vascular diseases increased in the group of both high ASI (over 10%) and high ACI (over 20%). We used SCI as a reliable index when the progression and regression of atherosclerosis was considered. Among patients of hypercholesterolemia consisting of 15 with familial hypercholesterolemia (FH) and 6 non-FH patients, the change of SCI (d-SCI) was significantly correlated with the change of total cholesterol concentration (d-TC) after the treatment (r=0.466, p<0.05) and the change of the right Achilles' tendon thickening (d-ATT) was also correlated with d-TC (r=0.634, p<0.005). However, no correlation between d-SCI and d-ATT was observed. In conclusion, CT indices of atherosclerosis were useful as a noninvasive quantitative diagnostic method and we were able to use them to assess the progression and regression of atherosclerosis. (author)

  14. Reflexion on linear regression trip production modelling method for ensuring good model quality

    Science.gov (United States)

    Suprayitno, Hitapriya; Ratnasari, Vita

    2017-11-01

    Transport Modelling is important. For certain cases, the conventional model still has to be used, in which having a good trip production model is capital. A good model can only be obtained from a good sample. Two of the basic principles of a good sampling is having a sample capable to represent the population characteristics and capable to produce an acceptable error at a certain confidence level. It seems that this principle is not yet quite understood and used in trip production modeling. Therefore, investigating the Trip Production Modelling practice in Indonesia and try to formulate a better modeling method for ensuring the Model Quality is necessary. This research result is presented as follows. Statistics knows a method to calculate span of prediction value at a certain confidence level for linear regression, which is called Confidence Interval of Predicted Value. The common modeling practice uses R2 as the principal quality measure, the sampling practice varies and not always conform to the sampling principles. An experiment indicates that small sample is already capable to give excellent R2 value and sample composition can significantly change the model. Hence, good R2 value, in fact, does not always mean good model quality. These lead to three basic ideas for ensuring good model quality, i.e. reformulating quality measure, calculation procedure, and sampling method. A quality measure is defined as having a good R2 value and a good Confidence Interval of Predicted Value. Calculation procedure must incorporate statistical calculation method and appropriate statistical tests needed. A good sampling method must incorporate random well distributed stratified sampling with a certain minimum number of samples. These three ideas need to be more developed and tested.

  15. ON THE EFFECTS OF THE PRESENCE AND METHODS OF THE ELIMINATION HETEROSCEDASTICITY AND AUTOCORRELATION IN THE REGRESSION MODEL

    Directory of Open Access Journals (Sweden)

    Nina L. Timofeeva

    2014-01-01

    Full Text Available The article presents the methodological and technical bases for the creation of regression models that adequately reflect reality. The focus is on methods of removing residual autocorrelation in models. Algorithms eliminating heteroscedasticity and autocorrelation of the regression model residuals: reweighted least squares method, the method of Cochran-Orkutta are given. A model of "pure" regression is build, as well as to compare the effect on the dependent variable of the different explanatory variables when the latter are expressed in different units, a standardized form of the regression equation. The scheme of abatement techniques of heteroskedasticity and autocorrelation for the creation of regression models specific to the social and cultural sphere is developed.

  16. Using a Regression Method for Estimating Performance in a Rapid Serial Visual Presentation Target-Detection Task

    Science.gov (United States)

    2017-12-01

    Fig. 2 Simulation method; the process for one iteration of the simulation . It was repeated 250 times per combination of HR and FAR. Analysis was...distribution is unlimited. 8 Fig. 2 Simulation method; the process for one iteration of the simulation . It was repeated 250 times per combination of HR...stimuli. Simulations show that this regression method results in an unbiased and accurate estimate of target detection performance. The regression

  17. Regression methods to investigate the relationship between facial measurements and widths of the maxillary anterior teeth.

    Science.gov (United States)

    Isa, Zakiah Mohd; Tawfiq, Omar Farouq; Noor, Norliza Mohd; Shamsudheen, Mohd Iqbal; Rijal, Omar Mohd

    2010-03-01

    In rehabilitating edentulous patients, selecting appropriately sized teeth in the absence of preextraction records is problematic. The purpose of this study was to investigate the relationships between some facial dimensions and widths of the maxillary anterior teeth to potentially provide a guide for tooth selection. Sixty full dentate Malaysian adults (18-36 years) representing 2 ethnic groups (Malay and Chinese), with well aligned maxillary anterior teeth and minimal attrition, participated in this study. Standardized digital images of the face, viewed frontally, were recorded. Using image analyzing software, the images were used to determine the interpupillary distance (IPD), inner canthal distance (ICD), and interalar width (IA). Widths of the 6 maxillary anterior teeth were measured directly from casts of the subjects using digital calipers. Regression analyses were conducted to measure the strength of the associations between the variables (alpha=.10). The means (standard deviations) of IPD, IA, and ICD of the subjects were 62.28 (2.47), 39.36 (3.12), and 34.36 (2.15) mm, respectively. The mesiodistal diameters of the maxillary central incisors, lateral incisors, and canines were 8.54 (0.50), 7.09 (0.48), and 7.94 (0.40) mm, respectively. The width of the central incisors was highly correlated to the IPD (r=0.99), while the widths of the lateral incisors and canines were highly correlated to a combination of IPD and IA (r=0.99 and 0.94, respectively). Using regression methods, the widths of the anterior teeth within the population tested may be predicted by a combination of the facial dimensions studied. (c) 2010 The Editorial Council of the Journal of Prosthetic Dentistry. Published by Mosby, Inc. All rights reserved.

  18. Exact estimation of biodiesel cetane number (CN) from its fatty acid methyl esters (FAMEs) profile using partial least square (PLS) adapted by artificial neural network (ANN)

    International Nuclear Information System (INIS)

    Hosseinpour, Soleiman; Aghbashlo, Mortaza; Tabatabaei, Meisam; Khalife, Esmail

    2016-01-01

    Highlights: • Estimating the biodiesel CN from its FAMEs profile using ANN-based PLS approach. • Comparing the capability of ANN-adapted PLS approach with the standard PLS model. • Exact prediction of biodiesel CN from it FAMEs profile using ANN-based PLS method. • Developing an easy-to-use software using ANN-PLS model for computing the biodiesel CN. - Abstract: Cetane number (CN) is among the most important properties of biodiesel because it quantifies combustion speed or in better words, ignition quality. Experimental measurement of biodiesel CN is rather laborious and expensive. However, the high proportionality of biodiesel fatty acid methyl esters (FAMEs) profile with its CN is very appealing to develop straightforward and inexpensive computerized tools for biodiesel CN estimation. Unfortunately, correlating the chemical structure of biodiesel to its CN using conventional statistical and mathematical approaches is very difficult. To solve this issue, partial least square (PLS) adapted by artificial neural network (ANN) was introduced and examined herein as an innovative approach for the exact estimation of biodiesel CN from its FAMEs profile. In the proposed approach, ANN paradigm was used for modeling the inner relation between the input and the output PLS score vectors. In addition, the capability of the developed method in predicting the biodiesel CN was compared with the basal PLS method. The accuracy of the developed approaches for computing the biodiesel CN was assessed using three statistical criteria, i.e., coefficient of determination (R"2), mean-squared error (MSE), and percentage error (PE). The ANN-adapted PLS method predicted the biodiesel CN with an R"2 value higher than 0.99 demonstrating the fidelity of the developed model over the classical PLS method with a markedly lower R"2 value of about 0.85. In order to facilitate the use of the proposed model, an easy-to-use computer program was also developed on the basis of ANN-adapted PLS

  19. Different frontal involvement in ALS and PLS revealed by Stroop event-related potentials and reaction times

    Directory of Open Access Journals (Sweden)

    Ninfa eAmato

    2013-12-01

    Full Text Available BACKGROUND: A growing body of evidence suggests a link between cognitive and pathological changes in amyotrophic lateral sclerosis (ALS and in frontotemporal lobar dementia (FTLD. Cognitive deficits have been investigated much less extensively in primary lateral sclerosis (PLS than in ALS. OBJECTIVE: to investigate bioelectrical activity to Stroop test, assessing frontal function, in ALS, PLS and control groups. METHODS: 32 non-demented ALS patients, 10 non-demented PLS patients and 27 healthy subjects were included. Twenty-nine electroencephalography (EEG channels with binaural reference were recorded during covert Stroop task performance, involving mental discrimination of the stimuli and not vocal or motor response. Group effects on event related potentials (ERPs latency were analyzed using statistical multivariate analysis. Topographic analysis was performed using low resolution brain electromagnetic tomography (LORETA. RESULTS: ALS patients committed more errors in the execution of the task but they were not slower, whereas PLS patients did not show reduced accuracy, despite a slowing of reaction times (RTs. The main ERP components were delayed in ALS, but not in PLS, compared with controls. Moreover, RTs speed but not ERP latency correlated with clinical scores. ALS had decreased frontotemporal activity in the P2, P3 and N4 time windows compared to controls. CONCLUSION: These findings suggest a different pattern of psychophysiological involvement in ALS compared with PLS. The former is increasingly recognized to be a multisystems disorder, with a spectrum of executive and behavioural impairments reflecting frontotemporal dysfunction. The latter seems to mainly involve the motor system, with largely spared cognitive functions. Moreover, our results suggest that the covert version of the Stroop task used in the present study, may be useful to assess cognitive state in the very advanced stage of the disease, when other cognitive tasks are not

  20. Boosted regression trees, multivariate adaptive regression splines and their two-step combinations with multiple linear regression or partial least squares to predict blood-brain barrier passage: a case study.

    Science.gov (United States)

    Deconinck, E; Zhang, M H; Petitet, F; Dubus, E; Ijjaali, I; Coomans, D; Vander Heyden, Y

    2008-02-18

    The use of some unconventional non-linear modeling techniques, i.e. classification and regression trees and multivariate adaptive regression splines-based methods, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structural and pharmacological diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It could be concluded that the use of combinations of a linear with a non-linear modeling technique results in some improved properties compared to the individual linear and non-linear models and that, when the use of such a combination is appropriate, combinations using MARS as non-linear technique should be preferred over those with BRT, due to some serious drawbacks of the BRT approaches.

  1. Whole-genome regression and prediction methods applied to plant and animal breeding

    NARCIS (Netherlands)

    Los Campos, De G.; Hickey, J.M.; Pong-Wong, R.; Daetwyler, H.D.; Calus, M.P.L.

    2013-01-01

    Genomic-enabled prediction is becoming increasingly important in animal and plant breeding, and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of

  2. Modelling infant mortality rate in Central Java, Indonesia use generalized poisson regression method

    Science.gov (United States)

    Prahutama, Alan; Sudarno

    2018-05-01

    The infant mortality rate is the number of deaths under one year of age occurring among the live births in a given geographical area during a given year, per 1,000 live births occurring among the population of the given geographical area during the same year. This problem needs to be addressed because it is an important element of a country’s economic development. High infant mortality rate will disrupt the stability of a country as it relates to the sustainability of the population in the country. One of regression model that can be used to analyze the relationship between dependent variable Y in the form of discrete data and independent variable X is Poisson regression model. Recently The regression modeling used for data with dependent variable is discrete, among others, poisson regression, negative binomial regression and generalized poisson regression. In this research, generalized poisson regression modeling gives better AIC value than poisson regression. The most significant variable is the Number of health facilities (X1), while the variable that gives the most influence to infant mortality rate is the average breastfeeding (X9).

  3. EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.

    Science.gov (United States)

    Lian, Yao; Ge, Meng; Pan, Xian-Ming

    2014-12-19

    B-cell epitopes have been studied extensively due to their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task. In this work, based on the antigen's primary sequence information, a novel linear B-cell epitope prediction model was developed using the multiple linear regression (MLR). A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the performance of our model. To alleviate the problem caused by the noise of negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728. We have presented a reliable method for the identification of linear B cell epitope using antigen's primary sequence information. Moreover, a web server EPMLR has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ .

  4. Beam property studies in the PLS diagnostic beamline

    CERN Document Server

    Ko, I S; Seon, D K; Kim, C B; Lee, T Y

    1999-01-01

    A diagnostic beamline has been operated in the Pohang Light Source (PLS) storage ring for the diagnostics of electron and photon beam properties. It consists of two 1:1 imaging systems: a visible-light imaging system and a soft X-ray imaging system. We have measured the transverse and the longitudinal structures of beams by using a streak camera to obtain a visible image. Accurate transverse beam size have been measured to be 186 mu m horizontally and 43.1 mu m vertically by using soft X-ray images with minimum diffraction errors. The corresponding emittances are 11.7 nm-rad horizontally and 0.59 nm-rad vertically. By comparing the measured data with the design values, we confirmed that the PLS storage ring has reached its designed performance within an error of 3.3 % in the transverse direction.

  5. Group-wise partial least square regression

    NARCIS (Netherlands)

    Camacho, José; Saccenti, Edoardo

    2018-01-01

    This paper introduces the group-wise partial least squares (GPLS) regression. GPLS is a new sparse PLS technique where the sparsity structure is defined in terms of groups of correlated variables, similarly to what is done in the related group-wise principal component analysis. These groups are

  6. PLS-Prediction and Confirmation of Hydrojuglone Glucoside as the Antitrypanosomal Constituent of Juglans Spp.

    Directory of Open Access Journals (Sweden)

    Therese Ellendorff

    2015-05-01

    Full Text Available Naphthoquinones (NQs occur naturally in a large variety of plants. Several NQs are highly active against protozoans, amongst them the causative pathogens of neglected tropical diseases such as human African trypanosomiasis (sleeping sickness, Chagas disease and leishmaniasis. Prominent NQ-producing plants can be found among Juglans spp. (Juglandaceae with juglone derivatives as known constituents. In this study, 36 highly variable extracts were prepared from different plant parts of J. regia, J. cinerea and J. nigra. For all extracts, antiprotozoal activity was determined against the protozoans Trypanosoma cruzi, T. brucei rhodesiense and Leishmania donovani. In addition, an LC-MS fingerprint was recorded for each extract. With each extract’s fingerprint and the data on in vitro growth inhibitory activity against T. brucei rhodesiense a Partial Least Squares (PLS regression model was calculated in order to obtain an indication of compounds responsible for the differences in bioactivity between the 36 extracts. By means of PLS, hydrojuglone glucoside was predicted as an active compound against T. brucei and consequently isolated and tested in vitro. In fact, the pure compound showed activity against T. brucei at a significantly lower cytotoxicity towards mammalian cells than established antiprotozoal NQs such as lapachol.

  7. Determination of boiling point of petrochemicals by gas chromatography-mass spectrometry and multivariate regression analysis of structural activity relationship.

    Science.gov (United States)

    Fakayode, Sayo O; Mitchell, Breanna S; Pollard, David A

    2014-08-01

    Accurate understanding of analyte boiling points (BP) is of critical importance in gas chromatographic (GC) separation and crude oil refinery operation in petrochemical industries. This study reported the first combined use of GC separation and partial-least-square (PLS1) multivariate regression analysis of petrochemical structural activity relationship (SAR) for accurate BP determination of two commercially available (D3710 and MA VHP) calibration gas mix samples. The results of the BP determination using PLS1 multivariate regression were further compared with the results of traditional simulated distillation method of BP determination. The developed PLS1 regression was able to correctly predict analytes BP in D3710 and MA VHP calibration gas mix samples, with a root-mean-square-%-relative-error (RMS%RE) of 6.4%, and 10.8% respectively. In contrast, the overall RMS%RE of 32.9% and 40.4%, respectively obtained for BP determination in D3710 and MA VHP using a traditional simulated distillation method were approximately four times larger than the corresponding RMS%RE of BP prediction using MRA, demonstrating the better predictive ability of MRA. The reported method is rapid, robust, and promising, and can be potentially used routinely for fast analysis, pattern recognition, and analyte BP determination in petrochemical industries. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. Functional regression method for whole genome eQTL epistasis analysis with sequencing data.

    Science.gov (United States)

    Xu, Kelin; Jin, Li; Xiong, Momiao

    2017-05-18

    Epistasis plays an essential rule in understanding the regulation mechanisms and is an essential component of the genetic architecture of the gene expressions. However, interaction analysis of gene expressions remains fundamentally unexplored due to great computational challenges and data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position level read count curves. A single number for measuring gene expression which is widely used for microarray measured gene expression analysis is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using the RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) with functional responses where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pair-wises SNPs, the FRGM takes a gene as a basic unit for epistasis analysis, which tests for the interaction of all possible pairs of genes and use all the information that can be accessed to collectively test interaction between all possible pairs of SNPs within two genome regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect the interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genome Project. The numbers of pairs of significantly interacting genes after Bonferroni correction

  9. Variable selection in near-infrared spectroscopy: Benchmarking of feature selection methods on biodiesel data

    International Nuclear Information System (INIS)

    Balabin, Roman M.; Smirnov, Sergey V.

    2011-01-01

    During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm -1 ) of a sample is typically measured by modern instruments at a few hundred of wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic

  10. A SOCIOLOGICAL ANALYSIS OF THE CHILDBEARING COEFFICIENT IN THE ALTAI REGION BASED ON METHOD OF FUZZY LINEAR REGRESSION

    Directory of Open Access Journals (Sweden)

    Sergei Vladimirovich Varaksin

    2017-06-01

    Full Text Available Purpose. Construction of a mathematical model of the dynamics of childbearing change in the Altai region in 2000–2016, analysis of the dynamics of changes in birth rates for multiple age categories of women of childbearing age. Methodology. A auxiliary analysis element is the construction of linear mathematical models of the dynamics of childbearing by using fuzzy linear regression method based on fuzzy numbers. Fuzzy linear regression is considered as an alternative to standard statistical linear regression for short time series and unknown distribution law. The parameters of fuzzy linear and standard statistical regressions for childbearing time series were defined with using the built in language MatLab algorithm. Method of fuzzy linear regression is not used in sociological researches yet. Results. There are made the conclusions about the socio-demographic changes in society, the high efficiency of the demographic policy of the leadership of the region and the country, and the applicability of the method of fuzzy linear regression for sociological analysis.

  11. [Determination of Cu in Shell of Preserved Egg by LIBS Coupled with PLS].

    Science.gov (United States)

    Hu, Hui-qin; Xu, Xue-hong; Liu, Mu-hua; Tu, Jian-ping; Huang, Le; Huang, Lin; Yao, Ming-yin; Chen, Tian-bing; Yang, Ping

    2015-12-01

    In this work, the content of copper in the shell of preserved eggs were determined directly by Laser induced breakdown spectroscopy (LIBS), and the characteristics lines of Cu was obtained. The samples of eggshell were pretreated by acid wet digestion, and the real content of Cu was obtained by atomic absorption spectrophotometer (AAS). Due to the test precision and accuracy of LIBS was influenced by a serious of factors, for example, the complex matrix effect of sample, the enviro nment noise, the system noise of the instrument, the stability of laser energy and so on. And the conventional unvariate linear calibration curve between LIBS intensity and content of element of sample, such as by use of Schiebe G-Lomakin equation, can not meet the requirement of quantitative analysis. In account of that, a kind of multivariate calibration method is needed. In this work, the data of LIBS spectra were processed by partial least squares (PLS), the precision and accuracy of PLS model were compared by different smoothing treatment and five pretreatment methods. The result showed that the correlation coefficient and the accuracy of the PLS model were improved, and the root mean square error and the average relative error were reduced effectively by 11 point smoothing with Multiplicative scatter correction (MSC) pretreatment. The results of the study show that, heavy metal Cu in preserved egg shells can be direct detected accurately by laser induced breakdown spectroscopy, and the next step batch tests will been conducted to find out the relationship of heavy metal Cu content in the preserved egg between the eggshell, egg white and egg yolk. And the goal of the contents of heavy metals in the egg white, egg yolk can be knew through determinate the eggshell by the LIBS can be achieved, to provide new method for rapid non-destructive testing technology for quality and satety of agricultural products.

  12. Desenvolvimento de Modelos de Regressão Multivariada para a Quantificação de Benzoilmetronidazol na Presença de seus Produtos de Degradação por Espectroscopia no Infravermelho Próximo

    Directory of Open Access Journals (Sweden)

    Willian Ricardo da Rosa de Almeida

    2015-12-01

    Full Text Available Benzoyl metronidazole (BMZ is a drug with antiparasitic and antibacterial activity available in the form of pediatric suspensions. The BMZ main degradation products are metronidazole and benzoic acid, and there are no reports in the literature on the determination of BMZ in the presence of its degradation products using near infrared spectroscopy. Therefore, in this study a method for determining the content of BMZ pharmaceutical ingredient in the presence of its main degradation products by near infrared spectroscopy associated with multivariate calibration were to develop. Regression with variable selection methods such as partial least squares regression for interval (iPLS and partial least squares regression for synergism intervals (siPLS were applied in order to select spectral regions that produce models with smaller errors. The best model using the iPLS algorithm was obtained when the spectrum was divided into 12 sub-intervals and select a period 11 (RSEP% = 1.37. Once the spectrum has been divided into 16 intervals and combined subintervals 9, 13 and 18 yielded the best model for siPLS algorithm (RSEP = 1.30%. The proposed method can be considered selective; it allows determining the BMZ in the presence of its degradation products. DOI: http://dx.doi.org/10.17807/orbital.v7i4.741

  13. Thermoluminescence dating of chinese porcelain using a regression method of saturating exponential in pre-dose technique

    International Nuclear Information System (INIS)

    Wang Weida; Xia Junding; Zhou Zhixin; Leung, P.L.

    2001-01-01

    Thermoluminescence (TL) dating using a regression method of saturating exponential in pre-dose technique was described. 23 porcelain samples from past dynasties of China were dated by this method. The results show that the TL ages are in reasonable agreement with archaeological dates within a standard deviation of 27%. Such error can be accepted in porcelain dating

  14. The analysis of survival data in nephrology: basic concepts and methods of Cox regression

    NARCIS (Netherlands)

    van Dijk, Paul C.; Jager, Kitty J.; Zwinderman, Aeilko H.; Zoccali, Carmine; Dekker, Friedo W.

    2008-01-01

    How much does the survival of one group differ from the survival of another group? How do differences in age in these two groups affect such a comparison? To obtain a quantity to compare the survival of different patient groups and to account for confounding effects, a multiple regression technique

  15. Estimating traffic volume on Wyoming low volume roads using linear and logistic regression methods

    Directory of Open Access Journals (Sweden)

    Dick Apronti

    2016-12-01

    Full Text Available Traffic volume is an important parameter in most transportation planning applications. Low volume roads make up about 69% of road miles in the United States. Estimating traffic on the low volume roads is a cost-effective alternative to taking traffic counts. This is because traditional traffic counts are expensive and impractical for low priority roads. The purpose of this paper is to present the development of two alternative means of cost-effectively estimating traffic volumes for low volume roads in Wyoming and to make recommendations for their implementation. The study methodology involves reviewing existing studies, identifying data sources, and carrying out the model development. The utility of the models developed were then verified by comparing actual traffic volumes to those predicted by the model. The study resulted in two regression models that are inexpensive and easy to implement. The first regression model was a linear regression model that utilized pavement type, access to highways, predominant land use types, and population to estimate traffic volume. In verifying the model, an R2 value of 0.64 and a root mean square error of 73.4% were obtained. The second model was a logistic regression model that identified the level of traffic on roads using five thresholds or levels. The logistic regression model was verified by estimating traffic volume thresholds and determining the percentage of roads that were accurately classified as belonging to the given thresholds. For the five thresholds, the percentage of roads classified correctly ranged from 79% to 88%. In conclusion, the verification of the models indicated both model types to be useful for accurate and cost-effective estimation of traffic volumes for low volume Wyoming roads. The models developed were recommended for use in traffic volume estimations for low volume roads in pavement management and environmental impact assessment studies.

  16. OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

    Science.gov (United States)

    Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

    2012-01-01

    The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.

  17. Radioligand assays - methods and applications. IV. Uniform regression of hyperbolic and linear radioimmunoassay calibration curves

    Energy Technology Data Exchange (ETDEWEB)

    Keilacker, H; Becker, G; Ziegler, M; Gottschling, H D [Zentralinstitut fuer Diabetes, Karlsburg (German Democratic Republic)

    1980-10-01

    In order to handle all types of radioimmunoassay (RIA) calibration curves obtained in the authors' laboratory in the same way, they tried to find a non-linear expression for their regression which allows calibration curves with different degrees of curvature to be fitted. Considering the two boundary cases of the incubation protocol they derived a hyperbolic inverse regression function: x = a/sub 1/y + a/sub 0/ + asub(-1)y/sup -1/, where x is the total concentration of antigen, asub(i) are constants, and y is the specifically bound radioactivity. An RIA evaluation procedure based on this function is described providing a fitted inverse RIA calibration curve and some statistical quality parameters. The latter are of an order which is normal for RIA systems. There is an excellent agreement between fitted and experimentally obtained calibration curves having a different degree of curvature.

  18. Predicting Charging Time of Battery Electric Vehicles Based on Regression and Time-Series Methods: A Case Study of Beijing

    Directory of Open Access Journals (Sweden)

    Jun Bi

    2018-04-01

    Full Text Available Battery electric vehicles (BEVs reduce energy consumption and air pollution as compared with conventional vehicles. However, the limited driving range and potential long charging time of BEVs create new problems. Accurate charging time prediction of BEVs helps drivers determine travel plans and alleviate their range anxiety during trips. This study proposed a combined model for charging time prediction based on regression and time-series methods according to the actual data from BEVs operating in Beijing, China. After data analysis, a regression model was established by considering the charged amount for charging time prediction. Furthermore, a time-series method was adopted to calibrate the regression model, which significantly improved the fitting accuracy of the model. The parameters of the model were determined by using the actual data. Verification results confirmed the accuracy of the model and showed that the model errors were small. The proposed model can accurately depict the charging time characteristics of BEVs in Beijing.

  19. Evaluation of platelet thromboxane radioimmunoassay method to measure platelet life-span: Comparison with /sup 111/indium-platelet method

    International Nuclear Information System (INIS)

    Vallabhajosula, S.; Machac, J.; Badimon, L.; Lipszyc, H.; Goldsmith, S.J.; Fuster, V.

    1985-01-01

    The platelet activation during radiolabeling in vitro with Cr-51 and In-111 may affect the platelet life-span (PLS) in vivo. A new RIA method to measure PLS is being evaluated. Aspirin inhibits platelet thromboxane (TxA/sub 2/) by acetylating cyclooxygenase. The time required for the TxA/sub 2/ levels to return towards control values depends on the rate of new platelets entering circulation and is a measure of PLS. A single dose of aspirin (150mg) was given to 5 normal human subjects. Blood samples were collected for 2 days before aspirin and daily for 10 days. TxA/sub 2/ production in response to endogenous thrombin was studied by allowing 1 ml blood sample to clot at 37 0 C for 90 min. Serum TxB/sub 2/ (stable breakdown product of Tx-A/sub 2/) levels determined by RIA technique. The plot of TxB/sub 2/ levels (% control) against time showed a gradual increase. The PLS calculated by linear regression analysis assuming a 2-day lag period before cyclooxygenase recovery is 9.7 +- 2.37. In the same 5 subjects, platelets from a 50ml blood sample were labeled with /sup 111/In-tropolone in 2 ml autologous plasma. Starting at 1 hr after injection of labeled platelets, 10 blood samples were obtained over a 8 day period. The PLS calculated based on a linear regression analysis is 10.2 +. 1.4. The PLS measured from the rate of platelet disappearance from circulation and the rate of platelet regeneration into circulation are quite comparable in normal subjects. TxA/sub 2/ regeneration RIA may provide a method to measure PLS without administering radioactivity to patient

  20. Performance and separation occurrence of binary probit regression estimator using maximum likelihood method and Firths approach under different sample size

    Science.gov (United States)

    Lusiana, Evellin Dewi

    2017-12-01

    The parameters of binary probit regression model are commonly estimated by using Maximum Likelihood Estimation (MLE) method. However, MLE method has limitation if the binary data contains separation. Separation is the condition where there are one or several independent variables that exactly grouped the categories in binary response. It will result the estimators of MLE method become non-convergent, so that they cannot be used in modeling. One of the effort to resolve the separation is using Firths approach instead. This research has two aims. First, to identify the chance of separation occurrence in binary probit regression model between MLE method and Firths approach. Second, to compare the performance of binary probit regression model estimator that obtained by MLE method and Firths approach using RMSE criteria. Those are performed using simulation method and under different sample size. The results showed that the chance of separation occurrence in MLE method for small sample size is higher than Firths approach. On the other hand, for larger sample size, the probability decreased and relatively identic between MLE method and Firths approach. Meanwhile, Firths estimators have smaller RMSE than MLEs especially for smaller sample sizes. But for larger sample sizes, the RMSEs are not much different. It means that Firths estimators outperformed MLE estimator.

  1. Regression to fuzziness method for estimation of remaining useful life in power plant components

    Science.gov (United States)

    Alamaniotis, Miltiadis; Grelle, Austin; Tsoukalas, Lefteri H.

    2014-10-01

    Mitigation of severe accidents in power plants requires the reliable operation of all systems and the on-time replacement of mechanical components. Therefore, the continuous surveillance of power systems is a crucial concern for the overall safety, cost control, and on-time maintenance of a power plant. In this paper a methodology called regression to fuzziness is presented that estimates the remaining useful life (RUL) of power plant components. The RUL is defined as the difference between the time that a measurement was taken and the estimated failure time of that component. The methodology aims to compensate for a potential lack of historical data by modeling an expert's operational experience and expertise applied to the system. It initially identifies critical degradation parameters and their associated value range. Once completed, the operator's experience is modeled through fuzzy sets which span the entire parameter range. This model is then synergistically used with linear regression and a component's failure point to estimate the RUL. The proposed methodology is tested on estimating the RUL of a turbine (the basic electrical generating component of a power plant) in three different cases. Results demonstrate the benefits of the methodology for components for which operational data is not readily available and emphasize the significance of the selection of fuzzy sets and the effect of knowledge representation on the predicted output. To verify the effectiveness of the methodology, it was benchmarked against the data-based simple linear regression model used for predictions which was shown to perform equal or worse than the presented methodology. Furthermore, methodology comparison highlighted the improvement in estimation offered by the adoption of appropriate of fuzzy sets for parameter representation.

  2. Prevalence and Determinants of Preterm Birth in Tehran, Iran: A Comparison between Logistic Regression and Decision Tree Methods.

    Science.gov (United States)

    Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi

    2017-06-01

    Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p logistic regression model for the classification of risk groups for PTB.

  3. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis

    NARCIS (Netherlands)

    Eekhout, I.; Wiel, M.A. van de; Heymans, M.W.

    2017-01-01

    Background. Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels

  4. Determination of benzo(apyrene content in PM10 using regression methods

    Directory of Open Access Journals (Sweden)

    Jacek Gębicki

    2015-12-01

    Full Text Available The paper presents an attempt of application of multidimensional linear regression to estimation of an empirical model describing the factors influencing on B(aP content in suspended dust PM10 in Olsztyn and Elbląg city regions between 2010 and 2013. During this period annual average concentration of B(aP in PM10 exceeded the admissible level 1.5-3 times. Conducted investigations confirm that the reasons of B(aP concentration increase are low-efficiency individual home heat stations or low-temperature heat sources, which are responsible for so-called low emission during heating period. Dependences between the following quantities were analysed: concentration of PM10 dust in air, air temperature, wind velocity, air humidity. A measure of model fitting to actual B(aP concentration in PM10 was the coefficient of determination of the model. Application of multidimensional linear regression yielded the equations characterized by high values of the coefficient of determination of the model, especially during heating season. This parameter ranged from 0.54 to 0.80 during the analyzed period.

  5. Hybrid ANN–PLS approach to scroll compressor thermodynamic performance prediction

    International Nuclear Information System (INIS)

    Tian, Z.; Gu, B.; Yang, L.; Lu, Y.

    2015-01-01

    In this paper, a scroll compressor thermodynamic performance prediction was carried out by applying a hybrid ANN–PLS model. Firstly, an experimental platform with second-refrigeration calorimeter was set up and steady-state scroll compressor data sets were collected from experiments. Then totally 148 data sets were introduced to train and verify the validity of the ANN–PLS model for predicting the scroll compressor parameters such as volumetric efficiency, refrigerant mass flow rate, discharge temperature and power consumption. The ANN–PLS model was determined with 5 hidden neurons and 7 latent variables through the training process. Ultimately, the ANN–PLS model showed better performance than the ANN model and the PLS model working separately. ANN–PLS predictions agree well with the experimental values with mean relative errors (MREs) in the range of 0.34–1.96%, correlation coefficients (R 2 ) in the range of 0.9703–0.9999 and very low root mean square errors (RMSEs). - Highlights: • Hybrid ANN–PLS is utilized to predict the thermodynamic performance of scroll compressor. • ANN–PLS model is determined with 5 hidden neurons and 7 latent variables. • ANN–PLS model demonstrates better performance than ANN and PLS working separately. • The values of MRE and RMSE are in the range of 0.34–1.96% and 0.9703–0.9999, respectively

  6. A Trajectory Regression Clustering Technique Combining a Novel Fuzzy C-Means Clustering Algorithm with the Least Squares Method

    Directory of Open Access Journals (Sweden)

    Xiangbing Zhou

    2018-04-01

    Full Text Available Rapidly growing GPS (Global Positioning System trajectories hide much valuable information, such as city road planning, urban travel demand, and population migration. In order to mine the hidden information and to capture better clustering results, a trajectory regression clustering method (an unsupervised trajectory clustering method is proposed to reduce local information loss of the trajectory and to avoid getting stuck in the local optimum. Using this method, we first define our new concept of trajectory clustering and construct a novel partitioning (angle-based partitioning method of line segments; second, the Lagrange-based method and Hausdorff-based K-means++ are integrated in fuzzy C-means (FCM clustering, which are used to maintain the stability and the robustness of the clustering process; finally, least squares regression model is employed to achieve regression clustering of the trajectory. In our experiment, the performance and effectiveness of our method is validated against real-world taxi GPS data. When comparing our clustering algorithm with the partition-based clustering algorithms (K-means, K-median, and FCM, our experimental results demonstrate that the presented method is more effective and generates a more reasonable trajectory.

  7. Prediction of SOC content by Vis-NIR spectroscopy at European scale using a modified local PLS algorithm

    Science.gov (United States)

    Nocita, M.; Stevens, A.; Toth, G.; van Wesemael, B.; Montanarella, L.

    2012-12-01

    In the context of global environmental change, the estimation of carbon fluxes between soils and the atmosphere has been the object of a growing number of studies. This has been motivated notably by the possibility to sequester CO2 into soils by increasing the soil organic carbon (SOC) stocks and by the role of SOC in maintaining soil quality. Spatial variability of SOC masks its slow accumulation or depletion, and the sampling density required to detect a change in SOC content is often very high and thus very expensive and labour intensive. Visible near infrared diffuse reflectance spectroscopy (Vis-NIR DRS) has been shown to be a fast, cheap and efficient tool for the prediction of SOC at fine scales. However, when applied to regional or country scales, Vis-NIR DRS did not provide sufficient accuracy as an alternative to standard laboratory soil analysis for SOC monitoring. Under the framework of Land Use/Cover Area Frame Statistical Survey (LUCAS) project of the European Commission's Joint Research Centre (JRC), about 20,000 samples were collected all over European Union. Soil samples were analyzed for several physical and chemical parameters, and scanned with a Vis-NIR spectrometer in the same laboratory. The scope of our research was to predict SOC content at European scale using LUCAS spectral library. We implemented a modified local partial least square regression (l-PLS) including, in addition to spectral distance, other potentially useful covariates (geography, texture, etc.) to select for each unknown sample a group of predicting neighbours. The dataset was split in mineral soils under cropland, mineral soils under grassland, mineral soils under woodland, and organic soils due to the extremely diverse spectral response of the four classes. Four every class training (70%) and test (30%) sets were created to calibrate and validate the SOC prediction models. The results showed very good prediction ability for mineral soils under cropland and mineral soils

  8. Linear and nonlinear methods in modeling the aqueous solubility of organic compounds.

    Science.gov (United States)

    Catana, Cornel; Gao, Hua; Orrenius, Christian; Stouten, Pieter F W

    2005-01-01

    Solubility data for 930 diverse compounds have been analyzed using linear Partial Least Square (PLS) and nonlinear PLS methods, Continuum Regression (CR), and Neural Networks (NN). 1D and 2D descriptors from MOE package in combination with E-state or ISIS keys have been used. The best model was obtained using linear PLS for a combination between 22 MOE descriptors and 65 ISIS keys. It has a correlation coefficient (r2) of 0.935 and a root-mean-square error (RMSE) of 0.468 log molar solubility (log S(w)). The model validated on a test set of 177 compounds not included in the training set has r2 0.911 and RMSE 0.475 log S(w). The descriptors were ranked according to their importance, and at the top of the list have been found the 22 MOE descriptors. The CR model produced results as good as PLS, and because of the way in which cross-validation has been done it is expected to be a valuable tool in prediction besides PLS model. The statistics obtained using nonlinear methods did not surpass those got with linear ones. The good statistic obtained for linear PLS and CR recommends these models to be used in prediction when it is difficult or impossible to make experimental measurements, for virtual screening, combinatorial library design, and efficient leads optimization.

  9. A method to determine the necessity for global signal regression in resting-state fMRI studies.

    Science.gov (United States)

    Chen, Gang; Chen, Guangyu; Xie, Chunming; Ward, B Douglas; Li, Wenjun; Antuono, Piero; Li, Shi-Jiang

    2012-12-01

    In resting-state functional MRI studies, the global signal (operationally defined as the global average of resting-state functional MRI time courses) is often considered a nuisance effect and commonly removed in preprocessing. This global signal regression method can introduce artifacts, such as false anticorrelated resting-state networks in functional connectivity analyses. Therefore, the efficacy of this technique as a correction tool remains questionable. In this article, we establish that the accuracy of the estimated global signal is determined by the level of global noise (i.e., non-neural noise that has a global effect on the resting-state functional MRI signal). When the global noise level is low, the global signal resembles the resting-state functional MRI time courses of the largest cluster, but not those of the global noise. Using real data, we demonstrate that the global signal is strongly correlated with the default mode network components and has biological significance. These results call into question whether or not global signal regression should be applied. We introduce a method to quantify global noise levels. We show that a criteria for global signal regression can be found based on the method. By using the criteria, one can determine whether to include or exclude the global signal regression in minimizing errors in functional connectivity measures. Copyright © 2012 Wiley Periodicals, Inc.

  10. Stepwise multiple regression method of greenhouse gas emission modeling in the energy sector in Poland.

    Science.gov (United States)

    Kolasa-Wiecek, Alicja

    2015-04-01

    The energy sector in Poland is the source of 81% of greenhouse gas (GHG) emissions. Poland, among other European Union countries, occupies a leading position with regard to coal consumption. Polish energy sector actively participates in efforts to reduce GHG emissions to the atmosphere, through a gradual decrease of the share of coal in the fuel mix and development of renewable energy sources. All evidence which completes the knowledge about issues related to GHG emissions is a valuable source of information. The article presents the results of modeling of GHG emissions which are generated by the energy sector in Poland. For a better understanding of the quantitative relationship between total consumption of primary energy and greenhouse gas emission, multiple stepwise regression model was applied. The modeling results of CO2 emissions demonstrate a high relationship (0.97) with the hard coal consumption variable. Adjustment coefficient of the model to actual data is high and equal to 95%. The backward step regression model, in the case of CH4 emission, indicated the presence of hard coal (0.66), peat and fuel wood (0.34), solid waste fuels, as well as other sources (-0.64) as the most important variables. The adjusted coefficient is suitable and equals R2=0.90. For N2O emission modeling the obtained coefficient of determination is low and equal to 43%. A significant variable influencing the amount of N2O emission is the peat and wood fuel consumption. Copyright © 2015. Published by Elsevier B.V.

  11. Dual Regression

    OpenAIRE

    Spady, Richard; Stouli, Sami

    2012-01-01

    We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the quantile regression process while avoiding the need for repairing the intersecting conditional quantile surfaces that quantile regression often produces in practice. Our approach introduces a mathematical programming characterization of conditional distribution f...

  12. Linear regression

    CERN Document Server

    Olive, David J

    2017-01-01

    This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...

  13. Evaluation of the efficiency of continuous wavelet transform as processing and preprocessing algorithm for resolution of overlapped signals in univariate and multivariate regression analyses; an application to ternary and quaternary mixtures

    Science.gov (United States)

    Hegazy, Maha A.; Lotfy, Hayam M.; Mowaka, Shereen; Mohamed, Ekram Hany

    2016-07-01

    Wavelets have been adapted for a vast number of signal-processing applications due to the amount of information that can be extracted from a signal. In this work, a comparative study on the efficiency of continuous wavelet transform (CWT) as a signal processing tool in univariate regression and a pre-processing tool in multivariate analysis using partial least square (CWT-PLS) was conducted. These were applied to complex spectral signals of ternary and quaternary mixtures. CWT-PLS method succeeded in the simultaneous determination of a quaternary mixture of drotaverine (DRO), caffeine (CAF), paracetamol (PAR) and p-aminophenol (PAP, the major impurity of paracetamol). While, the univariate CWT failed to simultaneously determine the quaternary mixture components and was able to determine only PAR and PAP, the ternary mixtures of DRO, CAF, and PAR and CAF, PAR, and PAP. During the calculations of CWT, different wavelet families were tested. The univariate CWT method was validated according to the ICH guidelines. While for the development of the CWT-PLS model a calibration set was prepared by means of an orthogonal experimental design and their absorption spectra were recorded and processed by CWT. The CWT-PLS model was constructed by regression between the wavelet coefficients and concentration matrices and validation was performed by both cross validation and external validation sets. Both methods were successfully applied for determination of the studied drugs in pharmaceutical formulations.

  14. Prediction of protein binding sites using physical and chemical descriptors and the support vector machine regression method

    International Nuclear Information System (INIS)

    Sun Zhong-Hua; Jiang Fan

    2010-01-01

    In this paper a new continuous variable called core-ratio is defined to describe the probability for a residue to be in a binding site, thereby replacing the previous binary description of the interface residue using 0 and 1. So we can use the support vector machine regression method to fit the core-ratio value and predict the protein binding sites. We also design a new group of physical and chemical descriptors to characterize the binding sites. The new descriptors are more effective, with an averaging procedure used. Our test shows that much better prediction results can be obtained by the support vector regression (SVR) method than by the support vector classification method. (rapid communication)

  15. Two-Stage Method Based on Local Polynomial Fitting for a Linear Heteroscedastic Regression Model and Its Application in Economics

    Directory of Open Access Journals (Sweden)

    Liyun Su

    2012-01-01

    Full Text Available We introduce the extension of local polynomial fitting to the linear heteroscedastic regression model. Firstly, the local polynomial fitting is applied to estimate heteroscedastic function, then the coefficients of regression model are obtained by using generalized least squares method. One noteworthy feature of our approach is that we avoid the testing for heteroscedasticity by improving the traditional two-stage method. Due to nonparametric technique of local polynomial estimation, we do not need to know the heteroscedastic function. Therefore, we can improve the estimation precision, when the heteroscedastic function is unknown. Furthermore, we focus on comparison of parameters and reach an optimal fitting. Besides, we verify the asymptotic normality of parameters based on numerical simulations. Finally, this approach is applied to a case of economics, and it indicates that our method is surely effective in finite-sample situations.

  16. Comparison of partial least squares and lasso regression techniques as applied to laser-induced breakdown spectroscopy of geological samples

    International Nuclear Information System (INIS)

    Dyar, M.D.; Carmosino, M.L.; Breves, E.A.; Ozanne, M.V.; Clegg, S.M.; Wiens, R.C.

    2012-01-01

    A remote laser-induced breakdown spectrometer (LIBS) designed to simulate the ChemCam instrument on the Mars Science Laboratory Rover Curiosity was used to probe 100 geologic samples at a 9-m standoff distance. ChemCam consists of an integrated remote LIBS instrument that will probe samples up to 7 m from the mast of the rover and a remote micro-imager (RMI) that will record context images. The elemental compositions of 100 igneous and highly-metamorphosed rocks are determined with LIBS using three variations of multivariate analysis, with a goal of improving the analytical accuracy. Two forms of partial least squares (PLS) regression are employed with finely-tuned parameters: PLS-1 regresses a single response variable (elemental concentration) against the observation variables (spectra, or intensity at each of 6144 spectrometer channels), while PLS-2 simultaneously regresses multiple response variables (concentrations of the ten major elements in rocks) against the observation predictor variables, taking advantage of natural correlations between elements. Those results are contrasted with those from the multivariate regression technique of the least absolute shrinkage and selection operator (lasso), which is a penalized shrunken regression method that selects the specific channels for each element that explain the most variance in the concentration of that element. To make this comparison, we use results of cross-validation and of held-out testing, and employ unscaled and uncentered spectral intensity data because all of the input variables are already in the same units. Results demonstrate that the lasso, PLS-1, and PLS-2 all yield comparable results in terms of accuracy for this dataset. However, the interpretability of these methods differs greatly in terms of fundamental understanding of LIBS emissions. PLS techniques generate principal components, linear combinations of intensities at any number of spectrometer channels, which explain as much variance in the

  17. Comparison of partial least squares and lasso regression techniques as applied to laser-induced breakdown spectroscopy of geological samples

    Energy Technology Data Exchange (ETDEWEB)

    Dyar, M.D., E-mail: mdyar@mtholyoke.edu [Dept. of Astronomy, Mount Holyoke College, 50 College St., South Hadley, MA 01075 (United States); Carmosino, M.L.; Breves, E.A.; Ozanne, M.V. [Dept. of Astronomy, Mount Holyoke College, 50 College St., South Hadley, MA 01075 (United States); Clegg, S.M.; Wiens, R.C. [Los Alamos National Laboratory, P.O. Box 1663, MS J565, Los Alamos, NM 87545 (United States)

    2012-04-15

    A remote laser-induced breakdown spectrometer (LIBS) designed to simulate the ChemCam instrument on the Mars Science Laboratory Rover Curiosity was used to probe 100 geologic samples at a 9-m standoff distance. ChemCam consists of an integrated remote LIBS instrument that will probe samples up to 7 m from the mast of the rover and a remote micro-imager (RMI) that will record context images. The elemental compositions of 100 igneous and highly-metamorphosed rocks are determined with LIBS using three variations of multivariate analysis, with a goal of improving the analytical accuracy. Two forms of partial least squares (PLS) regression are employed with finely-tuned parameters: PLS-1 regresses a single response variable (elemental concentration) against the observation variables (spectra, or intensity at each of 6144 spectrometer channels), while PLS-2 simultaneously regresses multiple response variables (concentrations of the ten major elements in rocks) against the observation predictor variables, taking advantage of natural correlations between elements. Those results are contrasted with those from the multivariate regression technique of the least absolute shrinkage and selection operator (lasso), which is a penalized shrunken regression method that selects the specific channels for each element that explain the most variance in the concentration of that element. To make this comparison, we use results of cross-validation and of held-out testing, and employ unscaled and uncentered spectral intensity data because all of the input variables are already in the same units. Results demonstrate that the lasso, PLS-1, and PLS-2 all yield comparable results in terms of accuracy for this dataset. However, the interpretability of these methods differs greatly in terms of fundamental understanding of LIBS emissions. PLS techniques generate principal components, linear combinations of intensities at any number of spectrometer channels, which explain as much variance in the

  18. Quantitative Research Methods in Chaos and Complexity: From Probability to Post Hoc Regression Analyses

    Science.gov (United States)

    Gilstrap, Donald L.

    2013-01-01

    In addition to qualitative methods presented in chaos and complexity theories in educational research, this article addresses quantitative methods that may show potential for future research studies. Although much in the social and behavioral sciences literature has focused on computer simulations, this article explores current chaos and…

  19. Forecast daily indices of solar activity, F10.7, using support vector regression method

    International Nuclear Information System (INIS)

    Huang Cong; Liu Dandan; Wang Jingsong

    2009-01-01

    The 10.7 cm solar radio flux (F10.7), the value of the solar radio emission flux density at a wavelength of 10.7 cm, is a useful index of solar activity as a proxy for solar extreme ultraviolet radiation. It is meaningful and important to predict F10.7 values accurately for both long-term (months-years) and short-term (days) forecasting, which are often used as inputs in space weather models. This study applies a novel neural network technique, support vector regression (SVR), to forecasting daily values of F10.7. The aim of this study is to examine the feasibility of SVR in short-term F10.7 forecasting. The approach, based on SVR, reduces the dimension of feature space in the training process by using a kernel-based learning algorithm. Thus, the complexity of the calculation becomes lower and a small amount of training data will be sufficient. The time series of F10.7 from 2002 to 2006 are employed as the data sets. The performance of the approach is estimated by calculating the norm mean square error and mean absolute percentage error. It is shown that our approach can perform well by using fewer training data points than the traditional neural network. (research paper)

  20. Laser-induced Breakdown spectroscopy quantitative analysis method via adaptive analytical line selection and relevance vector machine regression model

    International Nuclear Information System (INIS)

    Yang, Jianhong; Yi, Cancan; Xu, Jinwu; Ma, Xianghong

    2015-01-01

    A new LIBS quantitative analysis method based on analytical line adaptive selection and Relevance Vector Machine (RVM) regression model is proposed. First, a scheme of adaptively selecting analytical line is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines which will be used as input variables of regression model are determined adaptively according to the samples for both training and testing. Second, an LIBS quantitative analysis method based on RVM is presented. The intensities of analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentration analysis results will be given with a form of confidence interval of probabilistic distribution, which is helpful for evaluating the uncertainness contained in the measured spectra. Chromium concentration analysis experiments of 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experiment results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine. - Highlights: • Both training and testing samples are considered for analytical lines selection. • The analytical lines are auto-selected based on the built-in characteristics of spectral lines. • The new method can achieve better prediction accuracy and modeling robustness. • Model predictions are given with confidence interval of probabilistic distribution

  1. Assessing the reliability of the borderline regression method as a standard setting procedure for objective structured clinical examination

    Directory of Open Access Journals (Sweden)

    Sara Mortaz Hejri

    2013-01-01

    Full Text Available Background: One of the methods used for standard setting is the borderline regression method (BRM. This study aims to assess the reliability of BRM when the pass-fail standard in an objective structured clinical examination (OSCE was calculated by averaging the BRM standards obtained for each station separately. Materials and Methods: In nine stations of the OSCE with direct observation the examiners gave each student a checklist score and a global score. Using a linear regression model for each station, we calculated the checklist score cut-off on the regression equation for the global scale cut-off set at 2. The OSCE pass-fail standard was defined as the average of all station′s standard. To determine the reliability, the root mean square error (RMSE was calculated. The R2 coefficient and the inter-grade discrimination were calculated to assess the quality of OSCE. Results: The mean total test score was 60.78. The OSCE pass-fail standard and its RMSE were 47.37 and 0.55, respectively. The R2 coefficients ranged from 0.44 to 0.79. The inter-grade discrimination score varied greatly among stations. Conclusion: The RMSE of the standard was very small indicating that BRM is a reliable method of setting standard for OSCE, which has the advantage of providing data for quality assurance.

  2. Single-electron multiplication statistics as a combination of Poissonian pulse height distributions using constraint regression methods

    International Nuclear Information System (INIS)

    Ballini, J.-P.; Cazes, P.; Turpin, P.-Y.

    1976-01-01

    Analysing the histogram of anode pulse amplitudes allows a discussion of the hypothesis that has been proposed to account for the statistical processes of secondary multiplication in a photomultiplier. In an earlier work, good agreement was obtained between experimental and reconstructed spectra, assuming a first dynode distribution including two Poisson distributions of distinct mean values. This first approximation led to a search for a method which could give the weights of several Poisson distributions of distinct mean values. Three methods have been briefly exposed: classical linear regression, constraint regression (d'Esopo's method), and regression on variables subject to error. The use of these methods gives an approach of the frequency function which represents the dispersion of the punctual mean gain around the whole first dynode mean gain value. Comparison between this function and the one employed in Polya distribution allows the statement that the latter is inadequate to describe the statistical process of secondary multiplication. Numerous spectra obtained with two kinds of photomultiplier working under different physical conditions have been analysed. Then two points are discussed: - Does the frequency function represent the dynode structure and the interdynode collection process. - Is the model (the multiplication process of all dynodes but the first one, is Poissonian) valid whatever the photomultiplier and the utilization conditions. (Auth.)

  3. Recursive least squares method of regression coefficients estimation as a special case of Kalman filter

    Science.gov (United States)

    Borodachev, S. M.

    2016-06-01

    The simple derivation of recursive least squares (RLS) method equations is given as special case of Kalman filter estimation of a constant system state under changing observation conditions. A numerical example illustrates application of RLS to multicollinearity problem.

  4. Auto Regressive Moving Average (ARMA) Modeling Method for Gyro Random Noise Using a Robust Kalman Filter

    Science.gov (United States)

    Huang, Lei

    2015-01-01

    To solve the problem in which the conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using a robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments. Unknown time-varying estimators of observation noise are used to achieve the estimated mean and variance of the observation noise. Using the robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of a rapid convergence and high accuracy. Thus, the required sample size is reduced. It can be applied to modeling applications for gyro random noise in which a fast and accurate ARMA modeling method is required. PMID:26437409

  5. Direct and regression methods do not give different estimates of digestible and metabolizable energy of wheat for pigs.

    Science.gov (United States)

    Bolarinwa, O A; Adeola, O

    2012-12-01

    Digestible and metabolizable energy contents of feed ingredients for pigs can be determined by direct or indirect methods. There are situations when only the indirect approach is suitable and the regression method is a robust indirect approach. This study was conducted to compare the direct and regression methods for determining the energy value of wheat for pigs. Twenty-four barrows with an average initial BW of 31 kg were assigned to 4 diets in a randomized complete block design. The 4 diets consisted of 969 g wheat/kg plus minerals and vitamins (sole wheat) for the direct method, corn (Zea mays)-soybean (Glycine max) meal reference diet (RD), RD + 300 g wheat/kg, and RD + 600 g wheat/kg. The 3 corn-soybean meal diets were used for the regression method and wheat replaced the energy-yielding ingredients, corn and soybean meal, so that the same ratio of corn and soybean meal across the experimental diets was maintained. The wheat used was analyzed to contain 883 g DM, 15.2 g N, and 3.94 Mcal GE/kg. Each diet was fed to 6 barrows in individual metabolism crates for a 5-d acclimation followed by a 5-d total but separate collection of feces and urine. The DE and ME for the sole wheat diet were 3.83 and 3.77 Mcal/kg DM, respectively. Because the sole wheat diet contained 969 g wheat/kg, these translate to 3.95 Mcal DE/kg DM and 3.89 Mcal ME/kg DM. The RD used for the regression approach yielded 4.00 Mcal DE and 3.91 Mcal ME/kg DM diet. Increasing levels of wheat in the RD linearly reduced (P direct method (3.95 and 3.89 Mcal/kg DM) did not differ (0.78 < P < 0.89) from those obtained using the regression method (3.96 and 3.88 Mcal/kg DM).

  6. Multivariate regression methods for estimating velocity of ictal discharges from human microelectrode recordings

    Science.gov (United States)

    Liou, Jyun-you; Smith, Elliot H.; Bateman, Lisa M.; McKhann, Guy M., II; Goodman, Robert R.; Greger, Bradley; Davis, Tyler S.; Kellis, Spencer S.; House, Paul A.; Schevon, Catherine A.

    2017-08-01

    Objective. Epileptiform discharges, an electrophysiological hallmark of seizures, can propagate across cortical tissue in a manner similar to traveling waves. Recent work has focused attention on the origination and propagation patterns of these discharges, yielding important clues to their source location and mechanism of travel. However, systematic studies of methods for measuring propagation are lacking. Approach. We analyzed epileptiform discharges in microelectrode array recordings of human seizures. The array records multiunit activity and local field potentials at 400 micron spatial resolution, from a small cortical site free of obstructions. We evaluated several computationally efficient statistical methods for calculating traveling wave velocity, benchmarking them to analyses of associated neuronal burst firing. Main results. Over 90% of discharges met statistical criteria for propagation across the sampled cortical territory. Detection rate, direction and speed estimates derived from a multiunit estimator were compared to four field potential-based estimators: negative peak, maximum descent, high gamma power, and cross-correlation. Interestingly, the methods that were computationally simplest and most efficient (negative peak and maximal descent) offer non-inferior results in predicting neuronal traveling wave velocities compared to the other two, more complex methods. Moreover, the negative peak and maximal descent methods proved to be more robust against reduced spatial sampling challenges. Using least absolute deviation in place of least squares error minimized the impact of outliers, and reduced the discrepancies between local field potential-based and multiunit estimators. Significance. Our findings suggest that ictal epileptiform discharges typically take the form of exceptionally strong, rapidly traveling waves, with propagation detectable across millimeter distances. The sequential activation of neurons in space can be inferred from clinically

  7. Prediction of clinical depression scores and detection of changes in whole-brain using resting-state functional MRI data with partial least squares regression.

    Directory of Open Access Journals (Sweden)

    Kosuke Yoshida

    Full Text Available In diagnostic applications of statistical machine learning methods to brain imaging data, common problems include data high-dimensionality and co-linearity, which often cause over-fitting and instability. To overcome these problems, we applied partial least squares (PLS regression to resting-state functional magnetic resonance imaging (rs-fMRI data, creating a low-dimensional representation that relates symptoms to brain activity and that predicts clinical measures. Our experimental results, based upon data from clinically depressed patients and healthy controls, demonstrated that PLS and its kernel variants provided significantly better prediction of clinical measures than ordinary linear regression. Subsequent classification using predicted clinical scores distinguished depressed patients from healthy controls with 80% accuracy. Moreover, loading vectors for latent variables enabled us to identify brain regions relevant to depression, including the default mode network, the right superior frontal gyrus, and the superior motor area.

  8. Power system state estimation using an iteratively reweighted least squares method for sequential L{sub 1}-regression

    Energy Technology Data Exchange (ETDEWEB)

    Jabr, R.A. [Electrical, Computer and Communication Engineering Department, Notre Dame University, P.O. Box 72, Zouk Mikhael, Zouk Mosbeh (Lebanon)

    2006-02-15

    This paper presents an implementation of the least absolute value (LAV) power system state estimator based on obtaining a sequence of solutions to the L{sub 1}-regression problem using an iteratively reweighted least squares (IRLS{sub L1}) method. The proposed implementation avoids reformulating the regression problem into standard linear programming (LP) form and consequently does not require the use of common methods of LP, such as those based on the simplex method or interior-point methods. It is shown that the IRLS{sub L1} method is equivalent to solving a sequence of linear weighted least squares (LS) problems. Thus, its implementation presents little additional effort since the sparse LS solver is common to existing LS state estimators. Studies on the termination criteria of the IRLS{sub L1} method have been carried out to determine a procedure for which the proposed estimator is more computationally efficient than a previously proposed non-linear iteratively reweighted least squares (IRLS) estimator. Indeed, it is revealed that the proposed method is a generalization of the previously reported IRLS estimator, but is based on more rigorous theory. (author)

  9. An Improved Method for Sizing Standalone Photovoltaic Systems Using Generalized Regression Neural Network

    Directory of Open Access Journals (Sweden)

    Tamer Khatib

    2014-01-01

    Full Text Available In this research an improved approach for sizing standalone PV system (SAPV is presented. This work is an improved work developed previously by the authors. The previous work is based on the analytical method which faced some concerns regarding the difficulty of finding the model’s coefficients. Therefore, the proposed approach in this research is based on a combination of an analytical method and a machine learning approach for a generalized artificial neural network (GRNN. The GRNN assists to predict the optimal size of a PV system using the geographical coordinates of the targeted site instead of using mathematical formulas. Employing the GRNN facilitates the use of a previously developed method by the authors and avoids some of its drawbacks. The approach has been tested using data from five Malaysian sites. According to the results, the proposed method can be efficiently used for SAPV sizing whereas the proposed GRNN based model predicts the sizing curves of the PV system accurately with a prediction error of 0.6%. Moreover, hourly meteorological and load demand data are used in this research in order to consider the uncertainty of the solar energy and the load demand.

  10. Comparison of Sparse and Jack-knife partial least squares regression methods for variable selection

    DEFF Research Database (Denmark)

    Karaman, Ibrahim; Qannari, El Mostafa; Martens, Harald

    2013-01-01

    The objective of this study was to compare two different techniques of variable selection, Sparse PLSR and Jack-knife PLSR, with respect to their predictive ability and their ability to identify relevant variables. Sparse PLSR is a method that is frequently used in genomics, whereas Jack-knife PL...

  11. Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

    Science.gov (United States)

    He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

    2013-01-01

    Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…

  12. A novel adaptive kernel method with kernel centers determined by a support vector regression approach

    NARCIS (Netherlands)

    Sun, L.G.; De Visser, C.C.; Chu, Q.P.; Mulder, J.A.

    2012-01-01

    The optimality of the kernel number and kernel centers plays a significant role in determining the approximation power of nearly all kernel methods. However, the process of choosing optimal kernels is always formulated as a global optimization task, which is hard to accomplish. Recently, an

  13. A novel hybrid method of beta-turn identification in protein using binary logistic regression and neural network.

    Science.gov (United States)

    Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz

    2012-01-01

    From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was initially used for the first time to select significant sequence parameters in identification of β-turns due to a re-substitution test procedure. Sequence parameters were consisted of 80 amino acid positional occurrences and 20 amino acid percentages in sequence. Among these parameters, the most significant ones which were selected by binary logistic regression model, were percentages of Gly, Ser and the occurrence of Asn in position i+2, respectively, in sequence. These significant parameters have the highest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed by the parameters selected by binary logistic regression to build a hybrid predictor. The networks have been trained and tested on a non-homologous dataset of 565 protein chains. With applying a nine fold cross-validation test on the dataset, the network reached an overall accuracy (Qtotal) of 74, which is comparable with results of the other β-turn prediction methods. In conclusion, this study proves that the parameter selection ability of binary logistic regression together with the prediction capability of neural networks lead to the development of more precise models for identifying β-turns in proteins.

  14. Improving ASTER GDEM Accuracy Using Land Use-Based Linear Regression Methods: A Case Study of Lianyungang, East China

    Directory of Open Access Journals (Sweden)

    Xiaoyan Yang

    2018-04-01

    Full Text Available The Advanced Spaceborne Thermal-Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM is important to a wide range of geographical and environmental studies. Its accuracy, to some extent associated with land-use types reflecting topography, vegetation coverage, and human activities, impacts the results and conclusions of these studies. In order to improve the accuracy of ASTER GDEM prior to its application, we investigated ASTER GDEM errors based on individual land-use types and proposed two linear regression calibration methods, one considering only land use-specific errors and the other considering the impact of both land-use and topography. Our calibration methods were tested on the coastal prefectural city of Lianyungang in eastern China. Results indicate that (1 ASTER GDEM is highly accurate for rice, wheat, grass and mining lands but less accurate for scenic, garden, wood and bare lands; (2 despite improvements in ASTER GDEM2 accuracy, multiple linear regression calibration requires more data (topography and a relatively complex calibration process; (3 simple linear regression calibration proves a practicable and simplified means to systematically investigate and improve the impact of land-use on ASTER GDEM accuracy. Our method is applicable to areas with detailed land-use data based on highly accurate field-based point-elevation measurements.

  15. MALDI-TOF-MS with PLS Modeling Enables Strain Typing of the Bacterial Plant Pathogen Xanthomonas axonopodis

    Science.gov (United States)

    Sindt, Nathan M.; Robison, Faith; Brick, Mark A.; Schwartz, Howard F.; Heuberger, Adam L.; Prenni, Jessica E.

    2018-02-01

    Matrix-assisted desorption/ionization time of flight mass spectrometry (MALDI-TOF-MS) is a fast and effective tool for microbial species identification. However, current approaches are limited to species-level identification even when genetic differences are known. Here, we present a novel workflow that applies the statistical method of partial least squares discriminant analysis (PLS-DA) to MALDI-TOF-MS protein fingerprint data of Xanthomonas axonopodis, an important bacterial plant pathogen of fruit and vegetable crops. Mass spectra of 32 X. axonopodis strains were used to create a mass spectral library and PLS-DA was employed to model the closely related strains. A robust workflow was designed to optimize the PLS-DA model by assessing the model performance over a range of signal-to-noise ratios (s/n) and mass filter (MF) thresholds. The optimized parameters were observed to be s/n = 3 and MF = 0.7. The model correctly classified 83% of spectra withheld from the model as a test set. A new decision rule was developed, termed the rolled-up Maximum Decision Rule (ruMDR), and this method improved identification rates to 92%. These results demonstrate that MALDI-TOF-MS protein fingerprints of bacterial isolates can be utilized to enable identification at the strain level. Furthermore, the open-source framework of this workflow allows for broad implementation across various instrument platforms as well as integration with alternative modeling and classification algorithms.

  16. A comparison on parameter-estimation methods in multiple regression analysis with existence of multicollinearity among independent variables

    Directory of Open Access Journals (Sweden)

    Hukharnsusatrue, A.

    2005-11-01

    Full Text Available The objective of this research is to compare multiple regression coefficients estimating methods with existence of multicollinearity among independent variables. The estimation methods are Ordinary Least Squares method (OLS, Restricted Least Squares method (RLS, Restricted Ridge Regression method (RRR and Restricted Liu method (RL when restrictions are true and restrictions are not true. The study used the Monte Carlo Simulation method. The experiment was repeated 1,000 times under each situation. The analyzed results of the data are demonstrated as follows. CASE 1: The restrictions are true. In all cases, RRR and RL methods have a smaller Average Mean Square Error (AMSE than OLS and RLS method, respectively. RRR method provides the smallest AMSE when the level of correlations is high and also provides the smallest AMSE for all level of correlations and all sample sizes when standard deviation is equal to 5. However, RL method provides the smallest AMSE when the level of correlations is low and middle, except in the case of standard deviation equal to 3, small sample sizes, RRR method provides the smallest AMSE.The AMSE varies with, most to least, respectively, level of correlations, standard deviation and number of independent variables but inversely with to sample size.CASE 2: The restrictions are not true.In all cases, RRR method provides the smallest AMSE, except in the case of standard deviation equal to 1 and error of restrictions equal to 5%, OLS method provides the smallest AMSE when the level of correlations is low or median and there is a large sample size, but the small sample sizes, RL method provides the smallest AMSE. In addition, when error of restrictions is increased, OLS method provides the smallest AMSE for all level, of correlations and all sample sizes, except when the level of correlations is high and sample sizes small. Moreover, the case OLS method provides the smallest AMSE, the most RLS method has a smaller AMSE than

  17. A computer program for uncertainty analysis integrating regression and Bayesian methods

    Science.gov (United States)

    Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary

    2014-01-01

    This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.

  18. Empirical methods for the estimation of Southern Ocean CO2: support vector and random forest regression

    CSIR Research Space (South Africa)

    Gregor, Luke

    2017-12-01

    Full Text Available understanding with spatially integrated air–sea flux estimates (Fay and McKinley, 2014). Conversely, ocean biogeochemical process models are good tools for mechanis- tic understanding, but fail to represent the seasonality of CO2 fluxes in the Southern Ocean... of including coordinate variables as proxies of 1pCO2 in the empirical methods. In the inter- comparison study by Rödenbeck et al. (2015) proxies typi- cally include, but are not limited to, sea surface temperature (SST), chlorophyll a (Chl a), mixed layer...

  19. Consistency analysis of subspace identification methods based on a linear regression approach

    DEFF Research Database (Denmark)

    Knudsen, Torben

    2001-01-01

    In the literature results can be found which claim consistency for the subspace method under certain quite weak assumptions. Unfortunately, a new result gives a counter example showing inconsistency under these assumptions and then gives new more strict sufficient assumptions which however does n...... not include important model structures as e.g. Box-Jenkins. Based on a simple least squares approach this paper shows the possible inconsistency under the weak assumptions and develops only slightly stricter assumptions sufficient for consistency and which includes any model structure...

  20. Adaptive methods for flood forecasting using linear regression models in the upper basin of Senegal River

    International Nuclear Information System (INIS)

    Sambou, Soussou

    2004-01-01

    In flood forecasting modelling, large basins are often considered as hydrological systems with multiple inputs and one output. Inputs are hydrological variables such rainfall, runoff and physical characteristics of basin; output is runoff. Relating inputs to output can be achieved using deterministic, conceptual, or stochastic models. Rainfall runoff models generally lack of accuracy. Physical hydrological processes based models, either deterministic or conceptual are highly data requirement demanding and by the way very complex. Stochastic multiple input-output models, using only historical chronicles of hydrological variables particularly runoff are by the way very popular among the hydrologists for large river basin flood forecasting. Application is made on the Senegal River upstream of Bakel, where the River is formed by the main branch, Bafing, and two tributaries, Bakoye and Faleme; Bafing being regulated by Manantaly Dam. A three inputs and one output model has been used for flood forecasting on Bakel. Influence of the lead forecasting, and of the three inputs taken separately, then associated two by two, and altogether has been verified using a dimensionless variance as criterion of quality. Inadequacies occur generally between model output and observations; to put model in better compliance with current observations, we have compared four parameter updating procedure, recursive least squares, Kalman filtering, stochastic gradient method, iterative method, and an AR errors forecasting model. A combination of these model updating have been used in real time flood forecasting.(Author)

  1. High cycle fatigue test and regression methods of S-N curve

    International Nuclear Information System (INIS)

    Kim, D. W.; Park, J. Y.; Kim, W. G.; Yoon, J. H.

    2011-11-01

    The fatigue design curve in the ASME Boiler and Pressure Vessel Code Section III are based on the assumption that fatigue life is infinite after 106 cycles. This is because standard fatigue testing equipment prior to the past decades was limited in speed to less than 200 cycles per second. Traditional servo-hydraulic machines work at frequency of 50 Hz. Servo-hydraulic machines working at 1000 Hz have been developed after 1997. This machines allow high frequency and displacement of up to ±0.1 mm and dynamic load of ±20 kN are guaranteed. The frequency of resonant fatigue test machine is 50-250 Hz. Various forced vibration-based system works at 500 Hz or 1.8 kHz. Rotating bending machines allow testing frequency at 0.1-200 Hz. The main advantage of ultrasonic fatigue testing at 20 kHz is performing Although S-N curve is determined by experiment, the fatigue strength corresponding to a given fatigue life should be determined by statistical method considering the scatter of fatigue properties. In this report, the statistical methods for evaluation of fatigue test data is investigated

  2. Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients.

    Science.gov (United States)

    Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat

    2015-01-01

    Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical AbstractDecision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.

  3. Measuring decision weights in recognition experiments with multiple response alternatives: comparing the correlation and multinomial-logistic-regression methods.

    Science.gov (United States)

    Dai, Huanping; Micheyl, Christophe

    2012-11-01

    Psychophysical "reverse-correlation" methods allow researchers to gain insight into the perceptual representations and decision weighting strategies of individual subjects in perceptual tasks. Although these methods have gained momentum, until recently their development was limited to experiments involving only two response categories. Recently, two approaches for estimating decision weights in m-alternative experiments have been put forward. One approach extends the two-category correlation method to m > 2 alternatives; the second uses multinomial logistic regression (MLR). In this article, the relative merits of the two methods are discussed, and the issues of convergence and statistical efficiency of the methods are evaluated quantitatively using Monte Carlo simulations. The results indicate that, for a range of values of the number of trials, the estimated weighting patterns are closer to their asymptotic values for the correlation method than for the MLR method. Moreover, for the MLR method, weight estimates for different stimulus components can exhibit strong correlations, making the analysis and interpretation of measured weighting patterns less straightforward than for the correlation method. These and other advantages of the correlation method, which include computational simplicity and a close relationship to other well-established psychophysical reverse-correlation methods, make it an attractive tool to uncover decision strategies in m-alternative experiments.

  4. Statistical Downscaling Output GCM Modeling with Continuum Regression and Pre-Processing PCA Approach

    Directory of Open Access Journals (Sweden)

    Sutikno Sutikno

    2010-08-01

    Full Text Available One of the climate models used to predict the climatic conditions is Global Circulation Models (GCM. GCM is a computer-based model that consists of different equations. It uses numerical and deterministic equation which follows the physics rules. GCM is a main tool to predict climate and weather, also it uses as primary information source to review the climate change effect. Statistical Downscaling (SD technique is used to bridge the large-scale GCM with a small scale (the study area. GCM data is spatial and temporal data most likely to occur where the spatial correlation between different data on the grid in a single domain. Multicollinearity problems require the need for pre-processing of variable data X. Continuum Regression (CR and pre-processing with Principal Component Analysis (PCA methods is an alternative to SD modelling. CR is one method which was developed by Stone and Brooks (1990. This method is a generalization from Ordinary Least Square (OLS, Principal Component Regression (PCR and Partial Least Square method (PLS methods, used to overcome multicollinearity problems. Data processing for the station in Ambon, Pontianak, Losarang, Indramayu and Yuntinyuat show that the RMSEP values and R2 predict in the domain 8x8 and 12x12 by uses CR method produces results better than by PCR and PLS.

  5. Collective effects of the PLS 2 GeV storage ring

    International Nuclear Information System (INIS)

    Yoon, M.; Choi, J.; Lee, T.

    1993-01-01

    Collective effects of the PLS storage ring are discussed. Evaluation of the PLS storage ring coupling impedances is presented. RF cavity Impedances are emphasized. Single-bunch threshold current is studied and longitudinal coupled-bunch instabilities caused by RF narrow-band resonances are analyzed

  6. [Correlation coefficient-based classification method of hydrological dependence variability: With auto-regression model as example].

    Science.gov (United States)

    Zhao, Yu Xi; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi

    2018-04-01

    Hydrological process evaluation is temporal dependent. Hydrological time series including dependence components do not meet the data consistency assumption for hydrological computation. Both of those factors cause great difficulty for water researches. Given the existence of hydrological dependence variability, we proposed a correlationcoefficient-based method for significance evaluation of hydrological dependence based on auto-regression model. By calculating the correlation coefficient between the original series and its dependence component and selecting reasonable thresholds of correlation coefficient, this method divided significance degree of dependence into no variability, weak variability, mid variability, strong variability, and drastic variability. By deducing the relationship between correlation coefficient and auto-correlation coefficient in each order of series, we found that the correlation coefficient was mainly determined by the magnitude of auto-correlation coefficient from the 1 order to p order, which clarified the theoretical basis of this method. With the first-order and second-order auto-regression models as examples, the reasonability of the deduced formula was verified through Monte-Carlo experiments to classify the relationship between correlation coefficient and auto-correlation coefficient. This method was used to analyze three observed hydrological time series. The results indicated the coexistence of stochastic and dependence characteristics in hydrological process.

  7. The Use of Alternative Regression Methods in Social Sciences and the Comparison of Least Squares and M Estimation Methods in Terms of the Determination of Coefficient

    Science.gov (United States)

    Coskuntuncel, Orkun

    2013-01-01

    The purpose of this study is two-fold; the first aim being to show the effect of outliers on the widely used least squares regression estimator in social sciences. The second aim is to compare the classical method of least squares with the robust M-estimator using the "determination of coefficient" (R[superscript 2]). For this purpose,…

  8. Realtime control system for microprobe beamline at PLS

    Energy Technology Data Exchange (ETDEWEB)

    Yoon, J.C.; Lee, J.W.; Kim, K.H.; Ko, I.S. [Pohang Accelerator Laboratory, POSTECH, Pohang (Korea)

    1998-11-01

    The microprobe beamline of the Pohang Light Source (PLS) consists of main and second slits, a microprobe system, two ion chambers, a video-microscope, and a Si(Li) detector. These machine components must be controlled remodely through the computer system to make user's experiments precise and speedy. A real-time computer control system was developed to control and monitor these components. A VMEbus computer with an OS-9 real-time operating system was used for the low-level data acquisition and control. VME I/O modules were used for the step motor control and the scalar control. The software has a modular structure for the maximum performance and the easy maintenance. We developed the database, the I/O driver, and the control software. We used PC/Windows 95 for the data logging and the operator interface. Visual C{sup ++} was used for the graphical user interface programming. RS232C was used for the communication between the VME and the PC. (author)

  9. Design of High Field Multipole Wiggler at PLS

    International Nuclear Information System (INIS)

    Kim, D. E.; Park, K. H.; Lee, H. G.; Suh, H. S.; Han, H. S.; Jung, Y. G.; Chung, C. W.

    2007-01-01

    Pohang Accelerator Laboratory (PAL) is developing a high field multipole wiggler for new EXAFS beamline. The beamline is planning to utilize very high photon energy (∼40keV) synchrotron radiation at Pohang Light Source (PLS). To achieve higher critical photon energy, the wiggler field need to be maximized. A magnetic structure with wedged pole and blocks with additional side blocks which are similar to asymmetric wiggler of ESRF are designed to achieve higher flux density. The end structures were designed to be asymmetric along the beam direction to ensure systematic zero 1st field integral. The thickness of the last magnets were adjusted to minimize the transition sequence to the fully developed periodic field. This approach is more convenient to control than adjusting the strength of the end magnets. The final design features 140mm period, 2.5 Tesla peak flux density at 12mm pole gap, 1205mm magnetic structure length with 16 full field poles. In this article, all the design, engineering efforts for the HFMSII wiggler will be described

  10. Data analysis of photon beam position at PLS-II

    Energy Technology Data Exchange (ETDEWEB)

    Ko, J.; Shin, S., E-mail: tlssh@postech.ac.kr; Huang, Jung-Yun; Kim, D.; Kim, C.; Kim, Ilyou; Lee, T.-Y.; Park, C.-D.; Kim, K. R. [Pohang Accelerator Laboratory, Pohang, Kyungbuk 790-834 (Korea, Republic of); Cho, Moohyun [Department of Physics, POSTECH, Pohang, Kyungbuk 790-834 (Korea, Republic of)

    2016-07-27

    In the third generation light source, photon beam position stability is critical issue on user experiment. Generally photon beam position monitors have been developed for the detection of the real photon beam position and the position is controlled by feedback system in order to keep the reference photon beam position. In the PLS-II, photon beam position stability for front end of particular beam line, in which photon beam position monitor is installed, has been obtained less than rms 1μm for user service period. Nevertheless, detail analysis for photon beam position data in order to demonstrate the performance of photon beam position monitor is necessary, since it can be suffers from various unknown noises. (for instance, a back ground contamination due to upstream or downstream dipole radiation, undulator gap dependence, etc.) In this paper, we will describe the start to end study for photon beam position stability and the Singular Value Decomposition (SVD) analysis to demonstrate the reliability on photon beam position data.

  11. Comparison of exact, efron and breslow parameter approach method on hazard ratio and stratified cox regression model

    Science.gov (United States)

    Fatekurohman, Mohamat; Nurmala, Nita; Anggraeni, Dian

    2018-04-01

    Lungs are the most important organ, in the case of respiratory system. Problems related to disorder of the lungs are various, i.e. pneumonia, emphysema, tuberculosis and lung cancer. Comparing all those problems, lung cancer is the most harmful. Considering about that, the aim of this research applies survival analysis and factors affecting the endurance of the lung cancer patient using comparison of exact, Efron and Breslow parameter approach method on hazard ratio and stratified cox regression model. The data applied are based on the medical records of lung cancer patients in Jember Paru-paru hospital on 2016, east java, Indonesia. The factors affecting the endurance of the lung cancer patients can be classified into several criteria, i.e. sex, age, hemoglobin, leukocytes, erythrocytes, sedimentation rate of blood, therapy status, general condition, body weight. The result shows that exact method of stratified cox regression model is better than other. On the other hand, the endurance of the patients is affected by their age and the general conditions.

  12. Use of Geographically Weighted Regression (GWR Method to Estimate the Effects of Location Attributes on the Residential Property Values

    Directory of Open Access Journals (Sweden)

    Mohd Faris Dziauddin

    2017-07-01

    Full Text Available This study estimates the effect of locational attributes on residential property values in Kuala Lumpur, Malaysia. Geographically weighted regression (GWR enables the use of the local parameter rather than the global parameter to be estimated, with the results presented in map form. The results of this study reveal that residential property values are mainly determined by the property’s physical (structural attributes, but proximity to locational attributes also contributes marginally. The use of GWR in this study is considered a better approach than other methods to examine the effect of locational attributes on residential property values. GWR has the capability to produce meaningful results in which different locational attributes have differential spatial effects across a geographical area on residential property values. This method has the ability to determine the factors on which premiums depend, and in turn it can assist the government in taxation matters.

  13. Regression Phalanxes

    OpenAIRE

    Zhang, Hongyang; Welch, William J.; Zamar, Ruben H.

    2017-01-01

    Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" - a subset of features that work well together for prediction. We propose a novel algorithm which automatically chooses Regression Phalanxes from high-dimensi...

  14. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    Science.gov (United States)

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each single-RNA-seq sample, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parameter model to capture the general tendency of non-uniformity read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.

  15. Computational Study of Estrogen Receptor-Alpha Antagonist with Three-Dimensional Quantitative Structure-Activity Relationship, Support Vector Regression, and Linear Regression Methods

    Directory of Open Access Journals (Sweden)

    Ying-Hsin Chang

    2013-01-01

    Full Text Available Human estrogen receptor (ER isoforms, ERα and ERβ, have long been an important focus in the field of biology. To better understand the structural features associated with the binding of ERα ligands to ERα and modulate their function, several QSAR models, including CoMFA, CoMSIA, SVR, and LR methods, have been employed to predict the inhibitory activity of 68 raloxifene derivatives. In the SVR and LR modeling, 11 descriptors were selected through feature ranking and sequential feature addition/deletion to generate equations to predict the inhibitory activity toward ERα. Among four descriptors that constantly appear in various generated equations, two agree with CoMFA and CoMSIA steric fields and another two can be correlated to a calculated electrostatic potential of ERα.

  16. New PLS analysis approach to wine volatile compounds characterization by near infrared spectroscopy (NIR).

    Science.gov (United States)

    Genisheva, Z; Quintelas, C; Mesquita, D P; Ferreira, E C; Oliveira, J M; Amaral, A L

    2018-04-25

    This work aims to explore the potential of near infrared (NIR) spectroscopy to quantify volatile compounds in Vinho Verde wines, commonly determined by gas chromatography. For this purpose, 105 Vinho Verde wine samples were analyzed using Fourier transform near infrared (FT-NIR) transmission spectroscopy in the range of 5435 cm -1 to 6357 cm -1 . Boxplot and principal components analysis (PCA) were performed for clusters identification and outliers removal. A partial least square (PLS) regression was then applied to develop the calibration models, by a new iterative approach. The predictive ability of the models was confirmed by an external validation procedure with an independent sample set. The obtained results could be considered as quite good with coefficients of determination (R 2 ) varying from 0.94 to 0.97. The current methodology, using NIR spectroscopy and chemometrics, can be seen as a promising rapid tool to determine volatile compounds in Vinho Verde wines. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. The Effect of Nonnormality on CB-SEM and PLS-SEM Path Estimates

    OpenAIRE

    Z. Jannoo; B. W. Yap; N. Auchoybur; M. A. Lazim

    2014-01-01

    The two common approaches to Structural Equation Modeling (SEM) are the Covariance-Based SEM (CB-SEM) and Partial Least Squares SEM (PLS-SEM). There is much debate on the performance of CB-SEM and PLS-SEM for small sample size and when distributions are nonnormal. This study evaluates the performance of CB-SEM and PLS-SEM under normality and nonnormality conditions via a simulation. Monte Carlo Simulation in R programming language was employed to generate data based on the theoretical model w...

  18. Use of different marker pre-selection methods based on single SNP regression in the estimation of Genomic-EBVs

    Directory of Open Access Journals (Sweden)

    Corrado Dimauro

    2010-01-01

    Full Text Available Two methods of SNPs pre-selection based on single marker regression for the estimation of genomic breeding values (G-EBVs were compared using simulated data provided by the XII QTL-MAS workshop: i Bonferroni correction of the significance threshold and ii Permutation test to obtain the reference distribution of the null hypothesis and identify significant markers at P<0.01 and P<0.001 significance thresholds. From the set of markers significant at P<0.001, random subsets of 50% and 25% markers were extracted, to evaluate the effect of further reducing the number of significant SNPs on G-EBV predictions. The Bonferroni correction method allowed the identification of 595 significant SNPs that gave the best G-EBV accuracies in prediction generations (82.80%. The permutation methods gave slightly lower G-EBV accuracies even if a larger number of SNPs resulted significant (2,053 and 1,352 for 0.01 and 0.001 significance thresholds, respectively. Interestingly, halving or dividing by four the number of SNPs significant at P<0.001 resulted in an only slightly decrease of G-EBV accuracies. The genetic structure of the simulated population with few QTL carrying large effects, might have favoured the Bonferroni method.

  19. QSAR Modeling of COX -2 Inhibitory Activity of Some Dihydropyridine and Hydroquinoline Derivatives Using Multiple Linear Regression (MLR) Method.

    Science.gov (United States)

    Akbari, Somaye; Zebardast, Tannaz; Zarghi, Afshin; Hajimahdi, Zahra

    2017-01-01

    COX-2 inhibitory activities of some 1,4-dihydropyridine and 5-oxo-1,4,5,6,7,8-hexahydroquinoline derivatives were modeled by quantitative structure-activity relationship (QSAR) using stepwise-multiple linear regression (SW-MLR) method. The built model was robust and predictive with correlation coefficient (R 2 ) of 0.972 and 0.531 for training and test groups, respectively. The quality of the model was evaluated by leave-one-out (LOO) cross validation (LOO correlation coefficient (Q 2 ) of 0.943) and Y-randomization. We also employed a leverage approach for the defining of applicability domain of model. Based on QSAR models results, COX-2 inhibitory activity of selected data set had correlation with BEHm6 (highest eigenvalue n. 6 of Burden matrix/weighted by atomic masses), Mor03u (signal 03/unweighted) and IVDE (Mean information content on the vertex degree equality) descriptors which derived from their structures.

  20. Predicting blood β-hydroxybutyrate using milk Fourier transform infrared spectrum, milk composition, and producer-reported variables with multiple linear regression, partial least squares regression, and artificial neural network.

    Science.gov (United States)

    Pralle, R S; Weigel, K W; White, H M

    2018-05-01

    Prediction of postpartum hyperketonemia (HYK) using Fourier transform infrared (FTIR) spectrometry analysis could be a practical diagnostic option for farms because these data are now available from routine milk analysis during Dairy Herd Improvement testing. The objectives of this study were to (1) develop and evaluate blood β-hydroxybutyrate (BHB) prediction models using multivariate linear regression (MLR), partial least squares regression (PLS), and artificial neural network (ANN) methods and (2) evaluate whether milk FTIR spectrum (mFTIR)-based models are improved with the inclusion of test-day variables (mTest; milk composition and producer-reported data). Paired blood and milk samples were collected from multiparous cows 5 to 18 d postpartum at 3 Wisconsin farms (3,629 observations from 1,013 cows). Blood BHB concentration was determined by a Precision Xtra meter (Abbot Diabetes Care, Alameda, CA), and milk samples were analyzed by a privately owned laboratory (AgSource, Menomonie, WI) for components and FTIR spectrum absorbance. Producer-recorded variables were extracted from farm management software. A blood BHB ≥1.2 mmol/L was considered HYK. The data set was divided into a training set (n = 3,020) and an external testing set (n = 609). Model fitting was implemented with JMP 12 (SAS Institute, Cary, NC). A 5-fold cross-validation was performed on the training data set for the MLR, PLS, and ANN prediction methods, with square root of blood BHB as the dependent variable. Each method was fitted using 3 combinations of variables: mFTIR, mTest, or mTest + mFTIR variables. Models were evaluated based on coefficient of determination, root mean squared error, and area under the receiver operating characteristic curve. Four models (PLS-mTest + mFTIR, ANN-mFTIR, ANN-mTest, and ANN-mTest + mFTIR) were chosen for further evaluation in the testing set after fitting to the full training set. In the cross-validation analysis, model fit was greatest for ANN, followed

  1. Discrimination of Transgenic Rice Based on Near Infrared Reflectance Spectroscopy and Partial Least Squares Regression Discriminant Analysis

    Directory of Open Access Journals (Sweden)

    ZHANG Long

    2015-09-01

    Full Text Available Near infrared reflectance spectroscopy (NIRS, a non-destructive measurement technique, was combined with partial least squares regression discrimiant analysis (PLS-DA to discriminate the transgenic (TCTP and mi166 and wild type (Zhonghua 11 rice. Furthermore, rice lines transformed with protein gene (OsTCTP and regulation gene (Osmi166 were also discriminated by the NIRS method. The performances of PLS-DA in spectral ranges of 4 000–8 000 cm-1 and 4 000–10 000 cm-1 were compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000–10 000 cm-1, and the correct classification rate was 100.0% in the validation test. The transgenic rice TCTP and mi166 were also distinguished from each other in the range of 4 000–10 000 cm-1, and the correct classification rate was also 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.

  2. Regression toward the mean – a detection method for unknown population mean based on Mee and Chua's algorithm

    Directory of Open Access Journals (Sweden)

    Lüdtke Rainer

    2008-08-01

    Full Text Available Abstract Background Regression to the mean (RTM occurs in situations of repeated measurements when extreme values are followed by measurements in the same subjects that are closer to the mean of the basic population. In uncontrolled studies such changes are likely to be interpreted as a real treatment effect. Methods Several statistical approaches have been developed to analyse such situations, including the algorithm of Mee and Chua which assumes a known population mean μ. We extend this approach to a situation where μ is unknown and suggest to vary it systematically over a range of reasonable values. Using differential calculus we provide formulas to estimate the range of μ where treatment effects are likely to occur when RTM is present. Results We successfully applied our method to three real world examples denoting situations when (a no treatment effect can be confirmed regardless which μ is true, (b when a treatment effect must be assumed independent from the true μ and (c in the appraisal of results of uncontrolled studies. Conclusion Our method can be used to separate the wheat from the chaff in situations, when one has to interpret the results of uncontrolled studies. In meta-analysis, health-technology reports or systematic reviews this approach may be helpful to clarify the evidence given from uncontrolled observational studies.

  3. Regression analysis by example

    CERN Document Server

    Chatterjee, Samprit

    2012-01-01

    Praise for the Fourth Edition: ""This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable."" -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded

  4. VG2 URA PLS DERIVED SUMMARY ION FIT 48SEC V1.0

    Data.gov (United States)

    National Aeronautics and Space Administration — This data set contains the total ion density obtained from Voyager 2 PLS data (voltage range 10-5950 eV/Q) at Uranus by fitting the measured spectra with isotropic...

  5. VG2 URA PLS DERIVED RDR ION FIT 48SEC V1.0

    Data.gov (United States)

    National Aeronautics and Space Administration — This data set contains the ion densities and temperatures along with formal 1 Sigma errors obtained from Voyager 2 PLS data (voltage range 10-5950 eV/Q) at Uranus by...

  6. The Plasmin-Sensitive Protein Pls in Methicillin-Resistant Staphylococcus aureus (MRSA Is a Glycoprotein.

    Directory of Open Access Journals (Sweden)

    Isabelle Bleiziffer

    2017-01-01

    Full Text Available Most bacterial glycoproteins identified to date are virulence factors of pathogenic bacteria, i.e. adhesins and invasins. However, the impact of protein glycosylation on the major human pathogen Staphylococcus aureus remains incompletely understood. To study protein glycosylation in staphylococci, we analyzed lysostaphin lysates of methicillin-resistant Staphylococcus aureus (MRSA strains by SDS-PAGE and subsequent periodic acid-Schiff's staining. We detected four (>300, ∼250, ∼165, and ∼120 kDa and two (>300 and ∼175 kDa glycosylated surface proteins with strain COL and strain 1061, respectively. The ∼250, ∼165, and ∼175 kDa proteins were identified as plasmin-sensitive protein (Pls by mass spectrometry. Previously, Pls has been demonstrated to be a virulence factor in a mouse septic arthritis model. The pls gene is encoded by the staphylococcal cassette chromosome (SCCmec type I in MRSA that also encodes the methicillin resistance-conferring mecA and further genes. In a search for glycosyltransferases, we identified two open reading frames encoded downstream of pls on the SCCmec element, which we termed gtfC and gtfD. Expression and deletion analysis revealed that both gtfC and gtfD mediate glycosylation of Pls. Additionally, the recently reported glycosyltransferases SdgA and SdgB are involved in Pls glycosylation. Glycosylation occurs at serine residues in the Pls SD-repeat region and modifying carbohydrates are N-acetylhexosaminyl residues. Functional characterization revealed that Pls can confer increased biofilm formation, which seems to involve two distinct mechanisms. The first mechanism depends on glycosylation of the SD-repeat region by GtfC/GtfD and probably also involves eDNA, while the second seems to be independent of glycosylation as well as eDNA and may involve the centrally located G5 domains. Other previously known Pls properties are not related to the sugar modifications. In conclusion, Pls is a glycoprotein and

  7. Construction of Network Management Information System of Agricultural Products Supply Chain Based on 3PLs

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    The necessity to construct the network management information system of 3PLs agricultural supply chain is analyzed,showing that 3PLs can improve the overall competitive advantage of agricultural supply chain.3PLs changes the homogeneity management into specialized management of logistics service and achieves the alliance of the subjects at different nodes of agricultural products supply chain.Network management information system structure of agricultural products supply chain based on 3PLs is constructed,including the four layers (the network communication layer,the hardware and software environment layer,the database layer,and the application layer) and 7 function modules (centralized control,transportation process management,material and vehicle scheduling,customer relationship,storage management,customer inquiry,and financial management).Framework for the network management information system of agricultural products supply chain based on 3PLs is put forward.The management of 3PLs mainly includes purchasing management,supplier relationship management,planning management,customer relationship management,storage management and distribution management.Thus,a management system of internal and external integrated agricultural enterprises is obtained.The network management information system of agricultural products supply chain based on 3PLs has realized the effective sharing of enterprise information of agricultural products supply chain at different nodes,establishing a long-term partnership revolving around the 3PLs core enterprise,as well as a supply chain with stable relationship based on the supply chain network system,so as to improve the circulation efficiency of agricultural products,and to explore the sales market for agricultural products.

  8. Multiclass Prediction with Partial Least Square Regression for Gene Expression Data: Applications in Breast Cancer Intrinsic Taxonomy

    Directory of Open Access Journals (Sweden)

    Chi-Cheng Huang

    2013-01-01

    Full Text Available Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiles. Despite recent advancements in machine learning and bioinformatics, most classification tools were limited to the applications of binary responses. Our aim was to apply partial least square (PLS regression for breast cancer intrinsic taxonomy, of which five distinct molecular subtypes were identified. The PAM50 signature genes were used as predictive variables in PLS analysis, and the latent gene component scores were used in binary logistic regression for each molecular subtype. The 139 prototypical arrays for PAM50 development were used as training dataset, and three independent microarray studies with Han Chinese origin were used for independent validation (n=535. The agreement between PAM50 centroid-based single sample prediction (SSP and PLS-regression was excellent (weighted Kappa: 0.988 within the training samples, but deteriorated substantially in independent samples, which could attribute to much more unclassified samples by PLS-regression. If these unclassified samples were removed, the agreement between PAM50 SSP and PLS-regression improved enormously (weighted Kappa: 0.829 as opposed to 0.541 when unclassified samples were analyzed. Our study ascertained the feasibility of PLS-regression in multi-class prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes.

  9. Involvement of PlsX and the acyl-phosphate dependent sn-glycerol-3-phosphate acyltransferase PlsY in the initial stage of glycerolipid synthesis in Bacillus subtilis.

    Science.gov (United States)

    Hara, Yoshinori; Seki, Masahide; Matsuoka, Satoshi; Hara, Hiroshi; Yamashita, Atsushi; Matsumoto, Kouji

    2008-12-01

    The gene responsible for the first acylation of sn-glycerol-3-phosphate (G3P) in Bacillus subtilis has not yet been determined with certainty. The product of this first acylation, lysophosphatidic acid (LPA), is subsequently acylated again to form phosphatidic acid (PA), the primary precursor to membrane glycerolipids. A novel G3P acyltransferase (GPAT), the gene product of plsY, which uses acyl-phosphate formed by the plsX gene product, has recently been found to synthesize LPA in Streptococcus pneumoniae. We found that in B. subtilis growth arrests after repression of either a plsY homologue or a plsX homologue were overcome by expression of E. coli plsB, which encodes an acyl-acylcarrier protein (acyl-ACP)-dependent GPAT, although in the case of plsX repression a high level of plsB expression was required. B. subtilis has, therefore, a capability to use the acyl-ACP dependent GPAT of PlsB. Simultaneous expression of plsY and plsX suppressed the glycerol requirement of a strict glycerol auxotrophic derivative of the E. coli plsB26 mutant, although either one alone did not. Membrane fractions from B. subtilis cells catalyzed palmitoylphosphate-dependent acylation of [14C]-labeled G3P to synthesize [14C]-labeled LPA, whereas those from DeltaplsY cells did not. The results indicate unequivocally that PlsY is an acyl-phosphate dependent GPAT. Expression of plsX corrected the glycerol auxotrophy of a DeltaygiH (the deleted allele of an E. coli homologue of plsY) derivative of BB26-36 (plsB26 plsX50), suggesting an essential role of plsX other than substrate supply for acyl-phosphate dependent LPA synthesis. Two-hybrid examinations suggested that PlsY is associated with PlsX and that each may exist in multimeric form.

  10. Impact of multicollinearity on small sample hydrologic regression models

    Science.gov (United States)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, is it recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.

  11. Projected regression method for solving Fredholm integral equations arising in the analytic continuation problem of quantum physics

    International Nuclear Information System (INIS)

    Arsenault, Louis-François; Millis, Andrew J; Neuberg, Richard; Hannah, Lauren A

    2017-01-01

    We present a supervised machine learning approach to the inversion of Fredholm integrals of the first kind as they arise, for example, in the analytic continuation problem of quantum many-body physics. The approach provides a natural regularization for the ill-conditioned inverse of the Fredholm kernel, as well as an efficient and stable treatment of constraints. The key observation is that the stability of the forward problem permits the construction of a large database of outputs for physically meaningful inputs. Applying machine learning to this database generates a regression function of controlled complexity, which returns approximate solutions for previously unseen inputs; the approximate solutions are then projected onto the subspace of functions satisfying relevant constraints. Under standard error metrics the method performs as well or better than the Maximum Entropy method for low input noise and is substantially more robust to increased input noise. We suggest that the methodology will be similarly effective for other problems involving a formally ill-conditioned inversion of an integral operator, provided that the forward problem can be efficiently solved. (paper)

  12. A novel exploratory chemometric approach to environmental monitorring by combining block clustering with Partial Least Square (PLS) analysis

    Science.gov (United States)

    2013-01-01

    Background Given the serious threats posed to terrestrial ecosystems by industrial contamination, environmental monitoring is a standard procedure used for assessing the current status of an environment or trends in environmental parameters. Measurement of metal concentrations at different trophic levels followed by their statistical analysis using exploratory multivariate methods can provide meaningful information on the status of environmental quality. In this context, the present paper proposes a novel chemometric approach to standard statistical methods by combining the Block clustering with Partial least square (PLS) analysis to investigate the accumulation patterns of metals in anthropized terrestrial ecosystems. The present study focused on copper, zinc, manganese, iron, cobalt, cadmium, nickel, and lead transfer along a soil-plant-snai food chain, and the hepatopancreas of the Roman snail (Helix pomatia) was used as a biological end-point of metal accumulation. Results Block clustering deliniates between the areas exposed to industrial and vehicular contamination. The toxic metals have similar distributions in the nettle leaves and snail hepatopancreas. PLS analysis showed that (1) zinc and copper concentrations at the lower trophic levels are the most important latent factors that contribute to metal accumulation in land snails; (2) cadmium and lead are the main determinants of pollution pattern in areas exposed to industrial contamination; (3) at the sites located near roads lead is the most threatfull metal for terrestrial ecosystems. Conclusion There were three major benefits by applying block clustering with PLS for processing the obtained data: firstly, it helped in grouping sites depending on the type of contamination. Secondly, it was valuable for identifying the latent factors that contribute the most to metal accumulation in land snails. Finally, it optimized the number and type of data that are best for monitoring the status of metallic

  13. A novel exploratory chemometric approach to environmental monitorring by combining block clustering with Partial Least Square (PLS) analysis.

    Science.gov (United States)

    Nica, Dragos V; Bordean, Despina Maria; Pet, Ioan; Pet, Elena; Alda, Simion; Gergen, Iosif

    2013-08-30

    Given the serious threats posed to terrestrial ecosystems by industrial contamination, environmental monitoring is a standard procedure used for assessing the current status of an environment or trends in environmental parameters. Measurement of metal concentrations at different trophic levels followed by their statistical analysis using exploratory multivariate methods can provide meaningful information on the status of environmental quality. In this context, the present paper proposes a novel chemometric approach to standard statistical methods by combining the Block clustering with Partial least square (PLS) analysis to investigate the accumulation patterns of metals in anthropized terrestrial ecosystems. The present study focused on copper, zinc, manganese, iron, cobalt, cadmium, nickel, and lead transfer along a soil-plant-snai food chain, and the hepatopancreas of the Roman snail (Helix pomatia) was used as a biological end-point of metal accumulation. Block clustering deliniates between the areas exposed to industrial and vehicular contamination. The toxic metals have similar distributions in the nettle leaves and snail hepatopancreas. PLS analysis showed that (1) zinc and copper concentrations at the lower trophic levels are the most important latent factors that contribute to metal accumulation in land snails; (2) cadmium and lead are the main determinants of pollution pattern in areas exposed to industrial contamination; (3) at the sites located near roads lead is the most threatfull metal for terrestrial ecosystems. There were three major benefits by applying block clustering with PLS for processing the obtained data: firstly, it helped in grouping sites depending on the type of contamination. Secondly, it was valuable for identifying the latent factors that contribute the most to metal accumulation in land snails. Finally, it optimized the number and type of data that are best for monitoring the status of metallic contamination in terrestrial ecosystems

  14. The review of the achieved degree of sustainable development in South Eastern Europe - The use of linear regression method

    Energy Technology Data Exchange (ETDEWEB)

    Golusin, Mirjana [Educons University, Vojvode Putnika st. bb, 21013 Sremska Kamnica (RS); Ivanovic, Olja Munitlak [Faculty of Business in Services, Vojvode Putnik st. bb, 21013 Sremska Kamenica (RS); Teodorovic, Natasa [Faculty of Entrepreneurial Management, Modene st. 5, 21000 Novi Sad (RS)

    2011-01-15

    The need for preservation and adequate management of the quality of environment requires the development of new methods and techniques by which the achieved degree of sustainable development can be defined as well as the laws regarding the relationship among its subsystems. Main objective of research is to point to a strong contradiction between the development of ecological and economic subsystems. In order to improve previous research, this study suggests the use of linear evaluation, by which it is possible to determine the exact degree of contradiction between these two subsystems and to define the regularities as well as the deviations. Authors present the essential steps that were used. Conducted by the method of linear regression this research shows a significant negative correlation between ecological and economic subsystem indicators, whereas its value R{sup 2} 0.58 proves the expected contradiction that exists between the two previously mentioned subsystems. By observing the sustainable development as a two-dimensional system that includes ecological and economic indicators, the authors suggest the methodology to modelling the relationship between economic and ecological development as an orthogonal distance between the degree of the current state measured by the relation between economic and ecological indicators of sustainable development and the degree which was obtained in a traditional way. The method used in this research proved to be extremely suitable for modelling the relationship between ecological and economic subsystems of sustainable development. This research was conducted on a repeated sample of countries of South East Europe by including the data for France and Germany, being two countries on the highest level of development in the European Union. (author)

  15. PLS models for determination of SARA analysis of Colombian vacuum residues and molecular distillation fractions using MIR-ATR

    Directory of Open Access Journals (Sweden)

    Jorge A. Orrego-Ruiz

    2014-06-01

    Full Text Available In this work, prediction models of Saturates, Aromatics, Resins and Asphaltenes fractions (SARA from thirty-seven vacuum residues of representative Colombian crudes and eighteen fractions of molecular distillation process were obtained. Mid-Infrared (MIR Attenuated Total Reflection (ATR spectroscopy in combination with partial least squares (PLS regression analysis was used to estimate accurately SARA analysis in these kind of samples. Calibration coefficients of prediction models were for saturates, aromatics, resins and asphaltenes fractions, 0.99, 0.96, 0.97 and 0.99, respectively. This methodology permits to control the molecular distillation process since small differences in chemical composition can be detected. Total time elapsed to give the SARA analysis per sample is 10 minutes.

  16. Development of nondestructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Sang Dae; Lohumi, Santosh; Cho, Byoung Kwan [Dept. of Biosystems Machinery Engineering, Chungnam National University, Daejeon (Korea, Republic of); Kim, Moon Sung [United States Department of Agriculture Agricultural Research Service, Washington (United States); Lee, Soo Hee [Life and Technology Co.,Ltd., Hwasung (Korea, Republic of)

    2014-08-15

    This study was conducted to develop a non-destructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression(PLSR). Garlic and ginger powder, which are used as natural seasoning and in health supplement foods, were selected for this experiment. Samples were adulterated with corn starch in concentrations of 5-35%. PLSR models for adulterated garlic and ginger powders were developed and their performances evaluated using cross validation. The R{sup 2}{sub c} and SEC of an optimal PLSR model were 0.99 and 2.16 for the garlic powder samples, and 0.99 and 0.84 for the ginger samples, respectively. The variable importance in projection (VIP) score is a useful and simple tool for the evaluation of the importance of each variable in a PLSR model. After the VIP scores were taken pre-selection, the Raman spectrum data was reduced by one third. New PLSR models, based on a reduced number of wavelengths selected by the VIP scores technique, gave good predictions for the adulterated garlic and ginger powder samples.

  17. Large scale air pollution estimation method combining land use regression and chemical transport modeling in a geostatistical framework.

    Science.gov (United States)

    Akita, Yasuyuki; Baldasano, Jose M; Beelen, Rob; Cirach, Marta; de Hoogh, Kees; Hoek, Gerard; Nieuwenhuijsen, Mark; Serre, Marc L; de Nazelle, Audrey

    2014-04-15

    In recognition that intraurban exposure gradients may be as large as between-city variations, recent air pollution epidemiologic studies have become increasingly interested in capturing within-city exposure gradients. In addition, because of the rapidly accumulating health data, recent studies also need to handle large study populations distributed over large geographic domains. Even though several modeling approaches have been introduced, a consistent modeling framework capturing within-city exposure variability and applicable to large geographic domains is still missing. To address these needs, we proposed a modeling framework based on the Bayesian Maximum Entropy method that integrates monitoring data and outputs from existing air quality models based on Land Use Regression (LUR) and Chemical Transport Models (CTM). The framework was applied to estimate the yearly average NO2 concentrations over the region of Catalunya in Spain. By jointly accounting for the global scale variability in the concentration from the output of CTM and the intraurban scale variability through LUR model output, the proposed framework outperformed more conventional approaches.

  18. Development of nondestructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression

    International Nuclear Information System (INIS)

    Lee, Sang Dae; Lohumi, Santosh; Cho, Byoung Kwan; Kim, Moon Sung; Lee, Soo Hee

    2014-01-01

    This study was conducted to develop a non-destructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression(PLSR). Garlic and ginger powder, which are used as natural seasoning and in health supplement foods, were selected for this experiment. Samples were adulterated with corn starch in concentrations of 5-35%. PLSR models for adulterated garlic and ginger powders were developed and their performances evaluated using cross validation. The R 2 c and SEC of an optimal PLSR model were 0.99 and 2.16 for the garlic powder samples, and 0.99 and 0.84 for the ginger samples, respectively. The variable importance in projection (VIP) score is a useful and simple tool for the evaluation of the importance of each variable in a PLSR model. After the VIP scores were taken pre-selection, the Raman spectrum data was reduced by one third. New PLSR models, based on a reduced number of wavelengths selected by the VIP scores technique, gave good predictions for the adulterated garlic and ginger powder samples.

  19. Autistic Regression

    Science.gov (United States)

    Matson, Johnny L.; Kozlowski, Alison M.

    2010-01-01

    Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…

  20. Assessing the impacts of human activities and climate variations on grassland productivity by partial least squares structural equation modeling (PLS-SEM)

    Institute of Scientific and Technical Information of China (English)

    SHA Zongyao; XIE Yichun; TAN Xicheng; BAI Yongfei; LI Jonathan; LIU Xuefeng

    2017-01-01

    The cause-effect associations between geographical phenomena are an important focus in ecological research.Recent studies in structural equation modeling (SEM) demonstrated the potential for analyzing such associations.We applied the variance-based partial least squares SEM (PLS-SEM) and geographically-weighted regression (GWR) modeling to assess the human-climate impact on grassland productivity represented by above-ground biomass (AGB).The human and climate factors and their interaction were taken to explain the AGB variance by a PLS-SEM developed for the grassland ecosystem in Inner Mongolia,China.Results indicated that 65.5% of the AGB variance could be explained by the human and climate factors and their interaction.The case study showed that the human and climate factors imposed a significant and negative impact on the AGB and that their interaction alleviated to some extent the threat from the intensified human-climate pressure.The alleviation may be attributable to vegetation adaptation to high human-climate stresses,to human adaptation to climate conditions or/and to recent vegetation restoration programs in the highly degraded areas.Furthermore,the AGB response to the human and climate factors modeled by GWR exhibited significant spatial variations.This study demonstrated that the combination of PLS-SEM and GWR model is feasible to investigate the cause-effect relation in socio-ecological systems.

  1. Improved intact soil-core carbon determination applying regression shrinkage and variable selection techniques to complete spectrum laser-induced breakdown spectroscopy (LIBS).

    Science.gov (United States)

    Bricklemyer, Ross S; Brown, David J; Turk, Philip J; Clegg, Sam M

    2013-10-01

    Laser-induced breakdown spectroscopy (LIBS) provides a potential method for rapid, in situ soil C measurement. In previous research on the application of LIBS to intact soil cores, we hypothesized that ultraviolet (UV) spectrum LIBS (200-300 nm) might not provide sufficient elemental information to reliably discriminate between soil organic C (SOC) and inorganic C (IC). In this study, using a custom complete spectrum (245-925 nm) core-scanning LIBS instrument, we analyzed 60 intact soil cores from six wheat fields. Predictive multi-response partial least squares (PLS2) models using full and reduced spectrum LIBS were compared for directly determining soil total C (TC), IC, and SOC. Two regression shrinkage and variable selection approaches, the least absolute shrinkage and selection operator (LASSO) and sparse multivariate regression with covariance estimation (MRCE), were tested for soil C predictions and the identification of wavelengths important for soil C prediction. Using complete spectrum LIBS for PLS2 modeling reduced the calibration standard error of prediction (SEP) 15 and 19% for TC and IC, respectively, compared to UV spectrum LIBS. The LASSO and MRCE approaches provided significantly improved calibration accuracy and reduced SEP 32-55% over UV spectrum PLS2 models. We conclude that (1) complete spectrum LIBS is superior to UV spectrum LIBS for predicting soil C for intact soil cores without pretreatment; (2) LASSO and MRCE approaches provide improved calibration prediction accuracy over PLS2 but require additional testing with increased soil and target analyte diversity; and (3) measurement errors associated with analyzing intact cores (e.g., sample density and surface roughness) require further study and quantification.

  2. Simultaneous determination of penicillin G salts by infrared spectroscopy: Evaluation of combining orthogonal signal correction with radial basis function-partial least squares regression

    Science.gov (United States)

    Talebpour, Zahra; Tavallaie, Roya; Ahmadi, Seyyed Hamid; Abdollahpour, Assem

    2010-09-01

    In this study, a new method for the simultaneous determination of penicillin G salts in pharmaceutical mixture via FT-IR spectroscopy combined with chemometrics was investigated. The mixture of penicillin G salts is a complex system due to similar analytical characteristics of components. Partial least squares (PLS) and radial basis function-partial least squares (RBF-PLS) were used to develop the linear and nonlinear relation between spectra and components, respectively. The orthogonal signal correction (OSC) preprocessing method was used to correct unexpected information, such as spectral overlapping and scattering effects. In order to compare the influence of OSC on PLS and RBF-PLS models, the optimal linear (PLS) and nonlinear (RBF-PLS) models based on conventional and OSC preprocessed spectra were established and compared. The obtained results demonstrated that OSC clearly enhanced the performance of both RBF-PLS and PLS calibration models. Also in the case of some nonlinear relation between spectra and component, OSC-RBF-PLS gave satisfactory results than OSC-PLS model which indicated that the OSC was helpful to remove extrinsic deviations from linearity without elimination of nonlinear information related to component. The chemometric models were tested on an external dataset and finally applied to the analysis commercialized injection product of penicillin G salts.

  3. PLS-NIR determination of five parameters in different types of Chinese rice wine

    Science.gov (United States)

    Yu, Haiyan; Ying, Yibin; Fu, Xiaping; Lu, Huishan

    2005-11-01

    To evaluate the applicability of near infrared spectroscopy for determination of the five enological parameters (alcoholic degree, pH value, total acid and amino acid nitrogen, °Brix) of Chinese rice wine, transmission spectra were collected in the spectral range from 12500 to 3800 cm-1 in a 1 mm path length rectangular quartz cuvette with air as reference at room temperature. Five calibration equations for the five parameters were established between the reference data and spectra by partial least squares (PLS) regression, separately. The best calibration results were achieved for the determination of alcoholic degree and °Brix. The RPD (ration of the standard deviation of the samples to the SECV) values of the calibration for both alcoholic degree and °Brix were higher than 3 (4.30 and 7.94, respectively), which demonstrated the robustness and power of the calibration models. The determination coefficients (R2) for alcoholic degree and °Brix were 0.987 and 0.991, respectively. The performance of pH, total acid and amino acid nitrogen was not as good as that of alcoholic degree and °Brix. The RPD values for the three parameters were 1.48, 1.85 and 1.82, respectively, and R2 values were 0.964, 0.970 and 0.971, respectively. In validation step, R2 value of the five parameters are all higher than 0.7, especially for alcoholic degree and °Brix (0.968 and 0.956, respectively). The results demonstrated that NIR spectroscopy could be used to predict the concentration of the five enological parameters in Chinese rice wine.

  4. Fibre Morphological Characteristics of Kraft Pulps of Acacia melanoxylon Estimated by NIR-PLS-R Models

    Directory of Open Access Journals (Sweden)

    Helena Pereira

    2015-12-01

    Full Text Available In this paper, the morphological properties of fiber length (weighted in length and of fiber width of unbleached Kraft pulp of Acacia melanoxylon were determined using TECHPAP Morfi® equipment (Techpap SAS, Grenoble, France, and were used in the calibration development of Near Infrared (NIR partial least squares regression (PLS-R models based on the spectral data obtained for the wood. It is the first time that fiber length and width of pulp were predicted with NIR spectral data of the initial woodmeal, with high accuracy and precision, and with ratios of performance to deviation (RPD fulfilling the requirements for screening in breeding programs. The selected models for fiber length and fiber width used the second derivative and first derivative + multiplicative scatter correction (2ndDer and 1stDer + MSC pre-processed spectra, respectively, in the wavenumber ranges from 7506 to 5440 cm−1. The statistical parameters of cross-validation (RMSECV (root mean square error of cross-validation of 0.009 mm and 0.39 μm and validation (RMSEP (root mean square error of prediction of 0.007 mm and 0.36 μm with RPDTS (ratios of performance to deviation of test set values of 3.9 and 3.3, respectively, confirmed that the models are robust and well qualified for prediction. This modeling approach shows a high potential to be used for tree breeding and improvement programs, providing a rapid screening for desired fiber morphological properties of pulp prediction.

  5. H0 from cosmic chronometers and Type Ia supernovae, with Gaussian Processes and the novel Weighted Polynomial Regression method

    Science.gov (United States)

    Gómez-Valent, Adrià; Amendola, Luca

    2018-04-01

    In this paper we present new constraints on the Hubble parameter H0 using: (i) the available data on H(z) obtained from cosmic chronometers (CCH); (ii) the Hubble rate data points extracted from the supernovae of Type Ia (SnIa) of the Pantheon compilation and the Hubble Space Telescope (HST) CANDELS and CLASH Multy-Cycle Treasury (MCT) programs; and (iii) the local HST measurement of H0 provided by Riess et al. (2018), H0HST=(73.45±1.66) km/s/Mpc. Various determinations of H0 using the Gaussian processes (GPs) method and the most updated list of CCH data have been recently provided by Yu, Ratra & Wang (2018). Using the Gaussian kernel they find H0=(67.42± 4.75) km/s/Mpc. Here we extend their analysis to also include the most released and complete set of SnIa data, which allows us to reduce the uncertainty by a factor ~ 3 with respect to the result found by only considering the CCH information. We obtain H0=(67.06± 1.68) km/s/Mpc, which favors again the lower range of values for H0 and is in tension with H0HST. The tension reaches the 2.71σ level. We round off the GPs determination too by taking also into account the error propagation of the kernel hyperparameters when the CCH with and without H0HST are used in the analysis. In addition, we present a novel method to reconstruct functions from data, which consists in a weighted sum of polynomial regressions (WPR). We apply it from a cosmographic perspective to reconstruct H(z) and estimate H0 from CCH and SnIa measurements. The result obtained with this method, H0=(68.90± 1.96) km/s/Mpc, is fully compatible with the GPs ones. Finally, a more conservative GPs+WPR value is also provided, H0=(68.45± 2.00) km/s/Mpc, which is still almost 2σ away from H0HST.

  6. Energy value of poultry byproduct meal and animal-vegetable oil blend for broiler chickens by the regression method.

    Science.gov (United States)

    Cao, M H; Adeola, O

    2016-02-01

    The energy values of poultry byproduct meal (PBM) and animal-vegetable oil blend (A-V blend) were determined in 2 experiments with 288 broiler chickens from d 19 to 25 post hatching. The birds were fed a starter diet from d 0 to 19 post hatching. In each experiment, 144 birds were grouped by weight into 8 replicates of cages with 6 birds per cage. There were 3 diets in each experiment consisting of one reference diet (RD) and 2 test diets (TD). The TD contained 2 levels of PBM (Exp. 1) or A-V blend (Exp. 2) that replaced the energy sources in the RD at 50 or 100 g/kg (Exp. 1) or 40 or 80 g/kg (Exp. 2) in such a way that the same ratio were maintained for energy ingredients across experimental diets. The ileal digestible energy (IDE), ME, and MEn of PBM and A-V blend were determined by the regression method. Dry matter of PBM and A-V blend were 984 and 999 g/kg; the gross energies were 5,284 and 9,604 kcal/kg of DM, respectively. Addition of PBM to the RD in Exp. 1 linearly decreased (P blend to the RD linearly increased (P blend as follows: IDE = 10,616x + 7.350, r(2) = 0.96; ME = 10,121x + 0.447, r(2) = 0.99; MEn = 10,124x + 2.425, r(2) = 0.99. These data indicate the respective IDE, ME, MEn values (kcal/kg of DM) of PBM evaluated to be 3,537, 3,805, and 3,278, and A-V blend evaluated to be 10,616, 10,121, and 10,124. © 2015 Poultry Science Association Inc.

  7. Classification of Amazonian rosewood essential oil by Raman spectroscopy and PLS-DA with reliability estimation.

    Science.gov (United States)

    Almeida, Mariana R; Fidelis, Carlos H V; Barata, Lauro E S; Poppi, Ronei J

    2013-12-15

    The Amazon tree Aniba rosaeodora Ducke (rosewood) provides an essential oil valuable for the perfume industry, but after decades of predatory extraction it is at risk of extinction. The extraction of the essential oil from wood implies the cutting of the tree, and then the study of oil extracted from the leaves is important as a sustainable alternative. The goal of this study was to test the applicability of Raman spectroscopy and Partial Least Square Discriminant Analysis (PLS-DA) as means to classify the essential oil extracted from different parties (wood, leaves and branches) of the Brazilian tree A. rosaeodora. For the development of classification models, the Raman spectra were split into two sets: training and test. The value of the limit that separates the classes was calculated based on the distribution of samples of training. This value was calculated in a manner that the classes are divided with a lower probability of incorrect classification for future estimates. The best model presented sensitivity and specificity of 100%, predictive accuracy and efficiency of 100%. These results give an overall vision of the behavior of the model, but do not give information about individual samples; in this case, the confidence interval for each sample of classification was also calculated using the resampling bootstrap technique. The methodology developed have the potential to be an alternative for standard procedures used for oil analysis and it can be employed as screening method, since it is fast, non-destructive and robust. © 2013 Elsevier B.V. All rights reserved.

  8. Metode PLS: Analisis Kinerja Karyawan melalui Kepuasan Kerja dan Komitmen Karyawan

    Directory of Open Access Journals (Sweden)

    Saskia Yuanita

    2012-11-01

    Full Text Available Global challenges of the current causes increasing competition among national and international businesses. Under these conditions, the company realizes the importance of quality and efforts to enhance competitiveness by doing improvements consistently and continuously in order to meet customer and market needs. This study aims to examine the effect of the implementation of ISO 9001 quality management system onemployee performance, and the moderating effects of job satisfaction and employee commitment to the relationship between the application of ISO 9001:2008 quality management system on employee performance. The method used to analyze is Partial Least Square (PLS. The results show that the application of ISO 9001:2008 quality management system affects performance of employees with employee satisfaction and commitment as moderating variable that affect the relationship between the application of ISO 9001:2008quality management system on employee performance. Both variables, moderating employee satisfaction and commitment have positive parameter estimation, so that when the satisfaction and commitment of employees increase, it will give effect to the improvement of the implementation of the ISO 9001:2008 quality management system on employee performance.

  9. A study on modeling nitrogen dioxide concentrations using land-use regression and conventionally used exposure assessment methods

    Science.gov (United States)

    Choi, Giehae; Bell, Michelle L.; Lee, Jong-Tae

    2017-04-01

    The land-use regression (LUR) approach to estimate the levels of ambient air pollutants is becoming popular due to its high validity in predicting small-area variations. However, only a few studies have been conducted in Asian countries, and much less research has been conducted on comparing the performances and applied estimates of different exposure assessments including LUR. The main objectives of the current study were to conduct nitrogen dioxide (NO2) exposure assessment with four methods including LUR in the Republic of Korea, to compare the model performances, and to estimate the empirical NO2 exposures of a cohort. The study population was defined as the year 2010 participants of a government-supported cohort established for bio-monitoring in Ulsan, Republic of Korea. The annual ambient NO2 exposures of the 969 study participants were estimated with LUR, nearest station, inverse distance weighting, and ordinary kriging. Modeling was based on the annual NO2 average, traffic-related data, land-use data, and altitude of the 13 regularly monitored stations. The final LUR model indicated that area of transportation, distance to residential area, and area of wetland were important predictors of NO2. The LUR model explained 85.8% of the variation observed in the 13 monitoring stations of the year 2009. The LUR model outperformed the others based on leave-one out cross-validation comparing the correlations and root-mean square error. All NO2 estimates ranged from 11.3-18.0 ppb, with that of LUR having the widest range. The NO2 exposure levels of the residents differed by demographics. However, the average was below the national annual guidelines of the Republic of Korea (30 ppb). The LUR models showed high performances in an industrial city in the Republic of Korea, despite the small sample size and limited data. Our findings suggest that the LUR method may be useful in similar settings in Asian countries where the target region is small and availability of data is

  10. Clustering and training set selection methods for improving the accuracy of quantitative laser induced breakdown spectroscopy

    International Nuclear Information System (INIS)

    Anderson, Ryan B.; Bell, James F.; Wiens, Roger C.; Morris, Richard V.; Clegg, Samuel M.

    2012-01-01

    We investigated five clustering and training set selection methods to improve the accuracy of quantitative chemical analysis of geologic samples by laser induced breakdown spectroscopy (LIBS) using partial least squares (PLS) regression. The LIBS spectra were previously acquired for 195 rock slabs and 31 pressed powder geostandards under 7 Torr CO 2 at a stand-off distance of 7 m at 17 mJ per pulse to simulate the operational conditions of the ChemCam LIBS instrument on the Mars Science Laboratory Curiosity rover. The clustering and training set selection methods, which do not require prior knowledge of the chemical composition of the test-set samples, are based on grouping similar spectra and selecting appropriate training spectra for the partial least squares (PLS2) model. These methods were: (1) hierarchical clustering of the full set of training spectra and selection of a subset for use in training; (2) k-means clustering of all spectra and generation of PLS2 models based on the training samples within each cluster; (3) iterative use of PLS2 to predict sample composition and k-means clustering of the predicted compositions to subdivide the groups of spectra; (4) soft independent modeling of class analogy (SIMCA) classification of spectra, and generation of PLS2 models based on the training samples within each class; (5) use of Bayesian information criteria (BIC) to determine an optimal number of clusters and generation of PLS2 models based on the training samples within each cluster. The iterative method and the k-means method using 5 clusters showed the best performance, improving the absolute quadrature root mean squared error (RMSE) by ∼ 3 wt.%. The statistical significance of these improvements was ∼ 85%. Our results show that although clustering methods can modestly improve results, a large and diverse training set is the most reliable way to improve the accuracy of quantitative LIBS. In particular, additional sulfate standards and specifically

  11. Near infrared spectrometric technique for testing fruit quality: optimisation of regression models using genetic algorithms

    Science.gov (United States)

    Isingizwe Nturambirwe, J. Frédéric; Perold, Willem J.; Opara, Umezuruike L.

    2016-02-01

    Near infrared (NIR) spectroscopy has gained extensive use in quality evaluation. It is arguably one of the most advanced spectroscopic tools in non-destructive quality testing of food stuff, from measurement to data analysis and interpretation. NIR spectral data are interpreted through means often involving multivariate statistical analysis, sometimes associated with optimisation techniques for model improvement. The objective of this research was to explore the extent to which genetic algorithms (GA) can be used to enhance model development, for predicting fruit quality. Apple fruits were used, and NIR spectra in the range from 12000 to 4000 cm-1 were acquired on both bruised and healthy tissues, with different degrees of mechanical damage. GAs were used in combination with partial least squares regression methods to develop bruise severity prediction models, and compared to PLS models developed using the full NIR spectrum. A classification model was developed, which clearly separated bruised from unbruised apple tissue. GAs helped improve prediction models by over 10%, in comparison with full spectrum-based models, as evaluated in terms of error of prediction (Root Mean Square Error of Cross-validation). PLS models to predict internal quality, such as sugar content and acidity were developed and compared to the versions optimized by genetic algorithm. Overall, the results highlighted the potential use of GA method to improve speed and accuracy of fruit quality prediction.

  12. Regression and direct methods do not give different estimates of digestible and metabolizable energy values of barley, sorghum, and wheat for pigs.

    Science.gov (United States)

    Bolarinwa, O A; Adeola, O

    2016-02-01

    Direct or indirect methods can be used to determine the DE and ME of feed ingredients for pigs. In situations when only the indirect approach is suitable, the regression method presents a robust indirect approach. Three experiments were conducted to compare the direct and regression methods for determining the DE and ME values of barley, sorghum, and wheat for pigs. In each experiment, 24 barrows with an average initial BW of 31, 32, and 33 kg were assigned to 4 diets in a randomized complete block design. The 4 diets consisted of 969 g barley, sorghum, or wheat/kg plus minerals and vitamins for the direct method; a corn-soybean meal reference diet (RD); the RD + 300 g barley, sorghum, or wheat/kg; and the RD + 600 g barley, sorghum, or wheat/kg. The 3 corn-soybean meal diets were used for the regression method. Each diet was fed to 6 barrows in individual metabolism crates for a 5-d acclimation followed by a 5-d period of total but separate collection of feces and urine in each experiment. Graded substitution of barley or wheat, but not sorghum, into the RD linearly reduced ( direct method-derived DE and ME for barley were 3,669 and 3,593 kcal/kg DM, respectively. The regressions of barley contribution to DE and ME in kilocalories against the quantity of barley DMI in kilograms generated 3,746 kcal DE/kg DM and 3,647 kcal ME/kg DM. The DE and ME for sorghum by the direct method were 4,097 and 4,042 kcal/kg DM, respectively; the corresponding regression-derived estimates were 4,145 and 4,066 kcal/kg DM. Using the direct method, energy values for wheat were 3,953 kcal DE/kg DM and 3,889 kcal ME/kg DM. The regressions of wheat contribution to DE and ME in kilocalories against the quantity of wheat DMI in kilograms generated 3,960 kcal DE/kg DM and 3,874 kcal ME/kg DM. The DE and ME of barley using the direct method were not different (0.3 direct method-derived DE and ME of sorghum were not different (0.5 direct method- and regression method-derived DE (3,953 and 3

  13. Remote-sensing data processing with the multivariate regression analysis method for iron mineral resource potential mapping: a case study in the Sarvian area, central Iran

    Science.gov (United States)

    Mansouri, Edris; Feizi, Faranak; Jafari Rad, Alireza; Arian, Mehran

    2018-03-01

    This paper uses multivariate regression to create a mathematical model for iron skarn exploration in the Sarvian area, central Iran, using multivariate regression for mineral prospectivity mapping (MPM). The main target of this paper is to apply multivariate regression analysis (as an MPM method) to map iron outcrops in the northeastern part of the study area in order to discover new iron deposits in other parts of the study area. Two types of multivariate regression models using two linear equations were employed to discover new mineral deposits. This method is one of the reliable methods for processing satellite images. ASTER satellite images (14 bands) were used as unique independent variables (UIVs), and iron outcrops were mapped as dependent variables for MPM. According to the results of the probability value (p value), coefficient of determination value (R2) and adjusted determination coefficient (Radj2), the second regression model (which consistent of multiple UIVs) fitted better than other models. The accuracy of the model was confirmed by iron outcrops map and geological observation. Based on field observation, iron mineralization occurs at the contact of limestone and intrusive rocks (skarn type).

  14. Improving validation methods for molecular diagnostics: application of Bland-Altman, Deming and simple linear regression analyses in assay comparison and evaluation for next-generation sequencing.

    Science.gov (United States)

    Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L

    2018-02-01

    A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R 2 ), using R 2 as the primary metric of assay agreement. However, the use of R 2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  15. Simultaneous measurement of two enzyme activities using infrared spectroscopy: A comparative evaluation of PARAFAC, TUCKER and N-PLS modeling

    DEFF Research Database (Denmark)

    Baum, Andreas; Hansen, Per Waaben; Meyer, Anne S.

    2013-01-01

    multiway methods, namely PARAFAC, TUCKER3 and N-PLS, to establish simultaneous enzyme activity assays for pectin lyase and pectin methyl esterase. Correlation coefficients Rpred2 for prediction test sets are 0.48, 0.96 and 0.96 for pectin lyase and 0.70, 0.89 and 0.89 for pectin methyl esterase......Enzymes are used in many processes to release fermentable sugars for green production of biofuel, or the refinery of biomass for extraction of functional food ingredients such as pectin or prebiotic oligosaccharides. The complex biomasses may, however, require a multitude of specific enzymes which...... are active on specific substrates generating a multitude of products. In this paper we use the plant polymer, pectin, to present a method to quantify enzyme activity of two pectolytic enzymes by monitoring their superimposed spectral evolutions simultaneously. The data is analyzed by three chemometric...

  16. Spatial Estimation of Losses Attributable to Meteorological Disasters in a Specific Area (105.0°E–115.0°E, 25°N–35°N Using Bayesian Maximum Entropy and Partial Least Squares Regression

    Directory of Open Access Journals (Sweden)

    F. S. Zhang

    2016-01-01

    Full Text Available The spatial mapping of losses attributable to such disasters is now well established as a means of describing the spatial patterns of disaster risk, and it has been shown to be suitable for many types of major meteorological disasters. However, few studies have been carried out by developing a regression model to estimate the effects of the spatial distribution of meteorological factors on losses associated with meteorological disasters. In this study, the proposed approach is capable of the following: (a estimating the spatial distributions of seven meteorological factors using Bayesian maximum entropy, (b identifying the four mapping methods used in this research with the best performance based on the cross validation, and (c establishing a fitted model between the PLS components and disaster losses information using partial least squares regression within a specific research area. The results showed the following: (a best mapping results were produced by multivariate Bayesian maximum entropy with probabilistic soft data; (b the regression model using three PLS components, extracted from seven meteorological factors by PLS method, was the most predictive by means of PRESS/SS test; (c northern Hunan Province sustains the most damage, and southeastern Gansu Province and western Guizhou Province sustained the least.

  17. Evaluation of O2PLS in Omics data integration

    NARCIS (Netherlands)

    el Bouhaddani, S.; Houwing-Duistermaat, Jeanine; Salo, Perttu; Perola, Markus; Jongbloed, G.; Uh, Hae-Won

    2016-01-01

    Background
    Rapid computational and technological developments made large amounts of omics data available in different biological levels. It is becoming clear that simultaneous data analysis methods are needed for better interpretation and understanding of the underlying systems biology.

  18. Integrated Multiscale Latent Variable Regression and Application to Distillation Columns

    Directory of Open Access Journals (Sweden)

    Muddu Madakyaru

    2013-01-01

    Full Text Available Proper control of distillation columns requires estimating some key variables that are challenging to measure online (such as compositions, which are usually estimated using inferential models. Commonly used inferential models include latent variable regression (LVR techniques, such as principal component regression (PCR, partial least squares (PLS, and regularized canonical correlation analysis (RCCA. Unfortunately, measured practical data are usually contaminated with errors, which degrade the prediction abilities of inferential models. Therefore, noisy measurements need to be filtered to enhance the prediction accuracy of these models. Multiscale filtering has been shown to be a powerful feature extraction tool. In this work, the advantages of multiscale filtering are utilized to enhance the prediction accuracy of LVR models by developing an integrated multiscale LVR (IMSLVR modeling algorithm that integrates modeling and feature extraction. The idea behind the IMSLVR modeling algorithm is to filter the process data at different decomposition levels, model the filtered data from each level, and then select the LVR model that optimizes a model selection criterion. The performance of the developed IMSLVR algorithm is illustrated using three examples, one using synthetic data, one using simulated distillation column data, and one using experimental packed bed distillation column data. All examples clearly demonstrate the effectiveness of the IMSLVR algorithm over the conventional methods.

  19. Understanding poisson regression.

    Science.gov (United States)

    Hayat, Matthew J; Higgins, Melinda

    2014-04-01

    Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes. Copyright 2014, SLACK Incorporated.

  20. [Methodology of the description of atmospheric air pollution by nitrogen dioxide by land use regression method in Ekaterinburg].

    Science.gov (United States)

    Antropov, K M; Varaksin, A N

    2013-01-01

    This paper provides the description of Land Use Regression (LUR) modeling and the result of its application in the study of nitrogen dioxide air pollution in Ekaterinburg. The paper describes the difficulties of the modeling for air pollution caused by motor vehicles exhaust, and the ways to address these challenges. To create LUR model of the NO2 air pollution in Ekaterinburg, concentrations of NO2 were measured, data on factors affecting air pollution were collected, a statistical analysis of the data were held. A statistical model of NO2 air pollution (coefficient of determination R2 = 0.70) and a map of pollution were created.

  1. Methodological comparison of marginal structural model, time-varying Cox regression, and propensity score methods : the example of antidepressant use and the risk of hip fracture

    NARCIS (Netherlands)

    Ali, M Sanni; Groenwold, Rolf H H; Belitser, Svetlana V; Souverein, Patrick C; Martín, Elisa; Gatto, Nicolle M; Huerta, Consuelo; Gardarsdottir, Helga; Roes, Kit C B; Hoes, Arno W; de Boer, Antonius; Klungel, Olaf H

    2016-01-01

    BACKGROUND: Observational studies including time-varying treatments are prone to confounding. We compared time-varying Cox regression analysis, propensity score (PS) methods, and marginal structural models (MSMs) in a study of antidepressant [selective serotonin reuptake inhibitors (SSRIs)] use and

  2. Regression Analysis

    CERN Document Server

    Freund, Rudolf J; Sa, Ping

    2006-01-01

    The book provides complete coverage of the classical methods of statistical analysis. It is designed to give students an understanding of the purpose of statistical analyses, to allow the student to determine, at least to some degree, the correct type of statistical analyses to be performed in a given situation, and have some appreciation of what constitutes good experimental design

  3. A thioesterase bypasses the requirement for exogenous fatty acids in the plsX deletion of Streptococcus pneumoniae

    NARCIS (Netherlands)

    Parsons, J.B.; Frank, M.W.; Eleveld, M.J.; Schalkwijk, J.; Broussard, T.C.; Jonge, M.I. de; Rock, C.O.

    2015-01-01

    PlsX is an acyl-acyl carrier protein (ACP):phosphate transacylase that interconverts the two acyl donors in Gram-positive bacterial phospholipid synthesis. The deletion of plsX in Staphylococcus aureus results in a requirement for both exogenous fatty acids and de novo type II fatty acid

  4. Simultaneous determination of estrogens (ethinylestradiol and norgestimate) concentrations in human and bovine serum albumin by use of fluorescence spectroscopy and multivariate regression analysis.

    Science.gov (United States)

    Hordge, LaQuana N; McDaniel, Kiara L; Jones, Derick D; Fakayode, Sayo O

    2016-05-15

    The endocrine disruption property of estrogens necessitates the immediate need for effective monitoring and development of analytical protocols for their analyses in biological and human specimens. This study explores the first combined utility of a steady-state fluorescence spectroscopy and multivariate partial-least-square (PLS) regression analysis for the simultaneous determination of two estrogens (17α-ethinylestradiol (EE) and norgestimate (NOR)) concentrations in bovine serum albumin (BSA) and human serum albumin (HSA) samples. The influence of EE and NOR concentrations and temperature on the emission spectra of EE-HSA EE-BSA, NOR-HSA, and NOR-BSA complexes was also investigated. The binding of EE with HSA and BSA resulted in increase in emission characteristics of HSA and BSA and a significant blue spectra shift. In contrast, the interaction of NOR with HSA and BSA quenched the emission characteristics of HSA and BSA. The observed emission spectral shifts preclude the effective use of traditional univariate regression analysis of fluorescent data for the determination of EE and NOR concentrations in HSA and BSA samples. Multivariate partial-least-squares (PLS) regression analysis was utilized to correlate the changes in emission spectra with EE and NOR concentrations in HSA and BSA samples. The figures-of-merit of the developed PLS regression models were excellent, with limits of detection as low as 1.6×10(-8) M for EE and 2.4×10(-7) M for NOR and good linearity (R(2)>0.994985). The PLS models correctly predicted EE and NOR concentrations in independent validation HSA and BSA samples with a root-mean-square-percent-relative-error (RMS%RE) of less than 6.0% at physiological condition. On the contrary, the use of univariate regression resulted in poor predictions of EE and NOR in HSA and BSA samples, with RMS%RE larger than 40% at physiological conditions. High accuracy, low sensitivity, simplicity, low-cost with no prior analyte extraction or separation

  5. Development of a partial least squares-artificial neural network (PLS-ANN) hybrid model for the prediction of consumer liking scores of ready-to-drink green tea beverages.

    Science.gov (United States)

    Yu, Peigen; Low, Mei Yin; Zhou, Weibiao

    2018-01-01

    In order to develop products that would be preferred by consumers, the effects of the chemical compositions of ready-to-drink green tea beverages on consumer liking were studied through regression analyses. Green tea model systems were prepared by dosing solutions of 0.1% green tea extract with differing concentrations of eight flavour keys deemed to be important for green tea aroma and taste, based on a D-optimal experimental design, before undergoing commercial sterilisation. Sensory evaluation of the green tea model system was carried out using an untrained consumer panel to obtain hedonic liking scores of the samples. Regression models were subsequently trained to objectively predict the consumer liking scores of the green tea model systems. A linear partial least squares (PLS) regression model was developed to describe the effects of the eight flavour keys on consumer liking, with a coefficient of determination (R 2 ) of 0.733, and a root-mean-square error (RMSE) of 3.53%. The PLS model was further augmented with an artificial neural network (ANN) to establish a PLS-ANN hybrid model. The established hybrid model was found to give a better prediction of consumer liking scores, based on its R 2 (0.875) and RMSE (2.41%). Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. A comparative study on generating simulated Landsat NDVI images using data fusion and regression method-the case of the Korean Peninsula.

    Science.gov (United States)

    Lee, Mi Hee; Lee, Soo Bong; Eo, Yang Dam; Kim, Sun Woong; Woo, Jung-Hun; Han, Soo Hee

    2017-07-01

    Landsat optical images have enough spatial and spectral resolution to analyze vegetation growth characteristics. But, the clouds and water vapor degrade the image quality quite often, which limits the availability of usable images for the time series vegetation vitality measurement. To overcome this shortcoming, simulated images are used as an alternative. In this study, weighted average method, spatial and temporal adaptive reflectance fusion model (STARFM) method, and multilinear regression analysis method have been tested to produce simulated Landsat normalized difference vegetation index (NDVI) images of the Korean Peninsula. The test results showed that the weighted average method produced the images most similar to the actual images, provided that the images were available within 1 month before and after the target date. The STARFM method gives good results when the input image date is close to the target date. Careful regional and seasonal consideration is required in selecting input images. During summer season, due to clouds, it is very difficult to get the images close enough to the target date. Multilinear regression analysis gives meaningful results even when the input image date is not so close to the target date. Average R 2 values for weighted average method, STARFM, and multilinear regression analysis were 0.741, 0.70, and 0.61, respectively.

  7. Determination of carbohydrates present in Saccharomyces cerevisiae using mid-infrared spectroscopy and partial least squares regression.

    Science.gov (United States)

    Plata, Maria R; Koch, Cosima; Wechselberger, Patrick; Herwig, Christoph; Lendl, Bernhard

    2013-10-01

    A fast and simple method to control variations in carbohydrate composition of Saccharomyces cerevisiae, baker's yeast, during fermentation was developed using mid-infrared (mid-IR) spectroscopy. The method allows for precise and accurate determinations with minimal or no sample preparation and reagent consumption based on mid-IR spectra and partial least squares (PLS) regression. The PLS models were developed employing the results from reference analysis of the yeast cells. The reference analyses quantify the amount of trehalose, glucose, glycogen, and mannan in S. cerevisiae. The selection and optimization of pretreatment steps of samples such as the disruption of the yeast cells and the hydrolysis of mannan and glycogen to obtain monosaccharides were carried out. Trehalose, glucose, and mannose were determined using high-performance liquid chromatography coupled with a refractive index detector and total carbohydrates were measured using the phenol-sulfuric method. Linear concentration range, accuracy, precision, LOD and LOQ were examined to check the reliability of the chromatographic method for each analyte.

  8. Predicting the cross-reactivities of polycyclic aromatic hydrocarbons in ELISA by regression analysis and CoMFA methods

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Yan-Feng; Dai, Shu-Gui [College of Environmental Science and Engineering, Nankai University, Key Laboratory for Pollution Process and Environmental Criteria of Ministry of Education, Tianjin (China); Ma, Yi [College of Chemistry, Nankai University, Institute of Elemento-Organic Chemistry, Tianjin (China); Gao, Zhi-Xian [Institute of Hygiene and Environmental Medicine, Tianjin (China)

    2010-07-15

    Immunoassays have been regarded as a possible alternative or supplement for measuring polycyclic aromatic hydrocarbons (PAHs) in the environment. Since there are too many potential cross-reactants for PAH immunoassays, it is difficult to determine all the cross-reactivities (CRs) by experimental tests. The relationship between CR and the physical-chemical properties of PAHs and related compounds was investigated using the CR data from a commercial enzyme-linked immunosorbent assay (ELISA) kit test. Two quantitative structure-activity relationship (QSAR) techniques, regression analysis and comparative molecular field analysis (CoMFA), were applied for predicting the CR of PAHs in this ELISA kit. Parabolic regression indicates that the CRs are significantly correlated with the logarithm of the partition coefficient for the octanol-water system (log K{sub ow}) (r{sup 2}=0.643, n=23, P<0.0001), suggesting that hydrophobic interactions play an important role in the antigen-antibody binding and the cross-reactions in this ELISA test. The CoMFA model obtained shows that the CRs of the PAHs are correlated with the 3D structure of the molecules (r{sub cv}{sup 2}=0.663, r{sup 2}=0.873, F{sub 4,32}=55.086). The contributions of the steric and electrostatic fields to CR were 40.4 and 59.6%, respectively. Both of the QSAR models satisfactorily predict the CR in this PAH immunoassay kit, and help in understanding the mechanisms of antigen-antibody interaction. (orig.)

  9. Investigation of Pear Drying Performance by Different Methods and Regression of Convective Heat Transfer Coefficient with Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Mehmet Das

    2018-01-01

    Full Text Available In this study, an air heated solar collector (AHSC dryer was designed to determine the drying characteristics of the pear. Flat pear slices of 10 mm thickness were used in the experiments. The pears were dried both in the AHSC dryer and under the sun. Panel glass temperature, panel floor temperature, panel inlet temperature, panel outlet temperature, drying cabinet inlet temperature, drying cabinet outlet temperature, drying cabinet temperature, drying cabinet moisture, solar radiation, pear internal temperature, air velocity and mass loss of pear were measured at 30 min intervals. Experiments were carried out during the periods of June 2017 in Elazig, Turkey. The experiments started at 8:00 a.m. and continued till 18:00. The experiments were continued until the weight changes in the pear slices stopped. Wet basis moisture content (MCw, dry basis moisture content (MCd, adjustable moisture ratio (MR, drying rate (DR, and convective heat transfer coefficient (hc were calculated with both in the AHSC dryer and the open sun drying experiment data. It was found that the values of hc in both drying systems with a range 12.4 and 20.8 W/m2 °C. Three different kernel models were used in the support vector machine (SVM regression to construct the predictive model of the calculated hc values for both systems. The mean absolute error (MAE, root mean squared error (RMSE, relative absolute error (RAE and root relative absolute error (RRAE analysis were performed to indicate the predictive model’s accuracy. As a result, the rate of drying of the pear was examined for both systems and it was observed that the pear had dried earlier in the AHSC drying system. A predictive model was obtained using the SVM regression for the calculated hc values for the pear in the AHSC drying system. The normalized polynomial kernel was determined as the best kernel model in SVM for estimating the hc values.

  10. Collaborative regression.

    Science.gov (United States)

    Gross, Samuel M; Tibshirani, Robert

    2015-04-01

    We consider the scenario where one observes an outcome variable and sets of features from multiple assays, all measured on the same set of samples. One approach that has been proposed for dealing with these type of data is "sparse multiple canonical correlation analysis" (sparse mCCA). All of the current sparse mCCA techniques are biconvex and thus have no guarantees about reaching a global optimum. We propose a method for performing sparse supervised canonical correlation analysis (sparse sCCA), a specific case of sparse mCCA when one of the datasets is a vector. Our proposal for sparse sCCA is convex and thus does not face the same difficulties as the other methods. We derive efficient algorithms for this problem that can be implemented with off the shelf solvers, and illustrate their use on simulated and real data. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  11. Quantification of organic acids in beer by nuclear magnetic resonance (NMR)-based methods

    Energy Technology Data Exchange (ETDEWEB)

    Rodrigues, J.E.A. [CICECO-Department of Chemistry, University of Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal); Erny, G.L. [CESAM - Department of Chemistry, University of Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal); Barros, A.S. [QOPNAA-Department of Chemistry, University of Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal); Esteves, V.I. [CESAM - Department of Chemistry, University of Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal); Brandao, T.; Ferreira, A.A. [UNICER, Bebidas de Portugal, Leca do Balio, 4466-955 S. Mamede de Infesta (Portugal); Cabrita, E. [Department of Chemistry, New University of Lisbon, 2825-114 Caparica (Portugal); Gil, A.M., E-mail: agil@ua.pt [CICECO-Department of Chemistry, University of Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal)

    2010-08-03

    The organic acids present in beer provide important information on the product's quality and history, determining organoleptic properties and being useful indicators of fermentation performance. NMR spectroscopy may be used for rapid quantification of organic acids in beer and different NMR-based methodologies are hereby compared for the six main acids found in beer (acetic, citric, lactic, malic, pyruvic and succinic). The use of partial least squares (PLS) regression enables faster quantification, compared to traditional integration methods, and the performance of PLS models built using different reference methods (capillary electrophoresis (CE), both with direct and indirect UV detection, and enzymatic essays) was investigated. The best multivariate models were obtained using CE/indirect detection and enzymatic essays as reference and their response was compared with NMR integration, either using an internal reference or an electrical reference signal (Electronic REference To access In vivo Concentrations, ERETIC). NMR integration results generally agree with those obtained by PLS, with some overestimation for malic and pyruvic acids, probably due to peak overlap and subsequent integral errors, and an apparent relative underestimation for citric acid. Overall, these results make the PLS-NMR method an interesting choice for organic acid quantification in beer.

  12. Quantification of organic acids in beer by nuclear magnetic resonance (NMR)-based methods

    International Nuclear Information System (INIS)

    Rodrigues, J.E.A.; Erny, G.L.; Barros, A.S.; Esteves, V.I.; Brandao, T.; Ferreira, A.A.; Cabrita, E.; Gil, A.M.

    2010-01-01

    The organic acids present in beer provide important information on the product's quality and history, determining organoleptic properties and being useful indicators of fermentation performance. NMR spectroscopy may be used for rapid quantification of organic acids in beer and different NMR-based methodologies are hereby compared for the six main acids found in beer (acetic, citric, lactic, malic, pyruvic and succinic). The use of partial least squares (PLS) regression enables faster quantification, compared to traditional integration methods, and the performance of PLS models built using different reference methods (capillary electrophoresis (CE), both with direct and indirect UV detection, and enzymatic essays) was investigated. The best multivariate models were obtained using CE/indirect detection and enzymatic essays as reference and their response was compared with NMR integration, either using an internal reference or an electrical reference signal (Electronic REference To access In vivo Concentrations, ERETIC). NMR integration results generally agree with those obtained by PLS, with some overestimation for malic and pyruvic acids, probably due to peak overlap and subsequent integral errors, and an apparent relative underestimation for citric acid. Overall, these results make the PLS-NMR method an interesting choice for organic acid quantification in beer.

  13. A Method of Calculating Functional Independence Measure at Discharge from Functional Independence Measure Effectiveness Predicted by Multiple Regression Analysis Has a High Degree of Predictive Accuracy.

    Science.gov (United States)

    Tokunaga, Makoto; Watanabe, Susumu; Sonoda, Shigeru

    2017-09-01

    Multiple linear regression analysis is often used to predict the outcome of stroke rehabilitation. However, the predictive accuracy may not be satisfactory. The objective of this study was to elucidate the predictive accuracy of a method of calculating motor Functional Independence Measure (mFIM) at discharge from mFIM effectiveness predicted by multiple regression analysis. The subjects were 505 patients with stroke who were hospitalized in a convalescent rehabilitation hospital. The formula "mFIM at discharge = mFIM effectiveness × (91 points - mFIM at admission) + mFIM at admission" was used. By including the predicted mFIM effectiveness obtained through multiple regression analysis in this formula, we obtained the predicted mFIM at discharge (A). We also used multiple regression analysis to directly predict mFIM at discharge (B). The correlation between the predicted and the measured values of mFIM at discharge was compared between A and B. The correlation coefficients were .916 for A and .878 for B. Calculating mFIM at discharge from mFIM effectiveness predicted by multiple regression analysis had a higher degree of predictive accuracy of mFIM at discharge than that directly predicted. Copyright © 2017 National Stroke Association. Published by Elsevier Inc. All rights reserved.

  14. Improved ability of biological and previous caries multimarkers to predict caries disease as revealed by multivariate PLS modelling

    Directory of Open Access Journals (Sweden)

    Ericson Thorild

    2009-11-01

    Full Text Available Abstract Background Dental caries is a chronic disease with plaque bacteria, diet and saliva modifying disease activity. Here we have used the PLS method to evaluate a multiplicity of such biological variables (n = 88 for ability to predict caries in a cross-sectional (baseline caries and prospective (2-year caries development setting. Methods Multivariate PLS modelling was used to associate the many biological variables with caries recorded in thirty 14-year-old children by measuring the numbers of incipient and manifest caries lesions at all surfaces. Results A wide but shallow gliding scale of one fifth caries promoting or protecting, and four fifths non-influential, variables occurred. The influential markers behaved in the order of plaque bacteria > diet > saliva, with previously known plaque bacteria/diet markers and a set of new protective diet markers. A differential variable patterning appeared for new versus progressing lesions. The influential biological multimarkers (n = 18 predicted baseline caries better (ROC area 0.96 than five markers (0.92 and a single lactobacilli marker (0.7 with sensitivity/specificity of 1.87, 1.78 and 1.13 at 1/3 of the subjects diagnosed sick, respectively. Moreover, biological multimarkers (n = 18 explained 2-year caries increment slightly better than reported before but predicted it poorly (ROC area 0.76. By contrast, multimarkers based on previous caries predicted alone (ROC area 0.88, or together with biological multimarkers (0.94, increment well with a sensitivity/specificity of 1.74 at 1/3 of the subjects diagnosed sick. Conclusion Multimarkers behave better than single-to-five markers but future multimarker strategies will require systematic searches for improved saliva and plaque bacteria markers.

  15. A method for the selection of a functional form for a thermodynamic equation of state using weighted linear least squares stepwise regression

    Science.gov (United States)

    Jacobsen, R. T.; Stewart, R. B.; Crain, R. W., Jr.; Rose, G. L.; Myers, A. F.

    1976-01-01

    A method was developed for establishing a rational choice of the terms to be included in an equation of state with a large number of adjustable coefficients. The methods presented were developed for use in the determination of an equation of state for oxygen and nitrogen. However, a general application of the methods is possible in studies involving the determination of an optimum polynomial equation for fitting a large number of data points. The data considered in the least squares problem are experimental thermodynamic pressure-density-temperature data. Attention is given to a description of stepwise multiple regression and the use of stepwise regression in the determination of an equation of state for oxygen and nitrogen.

  16. Multivariate regression models for the simultaneous quantitative analysis of calcium and magnesium carbonates and magnesium oxide through drifts data

    Directory of Open Access Journals (Sweden)

    Marder Luciano

    2006-01-01

    Full Text Available In the present work multivariate regression models were developed for the quantitative analysis of ternary systems using Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS to determine the concentration in weight of calcium carbonate, magnesium carbonate and magnesium oxide. Nineteen spectra of standard samples previously defined in ternary diagram by mixture design were prepared and mid-infrared diffuse reflectance spectra were recorded. The partial least squares (PLS regression method was applied to the model. The spectra set was preprocessed by either mean-centered and variance-scaled (model 2 or mean-centered only (model 1. The results based on the prediction performance of the external validation set expressed by RMSEP (root mean square error of prediction demonstrated that it is possible to develop good models to simultaneously determine calcium carbonate, magnesium carbonate and magnesium oxide content in powdered samples that can be used in the study of the thermal decomposition of dolomite rocks.

  17. Configuring calendar variation based on time series regression method for forecasting of monthly currency inflow and outflow in Central Java

    Science.gov (United States)

    Setiawan, Suhartono, Ahmad, Imam Safawi; Rahmawati, Noorgam Ika

    2015-12-01

    Bank Indonesia (BI) as the central bank of Republic Indonesiahas a single overarching objective to establish and maintain rupiah stability. This objective could be achieved by monitoring traffic of inflow and outflow money currency. Inflow and outflow are related to stock and distribution of money currency around Indonesia territory. It will effect of economic activities. Economic activities of Indonesia,as one of Moslem country, absolutely related to Islamic Calendar (lunar calendar), that different with Gregorian calendar. This research aims to forecast the inflow and outflow money currency of Representative Office (RO) of BI Semarang Central Java region. The results of the analysis shows that the characteristics of inflow and outflow money currency influenced by the effects of the calendar variations, that is the day of Eid al-Fitr (moslem holyday) as well as seasonal patterns. In addition, the period of a certain week during Eid al-Fitr also affect the increase of inflow and outflow money currency. The best model based on the value of the smallestRoot Mean Square Error (RMSE) for inflow data is ARIMA model. While the best model for predicting the outflow data in RO of BI Semarang is ARIMAX model or Time Series Regression, because both of them have the same model. The results forecast in a period of 2015 shows an increase of inflow money currency happened in August, while the increase in outflow money currency happened in July.

  18. Boosted beta regression.

    Directory of Open Access Journals (Sweden)

    Matthias Schmid

    Full Text Available Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1. Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.

  19. Exploring a physico-chemical multi-array explanatory model with a new multiple covariance-based technique: structural equation exploratory regression.

    Science.gov (United States)

    Bry, X; Verron, T; Cazes, P

    2009-05-29

    In this work, we consider chemical and physical variable groups describing a common set of observations (cigarettes). One of the groups, minor smoke compounds (minSC), is assumed to depend on the others (minSC predictors). PLS regression (PLSR) of m inSC on the set of all predictors appears not to lead to a satisfactory analytic model, because it does not take into account the expert's knowledge. PLS path modeling (PLSPM) does not use the multidimensional structure of predictor groups. Indeed, the expert needs to separate the influence of several pre-designed predictor groups on minSC, in order to see what dimensions this influence involves. To meet these needs, we consider a multi-group component-regression model, and propose a method to extract from each group several strong uncorrelated components that fit the model. Estimation is based on a global multiple covariance criterion, used in combination with an appropriate nesting approach. Compared to PLSR and PLSPM, the structural equation exploratory regression (SEER) we propose fully uses predictor group complementarity, both conceptually and statistically, to predict the dependent group.

  20. Detecting sea-level hazards: Simple regression-based methods for calculating the acceleration of sea level

    Science.gov (United States)

    Doran, Kara S.; Howd, Peter A.; Sallenger,, Asbury H.

    2016-01-04

    This report documents the development of statistical tools used to quantify the hazard presented by the response of sea-level elevation to natural or anthropogenic changes in climate and ocean circulation. A hazard is a physical process (or processes) that, when combined with vulnerability (or susceptibility to the hazard), results in risk. This study presents the development and comparison of new and existing sea-level analysis methods, exploration of the strengths and weaknesses of the methods using synthetic time series, and when appropriate, synthesis of the application of the method to observed sea-level time series. These reports are intended to enhance material presented in peer-reviewed journal articles where it is not always possible to provide the level of detail that might be necessary to fully support or recreate published results.

  1. Data-driven method based on particle swarm optimization and k-nearest neighbor regression for estimating capacity of lithium-ion battery

    International Nuclear Information System (INIS)

    Hu, Chao; Jain, Gaurav; Zhang, Puqiang; Schmidt, Craig; Gomadam, Parthasarathy; Gorka, Tom

    2014-01-01

    Highlights: • We develop a data-driven method for the battery capacity estimation. • Five charge-related features that are indicative of the capacity are defined. • The kNN regression model captures the dependency of the capacity on the features. • Results with 10 years’ continuous cycling data verify the effectiveness of the method. - Abstract: Reliability of lithium-ion (Li-ion) rechargeable batteries used in implantable medical devices has been recognized as of high importance from a broad range of stakeholders, including medical device manufacturers, regulatory agencies, physicians, and patients. To ensure Li-ion batteries in these devices operate reliably, it is important to be able to assess the battery health condition by estimating the battery capacity over the life-time. This paper presents a data-driven method for estimating the capacity of Li-ion battery based on the charge voltage and current curves. The contributions of this paper are three-fold: (i) the definition of five characteristic features of the charge curves that are indicative of the capacity, (ii) the development of a non-linear kernel regression model, based on the k-nearest neighbor (kNN) regression, that captures the complex dependency of the capacity on the five features, and (iii) the adaptation of particle swarm optimization (PSO) to finding the optimal combination of feature weights for creating a kNN regression model that minimizes the cross validation (CV) error in the capacity estimation. Verification with 10 years’ continuous cycling data suggests that the proposed method is able to accurately estimate the capacity of Li-ion battery throughout the whole life-time

  2. A regression-based method for mapping traffic-related air pollution. Application and testing in four contrasting urban environments

    International Nuclear Information System (INIS)

    Briggs, D.J.; De Hoogh, C.; Elliot, P.; Gulliver, J.; Wills, J.; Kingham, S.; Smallbone, K.

    2000-01-01

    Accurate, high-resolution maps of traffic-related air pollution are needed both as a basis for assessing exposures as part of epidemiological studies, and to inform urban air-quality policy and traffic management. This paper assesses the use of a GIS-based, regression mapping technique to model spatial patterns of traffic-related air pollution. The model - developed using data from 80 passive sampler sites in Huddersfield, as part of the SAVIAH (Small Area Variations in Air Quality and Health) project - uses data on traffic flows and land cover in the 300-m buffer zone around each site, and altitude of the site, as predictors of NO 2 concentrations. It was tested here by application in four urban areas in the UK: Huddersfield (for the year following that used for initial model development), Sheffield, Northampton, and part of London. In each case, a GIS was built in ArcInfo, integrating relevant data on road traffic, urban land use and topography. Monitoring of NO 2 was undertaken using replicate passive samplers (in London, data were obtained from surveys carried out as part of the London network). In Huddersfield, Sheffield and Northampton, the model was first calibrated by comparing modelled results with monitored NO 2 concentrations at 10 randomly selected sites; the calibrated model was then validated against data from a further 10-28 sites. In London, where data for only 11 sites were available, validation was not undertaken. Results showed that the model performed well in all cases. After local calibration, the model gave estimates of mean annual NO 2 concentrations within a factor of 1.5 of the actual mean (approx. 70-90%) of the time and within a factor of 2 between 70 and 100% of the time. r 2 values between modelled and observed concentrations are in the range of 0.58-0.76. These results are comparable to those achieved by more sophisticated dispersion models. The model also has several advantages over dispersion modelling. It is able, for example, to

  3. Instrumentation and control system for PLS-IM-T 60 MeV LINAC

    International Nuclear Information System (INIS)

    Liu, D.K.; Yei, K.R.; Cheng, H.J.

    1992-01-01

    The PLSIMT is a 60 MeV LINAC as a preinjector for 2 GeV LINAC of PLS project. The instrumentation and control system have been designed under the institutional collaboration between the IHEP (Beijing, China) and POSTECH (Pohang, Korea). So far, the I and C system are being set up nowadays at the POSTECH of Pohang. This paper describes its major characteristics and present status. (author)

  4. Klystron-modulator system availability of PLS 2 GeV electron linac

    International Nuclear Information System (INIS)

    Cho, M.H.; Park, S.S.; Oh, J.S.; Namkung, W.

    1996-01-01

    PLS Linac has been injecting 2 GeV electron beams to the Pohang Light Source (PLS) storage ring since September 1994. PLS 2 GeV linac employs 11 sets of high power klystron-modulator (K and M) system for the main RF source for the beam acceleration. The klystron has rated output peak power of 80 MW at 4 microsec pulse width and at 60 pps. The matching modulator has 200 MW peak output power. The total accumulated high voltage run time of the oldest unit has reached beyond 23,000 hour and the sum of all the high voltage run time is approximately 230,000 hour as of May 1996. In this paper, we review overall system performance of the high-power K and M system. A special attention is paid on the analysis of all failures and troubles of the K and M system which affected the linac high power RF operations as well as beam injection operations for the period of 1994 to May 1996. (author)

  5. Effects of global signal regression and subtraction methods on resting-state functional connectivity using arterial spin labeling data.

    Science.gov (United States)

    Silva, João Paulo Santos; Mônaco, Luciana da Mata; Paschoal, André Monteiro; Oliveira, Ícaro Agenor Ferreira de; Leoni, Renata Ferranti

    2018-05-16

    Arterial spin labeling (ASL) is an established magnetic resonance imaging (MRI) technique that is finding broader applications in functional studies of the healthy and diseased brain. To promote improvement in cerebral blood flow (CBF) signal specificity, many algorithms and imaging procedures, such as subtraction methods, were proposed to eliminate or, at least, minimize noise sources. Therefore, this study addressed the main considerations of how CBF functional connectivity (FC) is changed, regarding resting brain network (RBN) identification and correlations between regions of interest (ROI), by different subtraction methods and removal of residual motion artifacts and global signal fluctuations (RMAGSF). Twenty young healthy participants (13 M/7F, mean age = 25 ± 3 years) underwent an MRI protocol with a pseudo-continuous ASL (pCASL) sequence. Perfusion-based images were obtained using simple, sinc and running subtraction. RMAGSF removal was applied to all CBF time series. Independent Component Analysis (ICA) was used for RBN identification, while Pearson' correlation was performed for ROI-based FC analysis. Temporal signal-to-noise ratio (tSNR) was higher in CBF maps obtained by sinc subtraction, although RMAGSF removal had a significant effect on maps obtained with simple and running subtractions. Neither the subtraction method nor the RMAGSF removal directly affected the identification of RBNs. However, the number of correlated and anti-correlated voxels varied for different subtraction and filtering methods. In an ROI-to-ROI level, changes were prominent in FC values and their statistical significance. Our study showed that both RMAGSF filtering and subtraction method might influence resting-state FC results, especially in an ROI level, consequently affecting FC analysis and its interpretation. Taking our results and the whole discussion together, we understand that for an exploratory assessment of the brain, one could avoid removing RMAGSF to

  6. Current Mathematical Methods Used in QSAR/QSPR Studies

    Directory of Open Access Journals (Sweden)

    Peixun Liu

    2009-04-01

    Full Text Available This paper gives an overview of the mathematical methods currently used in quantitative structure-activity/property relationship (QASR/QSPR studies. Recently, the mathematical methods applied to the regression of QASR/QSPR models are developing very fast, and new methods, such as Gene Expression Programming (GEP, Project Pursuit Regression (PPR and Local Lazy Regression (LLR have appeared on the QASR/QSPR stage. At the same time, the earlier methods, including Multiple Linear Regression (MLR, Partial Least Squares (PLS, Neural Networks (NN, Support Vector Machine (SVM and so on, are being upgraded to improve their performance in QASR/QSPR studies. These new and upgraded methods and algorithms are described in detail, and their advantages and disadvantages are evaluated and discussed, to show their application potential in QASR/QSPR studies in the future.

  7. Polynomial regression analysis and significance test of the regression function

    International Nuclear Information System (INIS)

    Gao Zhengming; Zhao Juan; He Shengping

    2012-01-01

    In order to analyze the decay heating power of a certain radioactive isotope per kilogram with polynomial regression method, the paper firstly demonstrated the broad usage of polynomial function and deduced its parameters with ordinary least squares estimate. Then significance test method of polynomial regression function is derived considering the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and significance test of the polynomial function are done to the decay heating power of the iso tope per kilogram in accord with the authors' real work. (authors)

  8. A note on the relationships between multiple imputation, maximum likelihood and fully Bayesian methods for missing responses in linear regression models.

    Science.gov (United States)

    Chen, Qingxia; Ibrahim, Joseph G

    2014-07-01

    Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR.

  9. Linear Regression Analysis

    CERN Document Server

    Seber, George A F

    2012-01-01

    Concise, mathematically clear, and comprehensive treatment of the subject.* Expanded coverage of diagnostics and methods of model fitting.* Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models.* More than 200 problems throughout the book plus outline solutions for the exercises.* This revision has been extensively class-tested.

  10. Two-step superresolution approach for surveillance face image through radial basis function-partial least squares regression and locality-induced sparse representation

    Science.gov (United States)

    Jiang, Junjun; Hu, Ruimin; Han, Zhen; Wang, Zhongyuan; Chen, Jun

    2013-10-01

    Face superresolution (SR), or face hallucination, refers to the technique of generating a high-resolution (HR) face image from a low-resolution (LR) one with the help of a set of training examples. It aims at transcending the limitations of electronic imaging systems. Applications of face SR include video surveillance, in which the individual of interest is often far from cameras. A two-step method is proposed to infer a high-quality and HR face image from a low-quality and LR observation. First, we establish the nonlinear relationship between LR face images and HR ones, according to radial basis function and partial least squares (RBF-PLS) regression, to transform the LR face into the global face space. Then, a locality-induced sparse representation (LiSR) approach is presented to enhance the local facial details once all the global faces for each LR training face are constructed. A comparison of some state-of-the-art SR methods shows the superiority of the proposed two-step approach, RBF-PLS global face regression followed by LiSR-based local patch reconstruction. Experiments also demonstrate the effectiveness under both simulation conditions and some real conditions.

  11. Empirical estimation of the grades of hearing impairment among industrial workers based on new artificial neural networks and classical regression methods.

    Science.gov (United States)

    Farhadian, Maryam; Aliabadi, Mohsen; Darvishi, Ebrahim

    2015-01-01

    Prediction models are used in a variety of medical domains, and they are frequently built from experience which constitutes data acquired from actual cases. This study aimed to analyze the potential of artificial neural networks and logistic regression techniques for estimation of hearing impairment among industrial workers. A total of 210 workers employed in a steel factory (in West of Iran) were selected, and their occupational exposure histories were analyzed. The hearing loss thresholds of the studied workers were determined using a calibrated audiometer. The personal noise exposures were also measured using a noise dosimeter in the workstations. Data obtained from five variables, which can influence the hearing loss, were used as input features, and the hearing loss thresholds were considered as target feature of the prediction methods. Multilayer feedforward neural networks and logistic regression were developed using MATLAB R2011a software. Based on the World Health Organization classification for the grades of hearing loss, 74.2% of the studied workers have normal hearing thresholds, 23.4% have slight hearing loss, and 2.4% have moderate hearing loss. The accuracy and kappa coefficient of the best developed neural networks for prediction of the grades of hearing loss were 88.6 and 66.30, respectively. The accuracy and kappa coefficient of the logistic regression were also 84.28 and 51.30, respectively. Neural networks could provide more accurate predictions of the hearing loss than logistic regression. The prediction method can provide reliable and comprehensible information for occupational health and medicine experts.

  12. Application of nonparametric regression methods to study the relationship between NO2 concentrations and local wind direction and speed at background sites.

    Science.gov (United States)

    Donnelly, Aoife; Misstear, Bruce; Broderick, Brian

    2011-02-15

    Background concentrations of nitrogen dioxide (NO(2)) are not constant but vary temporally and spatially. The current paper presents a powerful tool for the quantification of the effects of wind direction and wind speed on background NO(2) concentrations, particularly in cases where monitoring data are limited. In contrast to previous studies which applied similar methods to sites directly affected by local pollution sources, the current study focuses on background sites with the aim of improving methods for predicting background concentrations adopted in air quality modelling studies. The relationship between measured NO(2) concentration in air at three such sites in Ireland and locally measured wind direction has been quantified using nonparametric regression methods. The major aim was to analyse a method for quantifying the effects of local wind direction on background levels of NO(2) in Ireland. The method was expanded to include wind speed as an added predictor variable. A Gaussian kernel function is used in the analysis and circular statistics employed for the wind direction variable. Wind direction and wind speed were both found to have a statistically significant effect on background levels of NO(2) at all three sites. Frequently environmental impact assessments are based on short term baseline monitoring producing a limited dataset. The presented non-parametric regression methods, in contrast to the frequently used methods such as binning of the data, allow concentrations for missing data pairs to be estimated and distinction between spurious and true peaks in concentrations to be made. The methods were found to provide a realistic estimation of long term concentration variation with wind direction and speed, even for cases where the data set is limited. Accurate identification of the actual variation at each location and causative factors could be made, thus supporting the improved definition of background concentrations for use in air quality modelling

  13. Linear regression in astronomy. I

    Science.gov (United States)

    Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh

    1990-01-01

    Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.

  14. Classification and regression trees

    CERN Document Server

    Breiman, Leo; Olshen, Richard A; Stone, Charles J

    1984-01-01

    The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.

  15. Effects of categorization method, regression type, and variable distribution on the inflation of Type-I error rate when categorizing a confounding variable.

    Science.gov (United States)

    Barnwell-Ménard, Jean-Louis; Li, Qing; Cohen, Alan A

    2015-03-15

    The loss of signal associated with categorizing a continuous variable is well known, and previous studies have demonstrated that this can lead to an inflation of Type-I error when the categorized variable is a confounder in a regression analysis estimating the effect of an exposure on an outcome. However, it is not known how the Type-I error may vary under different circumstances, including logistic versus linear regression, different distributions of the confounder, and different categorization methods. Here, we analytically quantified the effect of categorization and then performed a series of 9600 Monte Carlo simulations to estimate the Type-I error inflation associated with categorization of a confounder under different regression scenarios. We show that Type-I error is unacceptably high (>10% in most scenarios and often 100%). The only exception was when the variable categorized was a continuous mixture proxy for a genuinely dichotomous latent variable, where both the continuous proxy and the categorized variable are error-ridden proxies for the dichotomous latent variable. As expected, error inflation was also higher with larger sample size, fewer categories, and stronger associations between the confounder and the exposure or outcome. We provide online tools that can help researchers estimate the potential error inflation and understand how serious a problem this is. Copyright © 2014 John Wiley & Sons, Ltd.

  16. Vector regression introduced

    Directory of Open Access Journals (Sweden)

    Mok Tik

    2014-06-01

    Full Text Available This study formulates regression of vector data that will enable statistical analysis of various geodetic phenomena such as, polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (dependent vector variable is expressed as a function of a number of hypothesized phenomena realized also as vector variables (independent vector variables and/or scalar variables that are likely to impact the dependent vector variable. The proposed representation has the unique property of solving the coefficients of independent vector variables (explanatory variables also as vectors, hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to rep- resent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.

  17. Differentiating regressed melanoma from regressed lichenoid keratosis.

    Science.gov (United States)

    Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A

    2017-04-01

    Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% regressed melanomas as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  18. Multiple Linear Regression Modeling To Predict the Stability of Polymer-Drug Solid Dispersions: Comparison of the Effects of Polymers and Manufacturing Methods on Solid Dispersion Stability.

    Science.gov (United States)

    Fridgeirsdottir, Gudrun A; Harris, Robert J; Dryden, Ian L; Fischer, Peter M; Roberts, Clive J

    2018-03-29

    Solid dispersions can be a successful way to enhance the bioavailability of poorly soluble drugs. Here 60 solid dispersion formulations were produced using ten chemically diverse, neutral, poorly soluble drugs, three commonly used polymers, and two manufacturing techniques, spray-drying and melt extrusion. Each formulation underwent a six-month stability study at accelerated conditions, 40 °C and 75% relative humidity (RH). Significant differences in times to crystallization (onset of crystallization) were observed between both the different polymers and the two processing methods. Stability from zero days to over one year was observed. The extensive experimental data set obtained from this stability study was used to build multiple linear regression models to correlate physicochemical properties of the active pharmaceutical ingredients (API) with the stability data. The purpose of these models is to indicate which combination of processing method and polymer carrier is most likely to give a stable solid dispersion. Six quantitative mathematical multiple linear regression-based models were produced based on selection of the most influential independent physical and chemical parameters from a set of 33 possible factors, one model for each combination of polymer and processing method, with good predictability of stability. Three general rules are proposed from these models for the formulation development of suitably stable solid dispersions. Namely, increased stability is correlated with increased glass transition temperature ( T g ) of solid dispersions, as well as decreased number of H-bond donors and increased molecular flexibility (such as rotatable bonds and ring count) of the drug molecule.

  19. A comparison between univariate probabilistic and multivariate (logistic regression) methods for landslide susceptibility analysis: the example of the Febbraro valley (Northern Alps, Italy)

    Science.gov (United States)

    Rossi, M.; Apuani, T.; Felletti, F.

    2009-04-01

    The aim of this paper is to compare the results of two statistical methods for landslide susceptibility analysis: 1) univariate probabilistic method based on landslide susceptibility index, 2) multivariate method (logistic regression). The study area is the Febbraro valley, located in the central Italian Alps, where different types of metamorphic rocks croup out. On the eastern part of the studied basin a quaternary cover represented by colluvial and secondarily, by glacial deposits, is dominant. In this study 110 earth flows, mainly located toward NE portion of the catchment, were analyzed. They involve only the colluvial deposits and their extension mainly ranges from 36 to 3173 m2. Both statistical methods require to establish a spatial database, in which each landslide is described by several parameters that can be assigned using a main scarp central point of landslide. The spatial database is constructed using a Geographical Information System (GIS). Each landslide is described by several parameters corresponding to the value of main scarp central point of the landslide. Based on bibliographic review a total of 15 predisposing factors were utilized. The width of the intervals, in which the maps of the predisposing factors have to be reclassified, has been defined assuming constant intervals to: elevation (100 m), slope (5 °), solar radiation (0.1 MJ/cm2/year), profile curvature (1.2 1/m), tangential curvature (2.2 1/m), drainage density (0.5), lineament density (0.00126). For the other parameters have been used the results of the probability-probability plots analysis and the statistical indexes of landslides site. In particular slope length (0 ÷ 2, 2 ÷ 5, 5 ÷ 10, 10 ÷ 20, 20 ÷ 35, 35 ÷ 260), accumulation flow (0 ÷ 1, 1 ÷ 2, 2 ÷ 5, 5 ÷ 12, 12 ÷ 60, 60 ÷27265), Topographic Wetness Index 0 ÷ 0.74, 0.74 ÷ 1.94, 1.94 ÷ 2.62, 2.62 ÷ 3.48, 3.48 ÷ 6,00, 6.00 ÷ 9.44), Stream Power Index (0 ÷ 0.64, 0.64 ÷ 1.28, 1.28 ÷ 1.81, 1.81 ÷ 4.20, 4.20 ÷ 9

  20. Towards a user-friendly brain-computer interface: initial tests in ALS and PLS patients.

    Science.gov (United States)

    Bai, Ou; Lin, Peter; Huang, Dandan; Fei, Ding-Yu; Floeter, Mary Kay

    2010-08-01

    Patients usually require long-term training for effective EEG-based brain-computer interface (BCI) control due to fatigue caused by the demands for focused attention during prolonged BCI operation. We intended to develop a user-friendly BCI requiring minimal training and less mental load. Testing of BCI performance was investigated in three patients with amyotrophic lateral sclerosis (ALS) and three patients with primary lateral sclerosis (PLS), who had no previous BCI experience. All patients performed binary control of cursor movement. One ALS patient and one PLS patient performed four-directional cursor control in a two-dimensional domain under a BCI paradigm associated with human natural motor behavior using motor execution and motor imagery. Subjects practiced for 5-10min and then participated in a multi-session study of either binary control or four-directional control including online BCI game over 1.5-2h in a single visit. Event-related desynchronization and event-related synchronization in the beta band were observed in all patients during the production of voluntary movement either by motor execution or motor imagery. The online binary control of cursor movement was achieved with an average accuracy about 82.1+/-8.2% with motor execution and about 80% with motor imagery, whereas offline accuracy was achieved with 91.4+/-3.4% with motor execution and 83.3+/-8.9% with motor imagery after optimization. In addition, four-directional cursor control was achieved with an accuracy of 50-60% with motor execution and motor imagery. Patients with ALS or PLS may achieve BCI control without extended training, and fatigue might be reduced during operation of a BCI associated with human natural motor behavior. The development of a user-friendly BCI will promote practical BCI applications in paralyzed patients. Copyright 2010 International Federation of Clinical Neurophysiology. All rights reserved.

  1. Using a binary logistic regression method and GIS for evaluating and mapping the groundwater spring potential in the Sultan Mountains (Aksehir, Turkey)

    Science.gov (United States)

    Ozdemir, Adnan

    2011-07-01

    SummaryThe purpose of this study is to produce a groundwater spring potential map of the Sultan Mountains in central Turkey, based on a logistic regression method within a Geographic Information System (GIS) environment. Using field surveys, the locations of the springs (440 springs) were determined in the study area. In this study, 17 spring-related factors were used in the analysis: geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transport capacity index, distance to drainage, distance to fault, drainage density, and fault density map. The coefficients of the predictor variables were estimated using binary logistic regression analysis and were used to calculate the groundwater spring potential for the entire study area. The accuracy of the final spring potential map was evaluated based on the observed springs. The accuracy of the model was evaluated by calculating the relative operating characteristics. The area value of the relative operating characteristic curve model was found to be 0.82. These results indicate that the model is a good estimator of the spring potential in the study area. The spring potential map shows that the areas of very low, low, moderate and high groundwater spring potential classes are 105.586 km 2 (28.99%), 74.271 km 2 (19.906%), 101.203 km 2 (27.14%), and 90.05 km 2 (24.671%), respectively. The interpretations of the potential map showed that stream power index, relative permeability of lithologies, geology, elevation, aspect, wetness index, plan curvature, and drainage density play major roles in spring occurrence and distribution in the Sultan Mountains. The logistic regression approach has not yet been used to delineate groundwater potential zones. In this study, the logistic regression method was used to locate potential zones for groundwater springs in the Sultan Mountains. The evolved model

  2. PLS Torino: A way to discover semiconductors in a school lab

    International Nuclear Information System (INIS)

    Marzolla, F.

    2015-01-01

    In the wide range of PLS activities, one on semiconductors was realized with high-school 4th- and 5th-year students. After an introduction on semiconductor and electromagnetic radiation concepts, students assembled circuits, observed photoresistor and LED behavior and compared experimental and theoretical results. We especially paid attention to energy conversions and devices applications. An important point of the project is that it can be easily realized in our schools because low-cost devices are used. Moreover, discussing experimental results, it is possible to correct or complete students phenomena interpretation.

  3. A study for lattice comparison for PLS 2 GeV storage ring

    International Nuclear Information System (INIS)

    Yoon, M.

    1991-01-01

    TBA and DBA lattices are compared for 1.5-2.5 GeV synchrotron light source, with particular attention to the PLS 2 GeV electron storage ring currently being developed in Pohang, Korea. For the comparison study, the optimum electron energy was chosen to be 2 GeV and the circumference of the ring is less than 280.56 m, the natural beam emittance no greater than 13 nm. Results from various linear and nonlinear optics comparison studies are presented

  4. Application of Genetic Algorithm (GA) Assisted Partial Least Square (PLS) Analysis on Trilinear and Non-trilinear Fluorescence Data Sets to Quantify the Fluorophores in Multifluorophoric Mixtures: Improving Quantification Accuracy of Fluorimetric Estimations of Dilute Aqueous Mixtures.

    Science.gov (United States)

    Kumar, Keshav

    2018-03-29

    Excitation-emission matrix fluorescence (EEMF) and total synchronous fluorescence spectroscopy (TSFS) are the 2 fluorescence techniques that are commonly used for the analysis of multifluorophoric mixtures. These 2 fluorescence techniques are conceptually different and provide certain advantages over each other. The manual analysis of such highly correlated large volume of EEMF and TSFS towards developing a calibration model is difficult. Partial least square (PLS) analysis can analyze the large volume of EEMF and TSFS data sets by finding important factors that maximize the correlation between the spectral and concentration information for each fluorophore. However, often the application of PLS analysis on entire data sets does not provide a robust calibration model and requires application of suitable pre-processing step. The present work evaluates the application of genetic algorithm (GA) analysis prior to PLS analysis on EEMF and TSFS data sets towards improving the precision and accuracy of the calibration model. The GA algorithm essentially combines the advantages provided by stochastic methods with those provided by deterministic approaches and can find the set of EEMF and TSFS variables that perfectly correlate well with the concentration of each of the fluorophores present in the multifluorophoric mixtures. The utility of the GA assisted PLS analysis is successfully validated using (i) EEMF data sets acquired for dilute aqueous mixture of four biomolecules and (ii) TSFS data sets acquired for dilute aqueous mixtures of four carcinogenic polycyclic aromatic hydrocarbons (PAHs) mixtures. In the present work, it is shown that by using the GA it is possible to significantly improve the accuracy and precision of the PLS calibration model developed for both EEMF and TSFS data set. Hence, GA must be considered as a useful pre-processing technique while developing an EEMF and TSFS calibration model.

  5. Direct-on-Filter α-Quartz Estimation in Respirable Coal Mine Dust Using Transmission Fourier Transform Infrared Spectrometry and Partial Least Squares Regression.

    Science.gov (United States)

    Miller, Arthur L; Weakley, Andrew Todd; Griffiths, Peter R; Cauda, Emanuele G; Bayman, Sean

    2017-05-01

    In order to help reduce silicosis in miners, the National Institute for Occupational Health and Safety (NIOSH) is developing field-portable methods for measuring airborne respirable crystalline silica (RCS), specifically the polymorph α-quartz, in mine dusts. In this study we demonstrate the feasibility of end-of-shift measurement of α-quartz using a direct-on-filter (DoF) method to analyze coal mine dust samples deposited onto polyvinyl chloride filters. The DoF method is potentially amenable for on-site analyses, but deviates from the current regulatory determination of RCS for coal mines by eliminating two sample preparation steps: ashing the sampling filter and redepositing the ash prior to quantification by Fourier transform infrared (FT-IR) spectrometry. In this study, the FT-IR spectra of 66 coal dust samples from active mines were used, and the RCS was quantified by using: (1) an ordinary least squares (OLS) calibration approach that utilizes standard silica material as done in the Mine Safety and Health Administration's P7 method; and (2) a partial least squares (PLS) regression approach. Both were capable of accounting for kaolinite, which can confound the IR analysis of silica. The OLS method utilized analytical standards for silica calibration and kaolin correction, resulting in a good linear correlation with P7 results and minimal bias but with the accuracy limited by the presence of kaolinite. The PLS approach also produced predictions well-correlated to the P7 method, as well as better accuracy in RCS prediction, and no bias due to variable kaolinite mass. Besides decreased sensitivity to mineral or substrate confounders, PLS has the advantage that the analyst is not required to correct for the presence of kaolinite or background interferences related to the substrate, making the method potentially viable for automated RCS prediction in the field. This study demonstrated the efficacy of FT-IR transmission spectrometry for silica determination in

  6. Regression: A Bibliography.

    Science.gov (United States)

    Pedrini, D. T.; Pedrini, Bonnie C.

    Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…

  7. Robust methods for multivariate data analysis A1

    DEFF Research Database (Denmark)

    Frosch, Stina; Von Frese, J.; Bro, Rasmus

    2005-01-01

    Outliers may hamper proper classical multivariate analysis, and lead to incorrect conclusions. To remedy the problem of outliers, robust methods are developed in statistics and chemometrics. Robust methods reduce or remove the effect of outlying data points and allow the ?good? data to primarily...... determine the result. This article reviews the most commonly used robust multivariate regression and exploratory methods that have appeared since 1996 in the field of chemometrics. Special emphasis is put on the robust versions of chemometric standard tools like PCA and PLS and the corresponding robust...

  8. A comparative study of frequency ratio, weights of evidence and logistic regression methods for landslide susceptibility mapping: Sultan Mountains, SW Turkey

    Science.gov (United States)

    Ozdemir, Adnan; Altural, Tolga

    2013-03-01

    This study evaluated and compared landslide susceptibility maps produced with three different methods, frequency ratio, weights of evidence, and logistic regression, by using validation datasets. The field surveys performed as part of this investigation mapped the locations of 90 landslides that had been identified in the Sultan Mountains of south-western Turkey. The landslide influence parameters used for this study are geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transportation capacity index, distance to drainage, distance to fault, drainage density, fault density, and spring density maps. The relationships between landslide distributions and these parameters were analysed using the three methods, and the results of these methods were then used to calculate the landslide susceptibility of the entire study area. The accuracy of the final landslide susceptibility maps was evaluated based on the landslides observed during the fieldwork, and the accuracy of the models was evaluated by calculating each model's relative operating characteristic curve. The predictive capability of each model was determined from the area under the relative operating characteristic curve and the areas under the curves obtained using the frequency ratio, logistic regression, and weights of evidence methods are 0.976, 0.952, and 0.937, respectively. These results indicate that the frequency ratio and weights of evidence models are relatively good estimators of landslide susceptibility in the study area. Specifically, the results of the correlation analysis show a high correlation between the frequency ratio and weights of evidence results, and the frequency ratio and logistic regression methods exhibit correlation coefficients of 0.771 and 0.727, respectively. The frequency ratio model is simple, and its input, calculation and output processes are

  9. Influence of the nature of soil organic matter on the sorption behaviour of pentadecane as determined by PLS analysis of mid-infrared DRIFT and solid-state {sup 13}C NMR spectra

    Energy Technology Data Exchange (ETDEWEB)

    Clark Ehlers, G.A. [Institute of Environmental Biotechnology, Department IFA-Tulln, University of Natural Resources and Applied Life Sciences, Vienna, Konrad Lorenz Str. 20, Tulln A-3430 (Austria); Forrester, Sean T. [CSIRO Land and Water, Waite Rd, Urrbrae SA 5064 (Australia); Scherr, Kerstin E. [Institute of Environmental Biotechnology, Department IFA-Tulln, University of Natural Resources and Applied Life Sciences, Vienna, Konrad Lorenz Str. 20, Tulln A-3430 (Austria); Loibner, Andreas P., E-mail: andreas.loibner@boku.ac.a [Institute of Environmental Biotechnology, Department IFA-Tulln, The University of Natural Resources and Applied Life Sciences, Vienna, Konrad Lorenz Str. 20, Tulln A-3430 (Austria); Janik, Les J. [CSIRO Land and Water, Waite Rd, Urrbrae SA 5064 (Australia)

    2010-01-15

    The nature of soil organic matter (SOM) functional groups associated with sorption processes was determined by correlating partitioning coefficients with solid-state {sup 13}C nuclear magnetic resonance (NMR) and diffuse reflectance mid-infrared (DRIFT) spectral features using partial least squares (PLS) regression analysis. Partitioning sorption coefficients for n-pentadecane (n-C{sub 15}) were determined for three alternative models: the Langmuir model, the dual distributed reactive domain model (DRDM) and the Freundlich model, where the latter was found to be the most appropriate. NMR-derived constitutional descriptors did not correlate with Freundlich model parameters. By contrast, PLS analysis revealed the most likely nature of the functional groups in SOM associated with n-C{sub 15} sorption coefficients (K{sub F}) to be aromatic, possibly porous soil char, rather than aliphatic organic components for the presently investigated soils. High PLS cross-validation correlation suggested that the model was robust for the purpose of characterising the functional group chemistry important for n-C{sub 15} sorption. - NMR/IR spectroscopy and chemometrics reveal the aromatic fraction of soil organic matter being responsible for alkane sorption.

  10. Subset selection in regression

    CERN Document Server

    Miller, Alan

    2002-01-01

    Originally published in 1990, the first edition of Subset Selection in Regression filled a significant gap in the literature, and its critical and popular success has continued for more than a decade. Thoroughly revised to reflect progress in theory, methods, and computing power, the second edition promises to continue that tradition. The author has thoroughly updated each chapter, incorporated new material on recent developments, and included more examples and references. New in the Second Edition:A separate chapter on Bayesian methodsComplete revision of the chapter on estimationA major example from the field of near infrared spectroscopyMore emphasis on cross-validationGreater focus on bootstrappingStochastic algorithms for finding good subsets from large numbers of predictors when an exhaustive search is not feasible Software available on the Internet for implementing many of the algorithms presentedMore examplesSubset Selection in Regression, Second Edition remains dedicated to the techniques for fitting...

  11. Projects Delay Factors of Saudi Arabia Construction Industry Using PLS-SEM Path Modelling Approach

    Directory of Open Access Journals (Sweden)

    Abdul Rahman Ismail

    2016-01-01

    Full Text Available This paper presents the development of PLS-SEM Path Model of delay factors of Saudi Arabia construction industry focussing on Mecca City. The model was developed and assessed using SmartPLS v3.0 software and it consists of 37 factors/manifests in 7 groups/independent variables and one dependent variable which is delay of the construction projects. The model was rigorously assessed at measurement and structural components and the outcomes found that the model has achieved the required threshold values. At structural level of the model, among the seven groups, the client and consultant group has the highest impact on construction delay with path coefficient β-value of 0.452 and the project management and contract administration group is having the least impact to the construction delay with β-value of 0.016. The overall model has moderate explaining power ability with R2 value of 0.197 for Saudi Arabia construction industry representation. This model will able to assist practitioners in Mecca city to pay more attention in risk analysis for potential construction delay.

  12. Assessing a moderating effect and the global fit of a PLS model on online trading

    Directory of Open Access Journals (Sweden)

    Juan J. García-Machado

    2017-12-01

    Full Text Available This paper proposes a PLS Model for the study of Online Trading. Traditional investing has experienced a revolution due to the rise of e-trading services that enable investors to use Internet conduct secure trading. On the hand, model results show that there is a positive, direct and statistically significant relationship between personal outcome expectations, perceived relative advantage, shared vision and economy-based trust with the quality of knowledge. On the other hand, trading frequency and portfolio performance has also this relationship. After including the investor’s income and financial wealth (IFW as moderating effect, the PLS model was enhanced, and we found that the interaction term is negative and statistically significant, so, higher IFW levels entail a weaker relationship between trading frequency and portfolio performance and vice-versa. Finally, with regard to the goodness of overall model fit measures, they showed that the model is fit for SRMR and dG measures, so it is likely that the model is true.

  13. Quantification of endocrine disruptors and pesticides in water by gas chromatography-tandem mass spectrometry. Method validation using weighted linear regression schemes.

    Science.gov (United States)

    Mansilha, C; Melo, A; Rebelo, H; Ferreira, I M P L V O; Pinho, O; Domingues, V; Pinho, C; Gameiro, P

    2010-10-22

    A multi-residue methodology based on a solid phase extraction followed by gas chromatography-tandem mass spectrometry was developed for trace analysis of 32 compounds in water matrices, including estrogens and several pesticides from different chemical families, some of them with endocrine disrupting properties. Matrix standard calibration solutions were prepared by adding known amounts of the analytes to a residue-free sample to compensate matrix-induced chromatographic response enhancement observed for certain pesticides. Validation was done mainly according to the International Conference on Harmonisation recommendations, as well as some European and American validation guidelines with specifications for pesticides analysis and/or GC-MS methodology. As the assumption of homoscedasticity was not met for analytical data, weighted least squares linear regression procedure was applied as a simple and effective way to counteract the greater influence of the greater concentrations on the fitted regression line, improving accuracy at the lower end of the calibration curve. The method was considered validated for 31 compounds after consistent evaluation of the key analytical parameters: specificity, linearity, limit of detection and quantification, range, precision, accuracy, extraction efficiency, stability and robustness. Copyright © 2010 Elsevier B.V. All rights reserved.

  14. Influence of regression model and incremental test protocol on the relationship between lactate threshold using the maximal-deviation method and performance in female runners.

    Science.gov (United States)

    Machado, Fabiana Andrade; Nakamura, Fábio Yuzo; Moraes, Solange Marta Franzói De

    2012-01-01

    This study examined the influence of the regression model and initial intensity of an incremental test on the relationship between the lactate threshold estimated by the maximal-deviation method and the endurance performance. Sixteen non-competitive, recreational female runners performed a discontinuous incremental treadmill test. The initial speed was set at 7 km · h⁻¹, and increased every 3 min by 1 km · h⁻¹ with a 30-s rest between the stages used for earlobe capillary blood sample collection. Lactate-speed data were fitted by an exponential-plus-constant and a third-order polynomial equation. The lactate threshold was determined for both regression equations, using all the coordinates, excluding the first and excluding the first and second initial points. Mean speed of a 10-km road race was the performance index (3.04 ± 0.22 m · s⁻¹). The exponentially-derived lactate threshold had a higher correlation (0.98 ≤ r ≤ 0.99) and smaller standard error of estimate (SEE) (0.04 ≤ SEE ≤ 0.05 m · s⁻¹) with performance than the polynomially-derived equivalent (0.83 ≤ r ≤ 0.89; 0.10 ≤ SEE ≤ 0.13 m · s⁻¹). The exponential lactate threshold was greater than the polynomial equivalent (P performance index that is independent of the initial intensity of the incremental test and better than the polynomial equivalent.

  15. Steganalysis using logistic regression

    Science.gov (United States)

    Lubenko, Ivans; Ker, Andrew D.

    2011-02-01

    We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.

  16. Influence of regression model and initial intensity of an incremental test on the relationship between the lactate threshold estimated by the maximal-deviation method and running performance.

    Science.gov (United States)

    Santos-Concejero, Jordan; Tucker, Ross; Granados, Cristina; Irazusta, Jon; Bidaurrazaga-Letona, Iraia; Zabala-Lili, Jon; Gil, Susana María

    2014-01-01

    This study investigated the influence of the regression model and initial intensity during an incremental test on the relationship between the lactate threshold estimated by the maximal-deviation method and performance in elite-standard runners. Twenty-three well-trained runners completed a discontinuous incremental running test on a treadmill. Speed started at 9 km · h(-1) and increased by 1.5 km · h(-1) every 4 min until exhaustion, with a minute of recovery for blood collection. Lactate-speed data were fitted by exponential and polynomial models. The lactate threshold was determined for both models, using all the co-ordinates, excluding the first and excluding the first and second points. The exponential lactate threshold was greater than the polynomial equivalent in any co-ordinate condition (P performance and is independent of the initial intensity of the test.

  17. Factors influencing superimposition error of 3D cephalometric landmarks by plane orientation method using 4 reference points: 4 point superimposition error regression model.

    Science.gov (United States)

    Hwang, Jae Joon; Kim, Kee-Deog; Park, Hyok; Park, Chang Seo; Jeong, Ho-Gul

    2014-01-01

    Superimposition has been used as a method to evaluate the changes of orthodontic or orthopedic treatment in the dental field. With the introduction of cone beam CT (CBCT), evaluating 3 dimensional changes after treatment became possible by superimposition. 4 point plane orientation is one of the simplest ways to achieve superimposition of 3 dimensional images. To find factors influencing superimposition error of cephalometric landmarks by 4 point plane orientation method and to evaluate the reproducibility of cephalometric landmarks for analyzing superimposition error, 20 patients were analyzed who had normal skeletal and occlusal relationship and took CBCT for diagnosis of temporomandibular disorder. The nasion, sella turcica, basion and midpoint between the left and the right most posterior point of the lesser wing of sphenoidal bone were used to define a three-dimensional (3D) anatomical reference co-ordinate system. Another 15 reference cephalometric points were also determined three times in the same image. Reorientation error of each landmark could be explained substantially (23%) by linear regression model, which consists of 3 factors describing position of each landmark towards reference axes and locating error. 4 point plane orientation system may produce an amount of reorientation error that may vary according to the perpendicular distance between the landmark and the x-axis; the reorientation error also increases as the locating error and shift of reference axes viewed from each landmark increases. Therefore, in order to reduce the reorientation error, accuracy of all landmarks including the reference points is important. Construction of the regression model using reference points of greater precision is required for the clinical application of this model.

  18. Using reduced rank regression methods to identify dietary patterns associated with obesity: a cross-country study among European and Australian adolescents.

    Science.gov (United States)

    Huybrechts, Inge; Lioret, Sandrine; Mouratidou, Theodora; Gunter, Marc J; Manios, Yannis; Kersting, Mathilde; Gottrand, Frederic; Kafatos, Anthony; De Henauw, Stefaan; Cuenca-García, Magdalena; Widhalm, Kurt; Gonzales-Gross, Marcela; Molnar, Denes; Moreno, Luis A; McNaughton, Sarah A

    2017-01-01

    This study aims to examine repeatability of reduced rank regression (RRR) methods in calculating dietary patterns (DP) and cross-sectional associations with overweight (OW)/obesity across European and Australian samples of adolescents. Data from two cross-sectional surveys in Europe (2006/2007 Healthy Lifestyle in Europe by Nutrition in Adolescence study, including 1954 adolescents, 12-17 years) and Australia (2007 National Children's Nutrition and Physical Activity Survey, including 1498 adolescents, 12-16 years) were used. Dietary intake was measured using two non-consecutive, 24-h recalls. RRR was used to identify DP using dietary energy density, fibre density and percentage of energy intake from fat as the intermediate variables. Associations between DP scores and body mass/fat were examined using multivariable linear and logistic regression as appropriate, stratified by sex. The first DP extracted (labelled 'energy dense, high fat, low fibre') explained 47 and 31 % of the response variation in Australian and European adolescents, respectively. It was similar for European and Australian adolescents and characterised by higher consumption of biscuits/cakes, chocolate/confectionery, crisps/savoury snacks, sugar-sweetened beverages, and lower consumption of yogurt, high-fibre bread, vegetables and fresh fruit. DP scores were inversely associated with BMI z-scores in Australian adolescent boys and borderline inverse in European adolescent boys (so as with %BF). Similarly, a lower likelihood for OW in boys was observed with higher DP scores in both surveys. No such relationships were observed in adolescent girls. In conclusion, the DP identified in this cross-country study was comparable for European and Australian adolescents, demonstrating robustness of the RRR method in calculating DP among populations. However, longitudinal designs are more relevant when studying diet-obesity associations, to prevent reverse causality.

  19. A primer of statistical methods for correlating parameters and properties of electrospun poly( l -lactide) scaffolds for tissue engineering-PART 2: Regression

    KAUST Repository

    Seyedmahmoud, Rasoul

    2014-04-07

    This two-articles series presents an in-depth discussion of electrospun poly-l-lactide scaffolds for tissue engineering by means of statistical methodologies that can be used, in general, to gain a quantitative and systematic insight about effects and interactions between a handful of key scaffold properties (Ys) and a set of process parameters (Xs) in electrospinning. While Part-1 dealt with the DOE methods to unveil the interactions between Xs in determining the morphomechanical properties (ref. Y1-4), this Part-2 article continues and refocuses the discussion on the interdependence of scaffold properties investigated by standard regression methods. The discussion first explores the connection between mechanical properties (Y4) and morphological descriptors of the scaffolds (Y1-3) in 32 types of scaffolds, finding that the mean fiber diameter (Y1) plays a predominant role which is nonetheless and crucially modulated by the molecular weight (MW) of PLLA. The second part examines the biological performance (Y5) (i.e. the cell proliferation of seeded bone marrow-derived mesenchymal stromal cells) on a random subset of eight scaffolds vs. the mechanomorphological properties (Y1-4). In this case, the featured regression analysis on such an incomplete set was not conclusive, though, indirectly suggesting in quantitative terms that cell proliferation could not fully be explained as a function of considered mechanomorphological properties (Y1-4), but in the early stage seeding, and that a randomization effects occurs over time such that the differences in initial cell proliferation performance (at day 1) is smeared over time. The findings may be the cornerstone of a novel route to accrue sufficient understanding and establish design rules for scaffold biofunctional vs. architecture, mechanical properties, and process parameters.

  20. Methodological comparison of marginal structural model, time-varying Cox regression, and propensity score methods: the example of antidepressant use and the risk of hip fracture.

    Science.gov (United States)

    Ali, M Sanni; Groenwold, Rolf H H; Belitser, Svetlana V; Souverein, Patrick C; Martín, Elisa; Gatto, Nicolle M; Huerta, Consuelo; Gardarsdottir, Helga; Roes, Kit C B; Hoes, Arno W; de Boer, Antonius; Klungel, Olaf H

    2016-03-01

    Observational studies including time-varying treatments are prone to confounding. We compared time-varying Cox regression analysis, propensity score (PS) methods, and marginal structural models (MSMs) in a study of antidepressant [selective serotonin reuptake inhibitors (SSRIs)] use and the risk of hip fracture. A cohort of patients with a first prescription for antidepressants (SSRI or tricyclic antidepressants) was extracted from the Dutch Mondriaan and Spanish Base de datos para la Investigación Farmacoepidemiológica en Atención Primaria (BIFAP) general practice databases for the period 2001-2009. The net (total) effect of SSRI versus no SSRI on the risk of hip fracture was estimated using time-varying Cox regression, stratification and covariate adjustment using the PS, and MSM. In MSM, censoring was accounted for by inverse probability of censoring weights. The crude hazard ratio (HR) of SSRI use versus no SSRI use on hip fracture was 1.75 (95%CI: 1.12, 2.72) in Mondriaan and 2.09 (1.89, 2.32) in BIFAP. After confounding adjustment using time-varying Cox regression, stratification, and covariate adjustment using the PS, HRs increased in Mondriaan [2.59 (1.63, 4.12), 2.64 (1.63, 4.25), and 2.82 (1.63, 4.25), respectively] and decreased in BIFAP [1.56 (1.40, 1.73), 1.54 (1.39, 1.71), and 1.61 (1.45, 1.78), respectively]. MSMs with stabilized weights yielded HR 2.15 (1.30, 3.55) in Mondriaan and 1.63 (1.28, 2.07) in BIFAP when accounting for censoring and 2.13 (1.32, 3.45) in Mondriaan and 1.66 (1.30, 2.12) in BIFAP without accounting for censoring. In this empirical study, differences between the different methods to control for time-dependent confounding were small. The observed differences in treatment effect estimates between the databases are likely attributable to different confounding information in the datasets, illustrating that adequate information on (time-varying) confounding is crucial to prevent bias. Copyright © 2016 John Wiley & Sons, Ltd.

  1. Advanced statistics: linear regression, part I: simple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.

  2. STELLAR COLOR REGRESSION: A SPECTROSCOPY-BASED METHOD FOR COLOR CALIBRATION TO A FEW MILLIMAGNITUDE ACCURACY AND THE RECALIBRATION OF STRIPE 82

    International Nuclear Information System (INIS)

    Yuan, Haibo; Liu, Xiaowei; Xiang, Maosheng; Huang, Yang; Zhang, Huihua; Chen, Bingqiu

    2015-01-01

    In this paper we propose a spectroscopy-based stellar color regression (SCR) method to perform accurate color calibration for modern imaging surveys, taking advantage of millions of stellar spectra now available. The method is straightforward, insensitive to systematic errors in the spectroscopically determined stellar atmospheric parameters, applicable to regions that are effectively covered by spectroscopic surveys, and capable of delivering an accuracy of a few millimagnitudes for color calibration. As an illustration, we have applied the method to the Sloan Digital Sky Survey (SDSS) Stripe 82 data. With a total number of 23,759 spectroscopically targeted stars, we have mapped out the small but strongly correlated color zero-point errors present in the photometric catalog of Stripe 82, and we improve the color calibration by a factor of two to three. Our study also reveals some small but significant magnitude dependence errors in the z band for some charge-coupled devices (CCDs). Such errors are likely to be present in all the SDSS photometric data. Our results are compared with those from a completely independent test based on the intrinsic colors of red galaxies presented by Ivezić et al. The comparison, as well as other tests, shows that the SCR method has achieved a color calibration internally consistent at a level of about 5 mmag in u – g, 3 mmag in g – r, and 2 mmag in r – i and i – z. Given the power of the SCR method, we discuss briefly the potential benefits by applying the method to existing, ongoing, and upcoming imaging surveys

  3. Mass movement susceptibility mapping - A comparison of logistic regression and Weight of evidence methods in Taounate-Ain Aicha region (Central Rif, Morocco

    Directory of Open Access Journals (Sweden)

    JEMMAH A I

    2018-01-01

    Full Text Available Taounate region is known by a high density of mass movements which cause several human and economic losses. The goal of this paper is to assess the landslide susceptibility of Taounate using the Weight of Evidence method (WofE and the Logistic Regression method (LR. Seven conditioning factors were used in this study: lithology, fault, drainage, slope, elevation, exposure and land use. Over the years, this site and its surroundings have experienced repeated landslides. For this reason, landslide susceptibility mapping is mandatory for risk prevention and land-use management. In this study, we have focused on recent large-scale mass movements. Finally, the ROC curves were established to evaluate the degree of fit of the model and to choose the best landslide susceptibility zonation. A total mass movements location were detected; 50% were randomly selected as input data for the entire process using the Spatial Data Model (SDM and the remaining locations were used for validation purposes. The obtained WofE’s landslide susceptibility map shows that high to very high susceptibility zones contain 62% of the total of inventoried landslides, while the same zones contain only 47% of landslides in the map obtained by the LR method. This landslide susceptibility map obtained is a major contribution to various urban and regional development plans under the Taounate Region National Development Program.

  4. Fungible weights in logistic regression.

    Science.gov (United States)

    Jones, Jeff A; Waller, Niels G

    2016-06-01

    In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  5. Development and performance test of a new high power RF window in S-band PLS-II LINAC

    Science.gov (United States)

    Hwang, Woon-Ha; Joo, Young-Do; Kim, Seung-Hwan; Choi, Jae-Young; Noh, Sung-Ju; Ryu, Ji-Wan; Cho, Young-Ki

    2017-12-01

    A prototype of RF window was developed in collaboration with the Pohang Accelerator Laboratory (PAL) and domestic companies. High power performance tests of the single RF window were conducted at PAL to verify the operational characteristics for its application in the Pohang Light Source-II (PLS-II) linear accelerator (Linac). The tests were performed in the in-situ facility consisting of a modulator, klystron, waveguide network, vacuum system, cooling system, and RF analyzing equipment. The test results with Stanford linear accelerator energy doubler (SLED) have shown no breakdown up to 75 MW peak power with 4.5 μs RF pulse width at a repetition rate of 10 Hz. The test results with the current operation level of PLS-II Linac confirm that the RF window well satisfies the criteria for PLS-II Linac operation.

  6. Application of GA-PLS and GA-KPLS calculations for the prediction of the retention indices of essential oils

    Directory of Open Access Journals (Sweden)

    Hadi Noorizadeh

    2011-01-01

    Full Text Available Genetic algorithm and partial least square (GA-PLS and kernel PLS (GA-KPLS techniques were used to investigate the correlation between retention indices (RI and descriptors for 117 diverse compounds in essential oils from 5 Pimpinella species gathered from central Turkey which were obtained by gas chromatography and gas chromatography-mass spectrometry. The square correlation coefficient leave-group-out cross validation (LGO-CV (Q² between experimental and predicted RI for training set by GA-PLS and GA-KPLS was 0.940 and 0.963, respectively. This indicates that GA-KPLS can be used as an alternative modeling tool for quantitative structure-retention relationship (QSRR studies.

  7. Testing discontinuities in nonparametric regression

    KAUST Repository

    Dai, Wenlin

    2017-01-19

    In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100

  8. Testing discontinuities in nonparametric regression

    KAUST Repository

    Dai, Wenlin; Zhou, Yuejin; Tong, Tiejun

    2017-01-01

    In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100

  9. Quantile regression theory and applications

    CERN Document Server

    Davino, Cristina; Vistocco, Domenico

    2013-01-01

    A guide to the implementation and interpretation of Quantile Regression models This book explores the theory and numerous applications of quantile regression, offering empirical data analysis as well as the software tools to implement the methods. The main focus of this book is to provide the reader with a comprehensivedescription of the main issues concerning quantile regression; these include basic modeling, geometrical interpretation, estimation and inference for quantile regression, as well as issues on validity of the model, diagnostic tools. Each methodological aspect is explored and

  10. Logistic regression models

    CERN Document Server

    Hilbe, Joseph M

    2009-01-01

    This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect-great clarity.The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … .-Annette J. Dobson, Biometric...

  11. Linear regression in astronomy. II

    Science.gov (United States)

    Feigelson, Eric D.; Babu, Gutti J.

    1992-01-01

    A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.

  12. Time-adaptive quantile regression

    DEFF Research Database (Denmark)

    Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik

    2008-01-01

    and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power......An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method...... production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered....

  13. DEMAND FOR AND SUPPLY OF MARK-UP AND PLS FUNDS IN ISLAMIC BANKING: SOME ALTERNATIVE EXPLANATIONS

    OpenAIRE

    KHAN, TARIQULLAH

    1995-01-01

    Profit and loss-sharing (PLS) and bai’ al murabahah lil amir bil shira (mark-up) are the two parent principles of Islamic financing. The use of PLS is limited and that of mark-up overwhelming in the operations of the Islamic banks. Several studies provide different explanations for this phenomenon. The dominant among these is the moral hazard hypothesis. Some alternative explanations are given in the present paper. The discussion is based on both demand (user of funds) and supply (bank) side ...

  14. SEPARATION PHENOMENA LOGISTIC REGRESSION

    Directory of Open Access Journals (Sweden)

    Ikaro Daniel de Carvalho Barreto

    2014-03-01

    Full Text Available This paper proposes an application of concepts about the maximum likelihood estimation of the binomial logistic regression model to the separation phenomena. It generates bias in the estimation and provides different interpretations of the estimates on the different statistical tests (Wald, Likelihood Ratio and Score and provides different estimates on the different iterative methods (Newton-Raphson and Fisher Score. It also presents an example that demonstrates the direct implications for the validation of the model and validation of variables, the implications for estimates of odds ratios and confidence intervals, generated from the Wald statistics. Furthermore, we present, briefly, the Firth correction to circumvent the phenomena of separation.

  15. Novel, customizable scoring functions, parameterized using N-PLS, for structure-based drug discovery.

    Science.gov (United States)

    Catana, Cornel; Stouten, Pieter F W

    2007-01-01

    The ability to accurately predict biological affinity on the basis of in silico docking to a protein target remains a challenging goal in the CADD arena. Typically, "standard" scoring functions have been employed that use the calculated docking result and a set of empirical parameters to calculate a predicted binding affinity. To improve on this, we are exploring novel strategies for rapidly developing and tuning "customized" scoring functions tailored to a specific need. In the present work, three such customized scoring functions were developed using a set of 129 high-resolution protein-ligand crystal structures with measured Ki values. The functions were parametrized using N-PLS (N-way partial least squares), a multivariate technique well-known in the 3D quantitative structure-activity relationship field. A modest correlation between observed and calculated pKi values using a standard scoring function (r2 = 0.5) could be improved to 0.8 when a customized scoring function was applied. To mimic a more realistic scenario, a second scoring function was developed, not based on crystal structures but exclusively on several binding poses generated with the Flo+ docking program. Finally, a validation study was conducted by generating a third scoring function with 99 randomly selected complexes from the 129 as a training set and predicting pKi values for a test set that comprised the remaining 30 complexes. Training and test set r2 values were 0.77 and 0.78, respectively. These results indicate that, even without direct structural information, predictive customized scoring functions can be developed using N-PLS, and this approach holds significant potential as a general procedure for predicting binding affinity on the basis of in silico docking.

  16. Analysis and Modeling for China’s Electricity Demand Forecasting Using a Hybrid Method Based on Multiple Regression and Extreme Learning Machine: A View from Carbon Emission

    Directory of Open Access Journals (Sweden)

    Yi Liang

    2016-11-01

    Full Text Available The power industry is the main battlefield of CO2 emission reduction, which plays an important role in the implementation and development of the low carbon economy. The forecasting of electricity demand can provide a scientific basis for the country to formulate a power industry development strategy and further promote the sustained, healthy and rapid development of the national economy. Under the goal of low-carbon economy, medium and long term electricity demand forecasting will have very important practical significance. In this paper, a new hybrid electricity demand model framework is characterized as follows: firstly, integration of grey relation degree (GRD with induced ordered weighted harmonic averaging operator (IOWHA to propose a new weight determination method of hybrid forecasting model on basis of forecasting accuracy as induced variables is presented; secondly, utilization of the proposed weight determination method to construct the optimal hybrid forecasting model based on extreme learning machine (ELM forecasting model and multiple regression (MR model; thirdly, three scenarios in line with the level of realization of various carbon emission targets and dynamic simulation of effect of low-carbon economy on future electricity demand are discussed. The resulting findings show that, the proposed model outperformed and concentrated some monomial forecasting models, especially in boosting the overall instability dramatically. In addition, the development of a low-carbon economy will increase the demand for electricity, and have an impact on the adjustment of the electricity demand structure.

  17. Ordinary least square regression, orthogonal regression, geometric mean regression and their applications in aerosol science

    International Nuclear Information System (INIS)

    Leng Ling; Zhang Tianyi; Kleinman, Lawrence; Zhu Wei

    2007-01-01

    Regression analysis, especially the ordinary least squares method which assumes that errors are confined to the dependent variable, has seen a fair share of its applications in aerosol science. The ordinary least squares approach, however, could be problematic due to the fact that atmospheric data often does not lend itself to calling one variable independent and the other dependent. Errors often exist for both measurements. In this work, we examine two regression approaches available to accommodate this situation. They are orthogonal regression and geometric mean regression. Comparisons are made theoretically as well as numerically through an aerosol study examining whether the ratio of organic aerosol to CO would change with age

  18. Reduced Rank Regression

    DEFF Research Database (Denmark)

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...

  19. Long-term trends of suicide by choice of method in Norway: a joinpoint regression analysis of data from 1969 to 2012.

    Science.gov (United States)

    Puzo, Quirino; Qin, Ping; Mehlum, Lars

    2016-03-11

    Suicide mortality and the rates by specific methods in a population may change over time in response to concurrent changes in relevant factors in society. This study aimed to identify significant changing points in method-specific suicide mortality from 1969 to 2012 in Norway. Data on suicide mortality by specific methods and by sex and age were retrieved from the Norwegian Cause-of-Death Register. Long-term trends in age-standardized rates of suicide mortality were analyzed by using joinpoint regression analysis. The most frequently used suicide method in the total population was hanging, followed by poisoning and firearms. Men chose suicide by firearms more often than women, whereas poisoning and drowning were more frequently used by women. The joinpoint analysis revealed that the overall trend of suicide mortality significantly changed twice along the period of 1969 to 2012 for both sexes. The male age-standardized suicide rate increased by 3.1% per year until 1989, and decreased by 1.2% per year between 1994 and 2012. Among females the long-term suicide rate increased by 4.0% per year until 1988, decreased by 5.5% through 1995, and then stabilized. Both sexes experienced an upward trend for suicide by hanging during the 44-year observation period, with a particularly significant increase in 15-24 year old males. The most distinct change among men was seen for firearms after 1988 with a significant decrease through 2012 of around 5% per year. For women, significant reductions since 1985-88 were observed for suicide by drowning and poisoning. The present study demonstrates different time trends for different suicide methods with significant reductions in suicide by firearms, drowning and poisoning after the peak in the suicide rate in the late 1980s. Suicide by means of hanging continuously increased, but did not fully compensate for the reduced use of other methods. This lends some support for the effectiveness of method-specific suicide preventive measures

  20. Quantile Regression With Measurement Error

    KAUST Repository

    Wei, Ying; Carroll, Raymond J.

    2009-01-01

    . The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a

  1. Kernel PLS Estimation of Single-trial Event-related Potentials

    Science.gov (United States)

    Rosipal, Roman; Trejo, Leonard J.

    2004-01-01

    Nonlinear kernel partial least squaes (KPLS) regressior, is a novel smoothing approach to nonparametric regression curve fitting. We have developed a KPLS approach to the estimation of single-trial event related potentials (ERPs). For improved accuracy of estimation, we also developed a local KPLS method for situations in which there exists prior knowledge about the approximate latency of individual ERP components. To assess the utility of the KPLS approach, we compared non-local KPLS and local KPLS smoothing with other nonparametric signal processing and smoothing methods. In particular, we examined wavelet denoising, smoothing splines, and localized smoothing splines. We applied these methods to the estimation of simulated mixtures of human ERPs and ongoing electroencephalogram (EEG) activity using a dipole simulator (BESA). In this scenario we considered ongoing EEG to represent spatially and temporally correlated noise added to the ERPs. This simulation provided a reasonable but simplified model of real-world ERP measurements. For estimation of the simulated single-trial ERPs, local KPLS provided a level of accuracy that was comparable with or better than the other methods. We also applied the local KPLS method to the estimation of human ERPs recorded in an experiment on co,onitive fatigue. For these data, the local KPLS method provided a clear improvement in visualization of single-trial ERPs as well as their averages. The local KPLS method may serve as a new alternative to the estimation of single-trial ERPs and improvement of ERP averages.

  2. Magnitude And Distance Determination From The First Few Seconds Of One Three Components Seismological Station Signal Using Support Vector Machine Regression Methods

    Science.gov (United States)

    Ochoa Gutierrez, L. H.; Vargas Jimenez, C. A.; Niño Vasquez, L. F.

    2011-12-01

    The "Sabana de Bogota" (Bogota Savannah) is the most important social and economical center of Colombia. Almost the third of population is concentrated in this region and generates about the 40% of Colombia's Internal Brute Product (IBP). According to this, the zone presents an elevated vulnerability in case that a high destructive seismic event occurs. Historical evidences show that high magnitude events took place in the past with a huge damage caused to the city and indicate that is probable that such events can occur in the next years. This is the reason why we are working in an early warning generation system, using the first few seconds of a seismic signal registered by three components and wide band seismometers. Such system can be implemented using Computational Intelligence tools, designed and calibrated to the particular Geological, Structural and environmental conditions present in the region. The methods developed are expected to work on real time, thus suitable software and electronic tools need to be developed. We used Support Vector Machines Regression (SVMR) methods trained and tested with historic seismic events registered by "EL ROSAL" Station, located near Bogotá, calculating descriptors or attributes as the input of the model, from the first 6 seconds of signal. With this algorithm, we obtained less than 10% of mean absolute error and correlation coefficients greater than 85% in hypocentral distance and Magnitude estimation. With this results we consider that we can improve the method trying to have better accuracy with less signal time and that this can be a very useful model to be implemented directly in the seismological stations to generate a fast characterization of the event, broadcasting not only raw signal but pre-processed information that can be very useful for accurate Early Warning Generation.

  3. Novel liquid chromatography method based on linear weighted regression for the fast determination of isoprostane isomers in plasma samples using sensitive tandem mass spectrometry detection.

    Science.gov (United States)

    Aszyk, Justyna; Kot, Jacek; Tkachenko, Yurii; Woźniak, Michał; Bogucka-Kocka, Anna; Kot-Wasik, Agata

    2017-04-15

    A simple, fast, sensitive and accurate methodology based on a LLE followed by liquid chromatography-tandem mass spectrometry for simultaneous determination of four regioisomers (8-iso prostaglandin F 2α , 8-iso-15(R)-prostaglandin F 2α , 11β-prostaglandin F 2α , 15(R)-prostaglandin F 2α ) in routine analysis of human plasma samples was developed. Isoprostanes are stable products of arachidonic acid peroxidation and are regarded as the most reliable markers of oxidative stress in vivo. Validation of method was performed by evaluation of the key analytical parameters such as: matrix effect, analytical curve, trueness, precision, limits of detection and limits of quantification. As a homoscedasticity was not met for analytical data, weighted linear regression was applied in order to improve the accuracy at the lower end points of calibration curve. The detection limits (LODs) ranged from 1.0 to 2.1pg/mL. For plasma samples spiked with the isoprostanes at the level of 50pg/mL, intra-and interday repeatability ranged from 2.1 to 3.5% and 0.1 to 5.1%, respectively. The applicability of the proposed approach has been verified by monitoring of isoprostane isomers level in plasma samples collected from young patients (n=8) subjected to hyperbaric hyperoxia (100% oxygen at 280kPa(a) for 30min) in a multiplace hyperbaric chamber. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. 45 CFR 303.15 - Agreements to use the Federal Parent Locator Service (PLS) in parental kidnapping and child...

    Science.gov (United States)

    2010-10-01

    ... Service (PLS) in parental kidnapping and child custody or visitation cases. 303.15 Section 303.15 Public... parental kidnapping and child custody or visitation cases. (a) Definitions. The following definitions apply... responsibilities require access in connection with child custody and parental kidnapping cases; (ii) Store the...

  5. Prediction-oriented modeling in business research by means of PLS path modeling : Introduction to a JBR special section

    NARCIS (Netherlands)

    Cepeda Carrion, Gabriel; Henseler, Jörg; Ringle, Christian M.; Roldan, Jose Luis

    2016-01-01

    Under the main theme “prediction-oriented modeling in business research by means of partial least squares path modeling” (PLS), the special issue presents 17 papers. Most contributions include content from presentations at the 2nd International Symposium on Partial Least Squares Path Modeling: The

  6. Mobility of the native Bacillus subtilis conjugative plasmid pLS20 is regulated by intercellular signaling.

    Science.gov (United States)

    Singh, Praveen K; Ramachandran, Gayetri; Ramos-Ruiz, Ricardo; Peiró-Pastor, Ramón; Abia, David; Wu, Ling J; Meijer, Wilfried J J

    2013-10-01

    Horizontal gene transfer mediated by plasmid conjugation plays a significant role in the evolution of bacterial species, as well as in the dissemination of antibiotic resistance and pathogenicity determinants. Characterization of their regulation is important for gaining insights into these features. Relatively little is known about how conjugation of Gram-positive plasmids is regulated. We have characterized conjugation of the native Bacillus subtilis plasmid pLS20. Contrary to the enterococcal plasmids, conjugation of pLS20 is not activated by recipient-produced pheromones but by pLS20-encoded proteins that regulate expression of the conjugation genes. We show that conjugation is kept in the default "OFF" state and identified the master repressor responsible for this. Activation of the conjugation genes requires relief of repression, which is mediated by an anti-repressor that belongs to the Rap family of proteins. Using both RNA sequencing methodology and genetic approaches, we have determined the regulatory effects of the repressor and anti-repressor on expression of the pLS20 genes. We also show that the activity of the anti-repressor is in turn regulated by an intercellular signaling peptide. Ultimately, this peptide dictates the timing of conjugation. The implications of this regulatory mechanism and comparison with other mobile systems are discussed.

  7. Interaction of PLS and PIN and hormonal crosstalk in Arabidopsis root developmentHormonal crosstalk in Arabidopsis

    Directory of Open Access Journals (Sweden)

    Junli eLiu

    2013-04-01

    Full Text Available Understanding how hormones and genes interact to coordinate plant growth is a major challenge in developmental biology. The activities of auxin, ethylene and cytokinin depend on cellular context and exhibit either synergistic or antagonistic interactions. Here we use experimentation and network construction to elucidate the role of the interaction of the POLARIS peptide (PLS and the auxin efflux carrier PIN proteins in the crosstalk of three hormones (auxin, ethylene and cytokinin in Arabidopsis root development. In ethylene hypersignalling mutants such as polaris (pls, we show experimentally that expression of both PIN1 and PIN2 significantly increases. This relationship is analysed in the context of the crosstalk between auxin, ethylene and cytokinin: in pls, endogenous auxin, ethylene and cytokinin concentration decreases, approximately remains unchanged and increases, respectively. Experimental data are integrated into a hormonal crosstalk network through combination with information in literature. Network construction reveals that the regulation of both PIN1 and PIN2 is predominantly via ethylene signalling. In addition, it is deduced that the relationship between cytokinin and PIN1 and PIN2 levels implies a regulatory role of cytokinin in addition to its regulation to auxin, ethylene and PLS levels. We discuss how the network of hormones and genes coordinates plant growth by simultaneously regulating the activities of auxin, ethylene and cytokinin signalling pathways.

  8. Implementing the Fundamental Principle of Islamic Finance PLS in Order to Reduce Moral Hazard on the Financial Services Market

    OpenAIRE

    Dariusz Piotrowski

    2014-01-01

    Moral hazard is a situation where agent takes a risky actions, knowing that potential costs will be born by principal. In finance, moral hazard arises when advisers take risky decisions come to believe that they will not have to carry the full burden of potential loses. Implementing the fundamental principle of Islamic finance PLS could reduce moral hazard on financial services market.

  9. Regression analysis with categorized regression calibrated exposure: some interesting findings

    Directory of Open Access Journals (Sweden)

    Hjartåker Anette

    2006-07-01

    Full Text Available Abstract Background Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC. Results In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis. Conclusion Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a

  10. An Objective Screening Method for Major Depressive Disorder Using Logistic Regression Analysis of Heart Rate Variability Data Obtained in a Mental Task Paradigm

    Directory of Open Access Journals (Sweden)

    Guanghao Sun

    2016-11-01

    Full Text Available Background and Objectives: Heart rate variability (HRV has been intensively studied as a promising biological marker of major depressive disorder (MDD. Our previous study confirmed that autonomic activity and reactivity in depression revealed by HRV during rest and mental task (MT conditions can be used as diagnostic measures and in clinical evaluation. In this study, logistic regression analysis (LRA was utilized for the classification and prediction of MDD based on HRV data obtained in an MT paradigm.Methods: Power spectral analysis of HRV on R-R intervals before, during, and after an MT (random number generation was performed in 44 drug-naïve patients with MDD and 47 healthy control subjects at Department of Psychiatry in Shizuoka Saiseikai General Hospital. Logit scores of LRA determined by HRV indices and heart rates discriminated patients with MDD from healthy subjects. The high frequency (HF component of HRV and the ratio of the low frequency (LF component to the HF component (LF/HF correspond to parasympathetic and sympathovagal balance, respectively.Results: The LRA achieved a sensitivity and specificity of 80.0% and 79.0%, respectively, at an optimum cutoff logit score (0.28. Misclassifications occurred only when the logit score was close to the cutoff score. Logit scores also correlated significantly with subjective self-rating depression scale scores (p < 0.05.Conclusion: HRV indices recorded during a mental task may be an objective tool for screening patients with MDD in psychiatric practice. The proposed method appears promising for not only objective and rapid MDD screening, but also evaluation of its severity.

  11. Least median of squares and iteratively re-weighted least squares as robust linear regression methods for fluorimetric determination of α-lipoic acid in capsules in ideal and non-ideal cases of linearity.

    Science.gov (United States)

    Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F

    2018-03-26

    This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS) to investigate their application in instrument analysis of nutraceuticals (that is, fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Assessment of linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition and linearity intercept. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.

  12. Regresion PLS y PCA como Solucion al Problema de Multicolinealidad en Regresion Multiple

    Directory of Open Access Journals (Sweden)

    José Carlos Vega Vilca

    2011-03-01

    Full Text Available We present and compare principal components regression and partial least squares regression, and their solution to the problem of multicollinearity. We illustrate the use of both techniques, and demonstrate the superiority of partial least squares.

  13. L1-Penalized N-way PLS for subset of electrodes selection in BCI experiments

    Science.gov (United States)

    Eliseyev, Andrey; Moro, Cecile; Faber, Jean; Wyss, Alexander; Torres, Napoleon; Mestais, Corinne; Benabid, Alim Louis; Aksenova, Tetiana

    2012-08-01

    Recently, the N-way partial least squares (NPLS) approach was reported as an effective tool for neuronal signal decoding and brain-computer interface (BCI) system calibration. This method simultaneously analyzes data in several domains. It combines the projection of a data tensor to a low dimensional space with linear regression. In this paper the L1-Penalized NPLS is proposed for sparse BCI system calibration, allowing uniting the projection technique with an effective selection of subset of features. The L1-Penalized NPLS was applied for the binary self-paced BCI system calibration, providing selection of electrodes subset. Our BCI system is designed for animal research, in particular for research in non-human primates.

  14. [Research on modeling method to analyze Lonicerae Japonicae Flos extraction process with online MEMS-NIR based on two types of error detection theory].

    Science.gov (United States)

    Du, Chen-Zhao; Wu, Zhi-Sheng; Zhao, Na; Zhou, Zheng; Shi, Xin-Yuan; Qiao, Yan-Jiang

    2016-10-01

    To establish a rapid quantitative analysis method for online monitoring of chlorogenic acid in aqueous solution of Lonicera Japonica Flos extraction by using micro-electromechanical near infrared spectroscopy (MEMS-NIR). High performance liquid chromatography(HPLC) was used as reference method.Kennard-Stone (K-S) algorithm was used to divide sample sets, and partial least square(PLS) regression was adopted to establish the multivariate analysis model between the HPLC analysis contents and NIR spectra. The synergy interval partial least squares (SiPLS) was used to selected modeling waveband to establish PLS models. RPD was used to evaluate the prediction performance of the models. MDLs was calculated based on two types of error detection theory, on-line analytical modeling approach of Lonicera Japonica Flos extraction process was expressed scientifically by MDL. The result shows that the model established by multiplicative scatter correction(MSC) was the best, with the root mean square with cross validation(RMSECV), root mean square error of correction(RMSEC) and root mean square error of prediction(RMSEP) of chlorogenic acid as 1.707, 1.489, 2.362, respectively, the determination coefficient of the calibration model was 0.998 5, and the determination coefficient of the prediction was 0.988 1.The value of RPD is 9.468.The MDL (0.042 15 g•L⁻¹) selected by SiPLS is less than the original,which demonstrated that SiPLS was beneficial to improve the prediction performance of the model. In this study, a more accurate expression of the prediction performance of the model from the two types of error detection theory, to further illustrate MEMS-NIR spectroscopy can be used for on-line monitoring of Lonicera Japonica Flos extraction process. Copyright© by the Chinese Pharmaceutical Association.

  15. Phantom and animal imaging studies using PLS synchrotron X-rays

    CERN Document Server

    Hee Joung Kim; Kyu Ho Lee; Hai Jo Jung; Eun Kyung Kim; Jung Ho Je; In Woo Kim; Yeukuang, Hwu; Wen Li Tsai; Je Kyung Seong; Seung Won Lee; Hyung Sik Yoo

    2001-01-01

    Ultra-high resolution radiographs can be obtained using synchrotron X-rays. A collaboration team consisting of K-JIST, POSTECH and YUMC has recently commissioned a new beamline (5C1) at Pohang Light Source (PLS) in Korea for medical applications using phase contrast radiology. Relatively simple image acquisition systems were set up on 5C1 beamline, and imaging studies were performed for resolution test patterns, mammographic phantom, and animals. Resolution test patterns and mammographic phantom images showed much better image resolution and quality with the 5C1 imaging system than the mammography system. Both fish and mouse images with 5C1 imaging system also showed much better image resolution with great details of organs and anatomy compared to those obtained with a conventional mammography system. A simple and inexpensive ultra-high resolution imaging system on 5C1 beamline was successfully implemented. The authors were able to acquire ultra-high resolution images for, resolution test patterns, mammograph...

  16. Propiedades psicométricas de la Escala de lenguaje para preescolares (PLS-3 colombianos

    Directory of Open Access Journals (Sweden)

    Rita Flórez Romero

    2013-01-01

    Full Text Available Objetivo.El presente estudio buscó caracterizar las propiedades psicométricas del instrumento Preschool Language Scale – 3(PLS-3, en una muestra de 477 niños colombianos de cuatro a siete años de la ciudad de Bogotá D. C. Método. Para lograr este propósito, se realizaron análisis de coeficientes de discriminación, dificultad y matriz de relaciones tetracóricas. Resultados. Se encontraron apropiados niveles de confiabilidad, alta sensibilidad a la evolución de la comprensión y producción lingüística de los niños, así como un bajo índice de dificultad en algunos reactivos. Conclusión. Estos resultados se discuten a la luz de índices de discriminación en pruebas de desarrollo lingüístico típico y atípico.

  17. Comparison of stochastic and regression based methods for quantification of predictive uncertainty of model-simulated wellhead protection zones in heterogeneous aquifers

    DEFF Research Database (Denmark)

    Christensen, Steen; Moore, C.; Doherty, J.

    2006-01-01

    accurate and required a few hundred model calls to be computed. (b) The linearized regression-based interval (Cooley, 2004) required just over a hundred model calls and also appeared to be nearly correct. (c) The calibration-constrained Monte-Carlo interval (Doherty, 2003) was found to be narrower than......For a synthetic case we computed three types of individual prediction intervals for the location of the aquifer entry point of a particle that moves through a heterogeneous aquifer and ends up in a pumping well. (a) The nonlinear regression-based interval (Cooley, 2004) was found to be nearly...... the regression-based intervals but required about half a million model calls. It is unclear whether or not this type of prediction interval is accurate....

  18. Comparison between Two Linear Supervised Learning Machines' Methods with Principle Component Based Methods for the Spectrofluorimetric Determination of Agomelatine and Its Degradants.

    Science.gov (United States)

    Elkhoudary, Mahmoud M; Naguib, Ibrahim A; Abdel Salam, Randa A; Hadad, Ghada M

    2017-05-01

    Four accurate, sensitive and reliable stability indicating chemometric methods were developed for the quantitative determination of Agomelatine (AGM) whether in pure form or in pharmaceutical formulations. Two supervised learning machines' methods; linear artificial neural networks (PC-linANN) preceded by principle component analysis and linear support vector regression (linSVR), were compared with two principle component based methods; principle component regression (PCR) as well as partial least squares (PLS) for the spectrofluorimetric determination of AGM and its degradants. The results showed the benefits behind using linear learning machines' methods and the inherent merits of their algorithms in handling overlapped noisy spectral data especially during the challenging determination of AGM alkaline and acidic degradants (DG1 and DG2). Relative mean squared error of prediction (RMSEP) for the proposed models in the determination of AGM were 1.68, 1.72, 0.68 and 0.22 for PCR, PLS, SVR and PC-linANN; respectively. The results showed the superiority of supervised learning machines' methods over principle component based methods. Besides, the results suggested that linANN is the method of choice for determination of components in low amounts with similar overlapped spectra and narrow linearity range. Comparison between the proposed chemometric models and a reported HPLC method revealed the comparable performance and quantification power of the proposed models.

  19. Gaussian process regression analysis for functional data

    CERN Document Server

    Shi, Jian Qing

    2011-01-01

    Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables.Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime

  20. On the shape of posterior densities and credible sets in instrumental variable regression models with reduced rank: an application of flexible sampling methods using neural networks

    NARCIS (Netherlands)

    Hoogerheide, L.F.; Kaashoek, J.F.; van Dijk, H.K.

    2007-01-01

    Likelihoods and posteriors of instrumental variable (IV) regression models with strong endogeneity and/or weak instruments may exhibit rather non-elliptical contours in the parameter space. This may seriously affect inference based on Bayesian credible sets. When approximating posterior

  1. On the shape of posterior densities and credible sets in instrumental variable regression models with reduced rank: an application of flexible sampling methods using neural networks

    NARCIS (Netherlands)

    L.F. Hoogerheide (Lennart); J.F. Kaashoek (Johan); H.K. van Dijk (Herman)

    2005-01-01

    textabstractLikelihoods and posteriors of instrumental variable regression models with strong endogeneity and/or weak instruments may exhibit rather non-elliptical contours in the parameter space. This may seriously affect inference based on Bayesian credible sets. When approximating such contours

  2. Non-destructive geographical traceability of sea cucumber (Apostichopus japonicus) using near infrared spectroscopy combined with chemometric methods.

    Science.gov (United States)

    Guo, Xiuhan; Cai, Rui; Wang, Shisheng; Tang, Bo; Li, Yueqing; Zhao, Weijie

    2018-01-01

    Sea cucumber is the major tonic seafood worldwide, and geographical origin traceability is an important part of its quality and safety control. In this work, a non-destructive method for origin traceability of sea cucumber ( Apostichopus japonicus ) from northern China Sea and East China Sea using near infrared spectroscopy (NIRS) and multivariate analysis methods was proposed. Total fat contents of 189 fresh sea cucumber samples were determined and partial least-squares (PLS) regression was used to establish the quantitative NIRS model. The ordered predictor selection algorithm was performed to select feasible wavelength regions for the construction of PLS and identification models. The identification model was developed by principal component analysis combined with Mahalanobis distance and scaling to the first range algorithms. In the test set of the optimum PLS models, the root mean square error of prediction was 0.45, and correlation coefficient was 0.90. The correct classification rates of 100% were obtained in both identification calibration model and test model. The overall results indicated that NIRS method combined with chemometric analysis was a suitable tool for origin traceability and identification of fresh sea cucumber samples from nine origins in China.

  3. Understanding logistic regression analysis

    OpenAIRE

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...

  4. Introduction to regression graphics

    CERN Document Server

    Cook, R Dennis

    2009-01-01

    Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava

  5. Simultaneous quantitative determination of paracetamol and tramadol in tablet formulation using UV spectrophotometry and chemometric methods

    Science.gov (United States)

    Glavanović, Siniša; Glavanović, Marija; Tomišić, Vladislav

    2016-03-01

    The UV spectrophotometric methods for simultaneous quantitative determination of paracetamol and tramadol in paracetamol-tramadol tablets were developed. The spectrophotometric data obtained were processed by means of partial least squares (PLS) and genetic algorithm coupled with PLS (GA-PLS) methods in order to determine the content of active substances in the tablets. The results gained by chemometric processing of the spectroscopic data were statistically compared with those obtained by means of validated ultra-high performance liquid chromatographic (UHPLC) method. The accuracy and precision of data obtained by the developed chemometric models were verified by analysing the synthetic mixture of drugs, and by calculating recovery as well as relative standard error (RSE). A statistically good agreement was found between the amounts of paracetamol determined using PLS and GA-PLS algorithms, and that obtained by UHPLC analysis, whereas for tramadol GA-PLS results were proven to be more reliable compared to those of PLS. The simplest and the most accurate and precise models were constructed by using the PLS method for paracetamol (mean recovery 99.5%, RSE 0.89%) and the GA-PLS method for tramadol (mean recovery 99.4%, RSE 1.69%).

  6. Variables e Índices de Competitividad de las Empresas Exportadoras, utilizando el PLS

    Directory of Open Access Journals (Sweden)

    Joel Bonales Valencia

    2015-12-01

    Full Text Available El ambiente competitivo entre las empresas exportadoras ha generado la necesidad de crear estrategias competitivas para seguir teniendo presencia en los mercados internacionales. De esto se desglosa, que las compañías requieren mejor entendimiento sobre la competitividad, los factores que la determinan, y los índices que la miden. Sin embargo, los factores e índices de competitividad por sí solos no proveen suficiente información útil para los directivos, por lo que es necesario validar un modelo que interrelacione los factores e índicesde la competitividad de las empresas exportadoras. En este artículo se muestra una revisión de las variables importantes para comprender la competitividad y se extraen del marco teórico los factores que determinan la misma para el caso de las empresas exportadoras y los índices que la miden. A partir de las variables (calidad, precio, capacitación, tecnologíay canales de distribución se desarrolló una clasificación estructurada de factores e índices y se aplicó una encuesta a los directivos de veinticinco empresas exportadoras del estado de Michoacán. A partir de los factores e índices relevantes encontrados, se propone un modelo que describe cómo esas variables están interrelacionadas, basándose en la técnica estadística de modelación de Mínimos Cuadrados Parciales (Partial Least Squares, en inglés conocido también como PLS. Los resultados finales mostraron que la variable tecnología tiene el mayor impacto sobre los índices de competitividad, lo que implica que las empresas exportadoras deben poner especial atención en esta variable.

  7. Clustering and training set selection methods for improving the accuracy of quantitative laser induced breakdown spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Ryan B., E-mail: randerson@astro.cornell.edu [Cornell University Department of Astronomy, 406 Space Sciences Building, Ithaca, NY 14853 (United States); Bell, James F., E-mail: Jim.Bell@asu.edu [Arizona State University School of Earth and Space Exploration, Bldg.: INTDS-A, Room: 115B, Box 871404, Tempe, AZ 85287 (United States); Wiens, Roger C., E-mail: rwiens@lanl.gov [Los Alamos National Laboratory, P.O. Box 1663 MS J565, Los Alamos, NM 87545 (United States); Morris, Richard V., E-mail: richard.v.morris@nasa.gov [NASA Johnson Space Center, 2101 NASA Parkway, Houston, TX 77058 (United States); Clegg, Samuel M., E-mail: sclegg@lanl.gov [Los Alamos National Laboratory, P.O. Box 1663 MS J565, Los Alamos, NM 87545 (United States)

    2012-04-15

    We investigated five clustering and training set selection methods to improve the accuracy of quantitative chemical analysis of geologic samples by laser induced breakdown spectroscopy (LIBS) using partial least squares (PLS) regression. The LIBS spectra were previously acquired for 195 rock slabs and 31 pressed powder geostandards under 7 Torr CO{sub 2} at a stand-off distance of 7 m at 17 mJ per pulse to simulate the operational conditions of the ChemCam LIBS instrument on the Mars Science Laboratory Curiosity rover. The clustering and training set selection methods, which do not require prior knowledge of the chemical composition of the test-set samples, are based on grouping similar spectra and selecting appropriate training spectra for the partial least squares (PLS2) model. These methods were: (1) hierarchical clustering of the full set of training spectra and selection of a subset for use in training; (2) k-means clustering of all spectra and generation of PLS2 models based on the training samples within each cluster; (3) iterative use of PLS2 to predict sample composition and k-means clustering of the predicted compositions to subdivide the groups of spectra; (4) soft independent modeling of class analogy (SIMCA) classification of spectra, and generation of PLS2 models based on the training samples within each class; (5) use of Bayesian information criteria (BIC) to determine an optimal number of clusters and generation of PLS2 models based on the training samples within each cluster. The iterative method and the k-means method using 5 clusters showed the best performance, improving the absolute quadrature root mean squared error (RMSE) by {approx} 3 wt.%. The statistical significance of these improvements was {approx} 85%. Our results show that although clustering methods can modestly improve results, a large and diverse training set is the most reliable way to improve the accuracy of quantitative LIBS. In particular, additional sulfate standards and

  8. Preliminary antifungal and cytotoxic evaluation of synthetic cycloalkyl[b]thiophene derivatives with PLS-DA analysis.

    Science.gov (United States)

    Souza, Beatriz C C; De Oliveira, Tiago B; Aquino, Thiago M; de Lima, Maria C A; Pitta, Ivan R; Galdino, Suely L; Lima, Edeltrudes O; Gonçalves-Silva, Teresinha; Militão, Gardênia C G; Scotti, Luciana; Scotti, Marcus T; Mendonça, Francisco J B

    2012-06-01

    A series of 2-[(arylidene)amino]-cycloalkyl[b]thiophene-3-carbonitriles (2a-x) was synthesized by incorporation of substituted aromatic aldehydes in Gewald adducts (1a-c). The title compounds were screened for their antifungal activity against Candida krusei and Criptococcus neoformans and for their antiproliferative activity against a panel of 3 human cancer cell lines (HT29, NCI H-292 and HEP). For antiproliferative activity, the partial least squares (PLS) methodology was applied. Some of the prepared compounds exhibited promising antifungal and proliferative properties. The most active compounds for antifungal activity were cyclohexyl[b]thiophene derivatives, and for antiproliferative activity cycloheptyl[b]thiophene derivatives, especially 2-[(1H-indol-2-yl-methylidene)amino]- 5,6,7,8-tetrahydro-4H-cyclohepta[b]thiophene-3-carbonitrile (2r), which inhibited more than 97 % growth of the three cell lines. The PLS discriminant analysis (PLS-DA) applied generated good exploratory and predictive results and showed that the descriptors having shape characteristics were strongly correlated with the biological data.

  9. Prediction of long-residue properties of potential blends from mathematically mixed infrared spectra of pure crude oils by partial least-squares regression models

    NARCIS (Netherlands)

    de Peinder, P.; Visser, T.; Petrauskas, D.D.; Salvatori, F.; Soulimani, F.; Weckhuysen, B.M.

    2009-01-01

    Research has been carried out to determine the feasibility of partial least-squares (PLS) regression models to predict the long-residue (LR) properties of potential blends from infrared (IR) spectra that have been created by linearly co-adding the IR spectra of crude oils. The study is the follow-up

  10. The Multivariate Regression Statistics Strategy to Investigate Content-Effect Correlation of Multiple Components in Traditional Chinese Medicine Based on a Partial Least Squares Method.

    Science.gov (United States)

    Peng, Ying; Li, Su-Ning; Pei, Xuexue; Hao, Kun

    2018-03-01

    Amultivariate regression statisticstrategy was developed to clarify multi-components content-effect correlation ofpanaxginseng saponins extract and predict the pharmacological effect by components content. In example 1, firstly, we compared pharmacological effects between panax ginseng saponins extract and individual saponin combinations. Secondly, we examined the anti-platelet aggregation effect in seven different saponin combinations of ginsenoside Rb1, Rg1, Rh, Rd, Ra3 and notoginsenoside R1. Finally, the correlation between anti-platelet aggregation and the content of multiple components was analyzed by a partial least squares algorithm. In example 2, firstly, 18 common peaks were identified in ten different batches of panax ginseng saponins extracts from different origins. Then, we investigated the anti-myocardial ischemia reperfusion injury effects of the ten different panax ginseng saponins extracts. Finally, the correlation between the fingerprints and the cardioprotective effects was analyzed by a partial least squares algorithm. Both in example 1 and 2, the relationship between the components content and pharmacological effect was modeled well by the partial least squares regression equations. Importantly, the predicted effect curve was close to the observed data of dot marked on the partial least squares regression model. This study has given evidences that themulti-component content is a promising information for predicting the pharmacological effects of traditional Chinese medicine.

  11. The Multivariate Regression Statistics Strategy to Investigate Content-Effect Correlation of Multiple Components in Traditional Chinese Medicine Based on a Partial Least Squares Method

    Directory of Open Access Journals (Sweden)

    Ying Peng

    2018-03-01

    Full Text Available Amultivariate regression statisticstrategy was developed to clarify multi-components content-effect correlation ofpanaxginseng saponins extract and predict the pharmacological effect by components content. In example 1, firstly, we compared pharmacological effects between panax ginseng saponins extract and individual saponin combinations. Secondly, we examined the anti-platelet aggregation effect in seven different saponin combinations of ginsenoside Rb1, Rg1, Rh, Rd, Ra3 and notoginsenoside R1. Finally, the correlation between anti-platelet aggregation and the content of multiple components was analyzed by a partial least squares algorithm. In example 2, firstly, 18 common peaks were identified in ten different batches of panax ginseng saponins extracts from different origins. Then, we investigated the anti-myocardial ischemia reperfusion injury effects of the ten different panax ginseng saponins extracts. Finally, the correlation between the fingerprints and the cardioprotective effects was analyzed by a partial least squares algorithm. Both in example 1 and 2, the relationship between the components content and pharmacological effect was modeled well by the partial least squares regression equations. Importantly, the predicted effect curve was close to the observed data of dot marked on the partial least squares regression model. This study has given evidences that themulti-component content is a promising information for predicting the pharmacological effects of traditional Chinese medicine.

  12. Principal component regression analysis with SPSS.

    Science.gov (United States)

    Liu, R X; Kuang, J; Gong, Q; Hou, X L

    2003-06-01

    The paper introduces all indices of multicollinearity diagnoses, the basic principle of principal component regression and determination of 'best' equation method. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0: including all calculating processes of the principal component regression and all operations of linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. The principal component regression analysis can be used to overcome disturbance of the multicollinearity. The simplified, speeded up and accurate statistical effect is reached through the principal component regression analysis with SPSS.

  13. Regression Analysis by Example. 5th Edition

    Science.gov (United States)

    Chatterjee, Samprit; Hadi, Ali S.

    2012-01-01

    Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…

  14. Understanding logistic regression analysis.

    Science.gov (United States)

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.

  15. Applied linear regression

    CERN Document Server

    Weisberg, Sanford

    2013-01-01

    Praise for the Third Edition ""...this is an excellent book which could easily be used as a course text...""-International Statistical Institute The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus

  16. Applied logistic regression

    CERN Document Server

    Hosmer, David W; Sturdivant, Rodney X

    2013-01-01

     A new edition of the definitive guide to logistic regression modeling for health science and other applications This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-

  17. Survival analysis II: Cox regression

    NARCIS (Netherlands)

    Stel, Vianda S.; Dekker, Friedo W.; Tripepi, Giovanni; Zoccali, Carmine; Jager, Kitty J.

    2011-01-01

    In contrast to the Kaplan-Meier method, Cox proportional hazards regression can provide an effect estimate by quantifying the difference in survival between patient groups and can adjust for confounding effects of other variables. The purpose of this article is to explain the basic concepts of the

  18. Relationship of oestrus synchronization method, circulating hormones, luteinizing hormone and prostaglandin F-2 alpha receptors and luteal progesterone concentration to premature luteal regression in superovulated sheep.

    Science.gov (United States)

    Schiewe, M C; Fitz, T A; Brown, J L; Stuart, L D; Wildt, D E

    1991-09-01

    Ewes were treated with exogenous follicle-stimulating hormone (FSH) and oestrus was synchronized using either a dual prostaglandin F-2 alpha (PGF-2 alpha) injection regimen or pessaries impregnated with medroxy progesterone acetate (MAP). Natural cycling ewes served as controls. After oestrus or AI (Day 0), corpora lutea (CL) were enucleated surgically from the left and right ovaries on Days 3 and 6, respectively. The incidence of premature luteolysis was related (P less than 0.05) to PGF-2 alpha treatment and occurred in 7 of 8 ewes compared with 0 of 4 controls and 1 of 8 MAP-exposed females. Sheep with regressing CL had lower circulating and intraluteal progesterone concentrations and fewer total and small dissociated luteal cells on Day 3 than gonadotrophin-treated counterparts with normal CL. Progesterone concentration in the serum and luteal tissue was higher (P less than 0.05) in gonadotrophin-treated ewes with normal CL than in the controls; but luteinizing hormone (LH) receptors/cell were not different on Days 3 and 6. There were no apparent differences in the temporal patterns of circulating oestradiol-17 beta, FSH and LH. High progesterone in gonadotrophin-treated ewes with normal CL coincided with an increase in total luteal mass and numbers of cells, which were primarily reflected in more small luteal cells than in control ewes. Gonadotrophin-treated ewes with regressing CL on Day 3 tended (P less than 0.10) to have fewer small luteal cells and fewer (P less than 0.05) low-affinity PGF-2 alpha binding sites than sheep with normal CL. By Day 6, luteal integrity and cell viability was absent in ewes with prematurely regressed CL. These data demonstrate that (i) the incidence of premature luteal regression is highly correlated with the use of PGF-2 alpha; (ii) this abnormal luteal tissue is functionally competent for 2-3 days after ovulation, but deteriorates rapidly thereafter and (iii) luteal-dysfunctioning ewes experience a reduction in numbers of

  19. Correlation and simple linear regression.

    Science.gov (United States)

    Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G

    2003-06-01

    In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.

  20. Regression filter for signal resolution

    International Nuclear Information System (INIS)

    Matthes, W.

    1975-01-01

    The problem considered is that of resolving a measured pulse height spectrum of a material mixture, e.g. gamma ray spectrum, Raman spectrum, into a weighed sum of the spectra of the individual constituents. The model on which the analytical formulation is based is described. The problem reduces to that of a multiple linear regression. A stepwise linear regression procedure was constructed. The efficiency of this method was then tested by transforming the procedure in a computer programme which was used to unfold test spectra obtained by mixing some spectra, from a library of arbitrary chosen spectra, and adding a noise component. (U.K.)

  1. Logistic regression for dichotomized counts.

    Science.gov (United States)

    Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W

    2016-12-01

    Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.

  2. Limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method for the parameter estimation on geographically weighted ordinal logistic regression model (GWOLR)

    Science.gov (United States)

    Saputro, Dewi Retno Sari; Widyaningsih, Purnami

    2017-08-01

    In general, the parameter estimation of GWOLR model uses maximum likelihood method, but it constructs a system of nonlinear equations, making it difficult to find the solution. Therefore, an approximate solution is needed. There are two popular numerical methods: the methods of Newton and Quasi-Newton (QN). Newton's method requires large-scale time in executing the computation program since it contains Jacobian matrix (derivative). QN method overcomes the drawback of Newton's method by substituting derivative computation into a function of direct computation. The QN method uses Hessian matrix approach which contains Davidon-Fletcher-Powell (DFP) formula. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is categorized as the QN method which has the DFP formula attribute of having positive definite Hessian matrix. The BFGS method requires large memory in executing the program so another algorithm to decrease memory usage is needed, namely Low Memory BFGS (LBFGS). The purpose of this research is to compute the efficiency of the LBFGS method in the iterative and recursive computation of Hessian matrix and its inverse for the GWOLR parameter estimation. In reference to the research findings, we found out that the BFGS and LBFGS methods have arithmetic operation schemes, including O(n2) and O(nm).

  3. Quantile Regression With Measurement Error

    KAUST Repository

    Wei, Ying

    2009-08-27

    Regression quantiles can be substantially biased when the covariates are measured with error. In this paper we propose a new method that produces consistent linear quantile estimation in the presence of covariate measurement error. The method corrects the measurement error induced bias by constructing joint estimating equations that simultaneously hold for all the quantile levels. An iterative EM-type estimation algorithm to obtain the solutions to such joint estimation equations is provided. The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a longitudinal study with an unusual measurement error structure. © 2009 American Statistical Association.

  4. [Establishment of the Mathematical Model for PMI Estimation Using FTIR Spectroscopy and Data Mining Method].

    Science.gov (United States)

    Wang, L; Qin, X C; Lin, H C; Deng, K F; Luo, Y W; Sun, Q R; Du, Q X; Wang, Z Y; Tuo, Y; Sun, J H

    2018-02-01

    To analyse the relationship between Fourier transform infrared (FTIR) spectrum of rat's spleen tissue and postmortem interval (PMI) for PMI estimation using FTIR spectroscopy combined with data mining method. Rats were sacrificed by cervical dislocation, and the cadavers were placed at 20 ℃. The FTIR spectrum data of rats' spleen tissues were taken and measured at different time points. After pretreatment, the data was analysed by data mining method. The absorption peak intensity of rat's spleen tissue spectrum changed with the PMI, while the absorption peak position was unchanged. The results of principal component analysis (PCA) showed that the cumulative contribution rate of the first three principal components was 96%. There was an obvious clustering tendency for the spectrum sample at each time point. The methods of partial least squares discriminant analysis (PLS-DA) and support vector machine classification (SVMC) effectively divided the spectrum samples with different PMI into four categories (0-24 h, 48-72 h, 96-120 h and 144-168 h). The determination coefficient ( R ²) of the PMI estimation model established by PLS regression analysis was 0.96, and the root mean square error of calibration (RMSEC) and root mean square error of cross validation (RMSECV) were 9.90 h and 11.39 h respectively. In prediction set, the R ² was 0.97, and the root mean square error of prediction (RMSEP) was 10.49 h. The FTIR spectrum of the rat's spleen tissue can be effectively analyzed qualitatively and quantitatively by the combination of FTIR spectroscopy and data mining method, and the classification and PLS regression models can be established for PMI estimation. Copyright© by the Editorial Department of Journal of Forensic Medicine.

  5. Multicollinearity and Regression Analysis

    Science.gov (United States)

    Daoud, Jamal I.

    2017-12-01

    In regression analysis it is obvious to have a correlation between the response and predictor(s), but having correlation among predictors is something undesired. The number of predictors included in the regression model depends on many factors among which, historical data, experience, etc. At the end selection of most important predictors is something objective due to the researcher. Multicollinearity is a phenomena when two or more predictors are correlated, if this happens, the standard error of the coefficients will increase [8]. Increased standard errors means that the coefficients for some or all independent variables may be found to be significantly different from In other words, by overinflating the standard errors, multicollinearity makes some variables statistically insignificant when they should be significant. In this paper we focus on the multicollinearity, reasons and consequences on the reliability of the regression model.

  6. A comparison of three methods of assessing differential item functioning (DIF) in the Hospital Anxiety Depression Scale: ordinal logistic regression, Rasch analysis and the Mantel chi-square procedure.

    Science.gov (United States)

    Cameron, Isobel M; Scott, Neil W; Adler, Mats; Reid, Ian C

    2014-12-01

    It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF. Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and Mantel χ(2) procedure) were applied to Hospital Anxiety Depression Scale (HADS) data pertaining to a sample of 1,068 patients consulting primary care practitioners. Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When applying strict criteria for significant DIF, ordinal logistic regression was slightly less sensitive. Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.

  7. Minimax Regression Quantiles

    DEFF Research Database (Denmark)

    Bache, Stefan Holst

    A new and alternative quantile regression estimator is developed and it is shown that the estimator is root n-consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and has asymptotically equivalent properties to the usual quantile regression estimator. It is......, however, a different and therefore new estimator. It allows for both linear- and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice but whether it has theoretical justification is still an open question....

  8. riskRegression

    DEFF Research Database (Denmark)

    Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas

    2017-01-01

    In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface...... for predicting the covariate specific absolute risks, their confidence intervals, and their confidence bands based on right censored time to event data. We provide explicit formulas for our implementation of the estimator of the (stratified) baseline hazard function in the presence of tied event times. As a by...... functionals. The software presented here is implemented in the riskRegression package....

  9. Iterative random vs. Kennard-Stone sampling for IR spectrum-based classification task using PLS2-DA

    Science.gov (United States)

    Lee, Loong Chuen; Liong, Choong-Yeun; Jemain, Abdul Aziz

    2018-04-01

    External testing (ET) is preferred over auto-prediction (AP) or k-fold-cross-validation in estimating more realistic predictive ability of a statistical model. With IR spectra, Kennard-stone (KS) sampling algorithm is often used to split the data into training and test sets, i.e. respectively for model construction and for model testing. On the other hand, iterative random sampling (IRS) has not been the favored choice though it is theoretically more likely to produce reliable estimation. The aim of this preliminary work is to compare performances of KS and IRS in sampling a representative training set from an attenuated total reflectance - Fourier transform infrared spectral dataset (of four varieties of blue gel pen inks) for PLS2-DA modeling. The `best' performance achievable from the dataset is estimated with AP on the full dataset (APF, error). Both IRS (n = 200) and KS were used to split the dataset in the ratio of 7:3. The classic decision rule (i.e. maximum value-based) is employed for new sample prediction via partial least squares - discriminant analysis (PLS2-DA). Error rate of each model was estimated repeatedly via: (a) AP on full data (APF, error); (b) AP on training set (APS, error); and (c) ET on the respective test set (ETS, error). A good PLS2-DA model is expected to produce APS, error and EVS, error that is similar to the APF, error. Bearing that in mind, the similarities between (a) APS, error vs. APF, error; (b) ETS, error vs. APF, error and; (c) APS, error vs. ETS, error were evaluated using correlation tests (i.e. Pearson and Spearman's rank test), using series of PLS2-DA models computed from KS-set and IRS-set, respectively. Overall, models constructed from IRS-set exhibits more similarities between the internal and external error rates than the respective KS-set, i.e. less risk of overfitting. In conclusion, IRS is more reliable than KS in sampling representative training set.

  10. Regression with Sparse Approximations of Data

    DEFF Research Database (Denmark)

    Noorzad, Pardis; Sturm, Bob L.

    2012-01-01

    We propose sparse approximation weighted regression (SPARROW), a method for local estimation of the regression function that uses sparse approximation with a dictionary of measurements. SPARROW estimates the regression function at a point with a linear combination of a few regressands selected...... by a sparse approximation of the point in terms of the regressors. We show SPARROW can be considered a variant of \\(k\\)-nearest neighbors regression (\\(k\\)-NNR), and more generally, local polynomial kernel regression. Unlike \\(k\\)-NNR, however, SPARROW can adapt the number of regressors to use based...

  11. Type I error rates of rare single nucleotide variants are inflated in tests of association with non-normally distributed traits using simple linear regression methods.

    Science.gov (United States)

    Schwantes-An, Tae-Hwi; Sung, Heejong; Sabourin, Jeremy A; Justice, Cristina M; Sorant, Alexa J M; Wilson, Alexander F

    2016-01-01

    In this study, the effects of (a) the minor allele frequency of the single nucleotide variant (SNV), (b) the degree of departure from normality of the trait, and (c) the position of the SNVs on type I error rates were investigated in the Genetic Analysis Workshop (GAW) 19 whole exome sequence data. To test the distribution of the type I error rate, 5 simulated traits were considered: standard normal and gamma distributed traits; 2 transformed versions of the gamma trait (log 10 and rank-based inverse normal transformations); and trait Q1 provided by GAW 19. Each trait was tested with 313,340 SNVs. Tests of association were performed with simple linear regression and average type I error rates were determined for minor allele frequency classes. Rare SNVs (minor allele frequency < 0.05) showed inflated type I error rates for non-normally distributed traits that increased as the minor allele frequency decreased. The inflation of average type I error rates increased as the significance threshold decreased. Normally distributed traits did not show inflated type I error rates with respect to the minor allele frequency for rare SNVs. There was no consistent effect of transformation on the uniformity of the distribution of the location of SNVs with a type I error.

  12. Application of the Support Vector Regression Method for Turbidity Assessment with MODIS on a Shallow Coral Reef Lagoon (Voh-Koné-Pouembout, New Caledonia

    Directory of Open Access Journals (Sweden)

    Guillaume Wattelez

    2017-09-01

    Full Text Available Particle transport by erosion from ultramafic lands in pristine tropical lagoons is a crucial problem, especially for the benthic and pelagic biodiversity associated with coral reefs. Satellite imagery is useful for assessing particle transport from land to sea. However, in the oligotrophic and shallow waters of tropical lagoons, the bottom reflection of downwelling light usually hampers the use of classical optical algorithms. In order to address this issue, a Support Vector Regression (SVR model was developed and tested. The proposed application concerns the lagoon of New Caledonia—the second longest continuous coral reef in the world—which is frequently exposed to river plumes from ultramafic watersheds. The SVR model is based on a large training sample of in-situ turbidity values representative of the annual variability in the Voh-Koné-Pouembout lagoon (Western Coast of New Caledonia during the 2014–2015 period and on coincident satellite reflectance values from MODerate Resolution Imaging Spectroradiometer (MODIS. It was trained with reflectance and two other explanatory parameters—bathymetry and bottom colour. This approach significantly improved the model’s capacity for retrieving the in-situ turbidity range from MODIS images, as compared with algorithms dedicated to deep oligotrophic or turbid waters, which were shown to be inadequate. This SVR model is applicable to the whole shallow lagoon waters from the Western Coast of New Caledonia and it is now ready to be tested over other oligotrophic shallow lagoon waters worldwide.

  13. Segmentation and profiling consumers in a multi-channel environment using a combination of self-organizing maps (SOM method, and logistic regression

    Directory of Open Access Journals (Sweden)

    Seyed Ali Akbar Afjeh

    2014-05-01

    Full Text Available Market segmentation plays essential role on understanding the behavior of people’s interests in purchasing various products and services through various channels. This paper presents an empirical investigation to shed light on consumer’s purchasing attitude as well as gathering information in multi-channel environment. The proposed study of this paper designed a questionnaire and distributed it among 800 people who were at least 18 years of age and had some experiences on purchasing goods and services on internet, catalog or regular shopping centers. Self-organizing map, SOM, clustering technique was performed based on consumer’s interest in gathering information as well as purchasing products through internet, catalog and shopping centers and determined four segments. There were two types of questions for the proposed study of this paper. The first group considered participants’ personal characteristics such as age, gender, income, etc. The second group of questions was associated with participants’ psychographic characteristics including price consciousness, quality consciousness, time pressure, etc. Using multinominal logistic regression technique, the study determines consumers’ behaviors in each four segments.

  14. [Screen potential CYP450 2E1 inhibitors from Chinese herbal medicine based on support vector regression and molecular docking method].

    Science.gov (United States)

    Chen, Xi; Lu, Fang; Jiang, Lu-di; Cai, Yi-Lian; Li, Gong-Yu; Zhang, Yan-Ling

    2016-07-01

    Inhibition of cytochrome P450 (CYP450) enzymes is the most common reasons for drug interactions, so the study on early prediction of CYPs inhibitors can help to decrease the incidence of adverse reactions caused by drug interactions.CYP450 2E1(CYP2E1), as a key role in drug metabolism process, has broad spectrum of drug metabolism substrate. In this study, 32 CYP2E1 inhibitors were collected for the construction of support vector regression (SVR) model. The test set data were used to verify CYP2E1 quantitative models and obtain the optimal prediction model of CYP2E1 inhibitor. Meanwhile, one molecular docking program, CDOCKER, was utilized to analyze the interaction pattern between positive compounds and active pocket to establish the optimal screening model of CYP2E1 inhibitors.SVR model and molecular docking prediction model were combined to screen traditional Chinese medicine database (TCMD), which could improve the calculation efficiency and prediction accuracy. 6 376 traditional Chinese medicine (TCM) compounds predicted by SVR model were obtained, and in further verification by using molecular docking model, 247 TCM compounds with potential inhibitory activities against CYP2E1 were finally retained. Some of them have been verified by experiments. The results demonstrated that this study could provide guidance for the virtual screening of CYP450 inhibitors and the prediction of CYPs-mediated DDIs, and also provide references for clinical rational drug use. Copyright© by the Chinese Pharmaceutical Association.

  15. Multiple linear regression analysis

    Science.gov (United States)

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.

  16. Bayesian logistic regression analysis

    NARCIS (Netherlands)

    Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.

    2012-01-01

    In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuissance parameters, the Jacobian transformation is an

  17. Nonlinear Regression with R

    CERN Document Server

    Ritz, Christian; Parmigiani, Giovanni

    2009-01-01

    R is a rapidly evolving lingua franca of graphical display and statistical analysis of experiments from the applied sciences. This book provides a coherent treatment of nonlinear regression with R by means of examples from a diversity of applied sciences such as biology, chemistry, engineering, medicine and toxicology.

  18. Bayesian ARTMAP for regression.

    Science.gov (United States)

    Sasu, L M; Andonie, R

    2013-10-01

    Bayesian ARTMAP (BA) is a recently introduced neural architecture which uses a combination of Fuzzy ARTMAP competitive learning and Bayesian learning. Training is generally performed online, in a single-epoch. During training, BA creates input data clusters as Gaussian categories, and also infers the conditional probabilities between input patterns and categories, and between categories and classes. During prediction, BA uses Bayesian posterior probability estimation. So far, BA was used only for classification. The goal of this paper is to analyze the efficiency of BA for regression problems. Our contributions are: (i) we generalize the BA algorithm using the clustering functionality of both ART modules, and name it BA for Regression (BAR); (ii) we prove that BAR is a universal approximator with the best approximation property. In other words, BAR approximates arbitrarily well any continuous function (universal approximation) and, for every given continuous function, there is one in the set of BAR approximators situated at minimum distance (best approximation); (iii) we experimentally compare the online trained BAR with several neural models, on the following standard regression benchmarks: CPU Computer Hardware, Boston Housing, Wisconsin Breast Cancer, and Communities and Crime. Our results show that BAR is an appropriate tool for regression tasks, both for theoretical and practical reasons. Copyright © 2013 Elsevier Ltd. All rights reserved.

  19. Bounded Gaussian process regression

    DEFF Research Database (Denmark)

    Jensen, Bjørn Sand; Nielsen, Jens Brehm; Larsen, Jan

    2013-01-01

    We extend the Gaussian process (GP) framework for bounded regression by introducing two bounded likelihood functions that model the noise on the dependent variable explicitly. This is fundamentally different from the implicit noise assumption in the previously suggested warped GP framework. We...... with the proposed explicit noise-model extension....

  20. and Multinomial Logistic Regression

    African Journals Online (AJOL)

    This work presented the results of an experimental comparison of two models: Multinomial Logistic Regression (MLR) and Artificial Neural Network (ANN) for classifying students based on their academic performance. The predictive accuracy for each model was measured by their average Classification Correct Rate (CCR).

  1. Mechanisms of neuroblastoma regression

    Science.gov (United States)

    Brodeur, Garrett M.; Bagatell, Rochelle

    2014-01-01

    Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179

  2. Propensity-score matching in economic analyses: comparison with regression models, instrumental variables, residual inclusion, differences-in-differences, and decomposition methods.

    Science.gov (United States)

    Crown, William H

    2014-02-01

    This paper examines the use of propensity score matching in economic analyses of observational data. Several excellent papers have previously reviewed practical aspects of propensity score estimation and other aspects of the propensity score literature. The purpose of this paper is to compare the conceptual foundation of propensity score models with alternative estimators of treatment effects. References are provided to empirical comparisons among methods that have appeared in the literature. These comparisons are available for a subset of the methods considered in this paper. However, in some cases, no pairwise comparisons of particular methods are yet available, and there are no examples of comparisons across all of the methods surveyed here. Irrespective of the availability of empirical comparisons, the goal of this paper is to provide some intuition about the relative merits of alternative estimators in health economic evaluations where nonlinearity, sample size, availability of pre/post data, heterogeneity, and missing variables can have important implications for choice of methodology. Also considered is the potential combination of propensity score matching with alternative methods such as differences-in-differences and decomposition methods that have not yet appeared in the empirical literature.

  3. Determination of Ethanol in Blood Samples Using Partial Least Square Regression Applied to Surface Enhanced Raman Spectroscopy.

    Science.gov (United States)

    Açikgöz, Güneş; Hamamci, Berna; Yildiz, Abdulkadir

    2018-04-01

    Alcohol consumption triggers toxic effect to organs and tissues in the human body. The risks are essentially thought to be related to ethanol content in alcoholic beverages. The identification of ethanol in blood samples requires rapid, minimal sample handling, and non-destructive analysis, such as Raman Spectroscopy. This study aims to apply Raman Spectroscopy for identification of ethanol in blood samples. Silver nanoparticles were synthesized to obtain Surface Enhanced Raman Spectroscopy (SERS) spectra of blood samples. The SERS spectra were used for Partial Least Square (PLS) for determining ethanol quantitatively. To apply PLS method, 920~820 cm -1 band interval was chosen and the spectral changes of the observed concentrations statistically associated with each other. The blood samples were examined according to this model and the quantity of ethanol was determined as that: first a calibration method was established. A strong relationship was observed between known concentration values and the values obtained by PLS method (R 2 = 1). Second instead of then, quantities of ethanol in 40 blood samples were predicted according to the calibration method. Quantitative analysis of the ethanol in the blood was done by analyzing the data obtained by Raman spectroscopy and the PLS method.

  4. Segregation of a QTL cluster for home-cage activity using a new mapping method based on regression analysis of congenic mouse strains

    Science.gov (United States)

    Kato, S; Ishii, A; Nishi, A; Kuriki, S; Koide, T

    2014-01-01

    Recent genetic studies have shown that genetic loci with significant effects in whole-genome quantitative trait loci (QTL) analyses were lost or weakened in congenic strains. Characterisation of the genetic basis of this attenuated QTL effect is important to our understanding of the genetic mechanisms of complex traits. We previously found that a consomic strain, B6-Chr6CMSM, which carries chromosome 6 of a wild-derived strain MSM/Ms on the genetic background of C57BL/6J, exhibited lower home-cage activity than C57BL/6J. In the present study, we conducted a composite interval QTL analysis using the F2 mice derived from a cross between C57BL/6J and B6-Chr6CMSM. We found one QTL peak that spans 17.6 Mbp of chromosome 6. A subconsomic strain that covers the entire QTL region also showed lower home-cage activity at the same level as the consomic strain. We developed 15 congenic strains, each of which carries a shorter MSM/Ms-derived chromosomal segment from the subconsomic strain. Given that the results of home-cage activity tests on the congenic strains cannot be explained by a simple single-gene model, we applied regression analysis to segregate the multiple genetic loci. The results revealed three loci (loci 1–3) that have the effect of reducing home-cage activity and one locus (locus 4) that increases activity. We also found that the combination of loci 3 and 4 cancels out the effects of the congenic strains, which indicates the existence of a genetic mechanism related to the loss of QTLs. PMID:24781804

  5. Optimizing methods for linking cinematic features to fMRI data.

    Science.gov (United States)

    Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia

    2015-04-15

    One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than story-driven films, new methods need to be developed for analysis of less story-driven contents. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with the model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. The elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors, the results were compared against both the partial least-squares (PLS) regression and the un-regularized full-model regression. Non-parametric permutation testing scheme was applied to evaluate the statistical significance of regression. We found statistically significant correlation between the annotation model and 9 ICs out of 40 ICs. Regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression more sensitive than PLS and un-regularized regression since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved

  6. Rapid classification of pharmaceutical ingredients with Raman spectroscopy using compressive detection strategy with PLS-DA multivariate filters.

    Science.gov (United States)

    Cebeci Maltaş, Derya; Kwok, Kaho; Wang, Ping; Taylor, Lynne S; Ben-Amotz, Dor

    2013-06-01

    Identifying pharmaceutical ingredients is a routine procedure required during industrial manufacturing. Here we show that a recently developed Raman compressive detection strategy can be employed to classify various widely used pharmaceutical materials using a hybrid supervised/unsupervised strategy in which only two ingredients are used for training and yet six other ingredients can also be distinguished. More specifically, our liquid crystal spatial light modulator (LC-SLM) based compressive detection instrument is trained using only the active ingredient, tadalafil, and the excipient, lactose, but is tested using these and various other excipients; microcrystalline cellulose, magnesium stearate, titanium (IV) oxide, talc, sodium lauryl sulfate and hydroxypropyl cellulose. Partial least squares discriminant analysis (PLS-DA) is used to generate the compressive detection filters necessary for fast chemical classification. Although the filters used in this study are trained on only lactose and tadalafil, we show that all the pharmaceutical ingredients mentioned above can be differentiated and classified using PLS-DA compressive detection filters with an accumulation time of 10ms per filter. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. Use of Multiple Linear Regression Method for Modelling Seasonal Changes in Stable Isotopes of 18O and 2H in 30 Pouns in Gilan Province

    Directory of Open Access Journals (Sweden)

    M.A. Mousavi Shalmani

    2014-08-01

    Full Text Available In order to assessment of water quality and characterize seasonal variation in 18O and 2H in relation with different chemical and physiographical parameters and modelling of effective parameters, an study was conducted during 2010 to 2011 in 30 different ponds in the north of Iran. Samples were collected at three different seasons and analysed for chemical and isotopic components. Data shows that highest amounts of δ18O and δ2H were recorded in the summer (-1.15‰ and -12.11‰ and the lowest amounts were seen in the winter (-7.50‰ and -47.32‰ respectively. Data also reveals that there is significant increase in d-excess during spring and summer in ponds 20, 21, 22, 24, 25 and 26. We can conclude that residual surface runoff (from upper lands is an important source of water to transfer soluble salts in to these ponds. In this respect, high retention time may be the main reason for movements of light isotopes in to the ponds. This has led d-excess of pond 12 even greater in summer than winter. This could be an acceptable reason for ponds 25 and 26 (Siyahkal county with highest amount of d-excess and lowest amounts of δ18O and δ2H. It seems light water pumped from groundwater wells with minor source of salt (originated from sea deep percolation in to the ponds, could may be another reason for significant decrease in the heavy isotopes of water (18O and 2H for ponds 2, 12, 14 and 25 from spring to summer. Overall conclusion of multiple linear regression test indicate that firstly from 30 variables (under investigation only a few cases can be used for identifying of changes in 18O and 2H by applications. Secondly, among the variables (studied, phytoplankton content was a common factor for interpretation of 18O and 2H during spring and summer, and also total period (during a year. Thirdly, the use of water in the spring was recommended for sampling, for 18O and 2H interpretation compared with other seasons. This is because of function can be

  8. Ridge Regression Signal Processing

    Science.gov (United States)

    Kuhl, Mark R.

    1990-01-01

    The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.

  9. A primer of statistical methods for correlating parameters and properties of electrospun poly( l -lactide) scaffolds for tissue engineering-PART 2: Regression

    KAUST Repository

    Seyedmahmoud, Rasoul; Mozetic, Pamela; Rainer, Alberto; Giannitelli, Sara Maria; Basoli, Francesco; Trombetta, Marcella; Traversa, Enrico; Licoccia, Silvia; Rinaldi, Antonio

    2014-01-01

    and interactions between a handful of key scaffold properties (Ys) and a set of process parameters (Xs) in electrospinning. While Part-1 dealt with the DOE methods to unveil the interactions between Xs in determining the morphomechanical properties (ref. Y1

  10. Better Autologistic Regression

    Directory of Open Access Journals (Sweden)

    Mark A. Wolters

    2017-11-01

    Full Text Available Autologistic regression is an important probability model for dichotomous random variables observed along with covariate information. It has been used in various fields for analyzing binary data possessing spatial or network structure. The model can be viewed as an extension of the autologistic model (also known as the Ising model, quadratic exponential binary distribution, or Boltzmann machine to include covariates. It can also be viewed as an extension of logistic regression to handle responses that are not independent. Not all authors use exactly the same form of the autologistic regression model. Variations of the model differ in two respects. First, the variable coding—the two numbers used to represent the two possible states of the variables—might differ. Common coding choices are (zero, one and (minus one, plus one. Second, the model might appear in either of two algebraic forms: a standard form, or a recently proposed centered form. Little attention has been paid to the effect of these differences, and the literature shows ambiguity about their importance. It is shown here that changes to either coding or centering in fact produce distinct, non-nested probability models. Theoretical results, numerical studies, and analysis of an ecological data set all show that the differences among the models can be large and practically significant. Understanding the nature of the differences and making appropriate modeling choices can lead to significantly improved autologistic regression analyses. The results strongly suggest that the standard model with plus/minus coding, which we call the symmetric autologistic model, is the most natural choice among the autologistic variants.

  11. Regression in organizational leadership.

    Science.gov (United States)

    Kernberg, O F

    1979-02-01

    The choice of good leaders is a major task for all organizations. Inforamtion regarding the prospective administrator's personality should complement questions regarding his previous experience, his general conceptual skills, his technical knowledge, and the specific skills in the area for which he is being selected. The growing psychoanalytic knowledge about the crucial importance of internal, in contrast to external, object relations, and about the mutual relationships of regression in individuals and in groups, constitutes an important practical tool for the selection of leaders.

  12. Check-all-that-apply data analysed by Partial Least Squares regression

    DEFF Research Database (Denmark)

    Rinnan, Åsmund; Giacalone, Davide; Frøst, Michael Bom

    2015-01-01

    are analysed by multivariate techniques. CATA data can be analysed both by setting the CATA as the X and the Y. The former is the PLS-Discriminant Analysis (PLS-DA) version, while the latter is the ANOVA-PLS (A-PLS) version. We investigated the difference between these two approaches, concluding...

  13. Simultaneous Determination of 6-Mercaptopurine and its Oxidative Metabolites in Synthetic Solutions and Human Plasma using Spectrophotometric Multivariate Calibration Methods

    Directory of Open Access Journals (Sweden)

    Mohammad-Reza Rashidi

    2011-06-01

    Full Text Available Introduction: 6-Mercaptopurine (6MP is an important chemotherapeutic drug in the conventional treatment of childhood acute lymphoblastic leukemia (ALL. It is catabolized to 6-thiouric acid (6TUA through 8-hydroxo-6-mercaptopurine (8OH6MP or 6-thioxanthine (6TX intermediates. Methods: High-performance liquid chromatography (HPLC is usually used to determine the contents of therapeutic drugs, metabolites and other important biomedical analytes in biological samples. In the present study, the multivariate calibration methods, partial least squares (PLS-1 and principle component regression (PCR have been developed and validated for the simultaneous determination of 6MP and its oxidative metabolites (6TUA, 8OH6MP and 6TX without analyte separation in spiked human plasma. Mixtures of 6MP, 8-8OH6MP, 6TX and 6TUA have been resolved by PLS-1 and PCR to their UV spectra. Results: Recoveries (% obtained for 6MP, 8-8OH6MP, 6TX and 6TUA were 94.5-97.5, 96.6-103.3, 95.1-96.9 and 93.4-95.8, respectively, using PLS-1 and 96.7-101.3, 96.2-98.8, 95.8-103.3 and 94.3-106.1, respectively, using PCR. The NAS (Net analyte signal concept was used to calculate multivariate analytical figures of merit such as limit of detection (LOD, selectivity and sensitivity. The limit of detections for 6MP, 8-8OH6MP, 6TX and 6TUA were calculated to be 0.734, 0.439, 0.797 and 0.482 µmol L-1, respectively, using PLS and 0.724, 0.418, 0783 and 0.535 µmol L-1, respectively, using PCR. HPLC was also applied as a validation method for simultaneous determination of these thiopurines in the synthetic solutions and human plasma. Conclusion: Combination of spectroscopic techniques and chemometric methods (PLS and PCR has provided a simple but powerful method for simultaneous analysis of multicomponent mixtures.

  14. Applied regression analysis a research tool

    CERN Document Server

    Pantula, Sastry; Dickey, David

    1998-01-01

    Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...

  15. Discriminative Analysis of Different Grades of Gaharu (Aquilaria malaccensis Lamk. via 1H-NMR-Based Metabolomics Using PLS-DA and Random Forests Classification Models

    Directory of Open Access Journals (Sweden)

    Siti Nazirah Ismail

    2017-09-01

    Full Text Available Gaharu (agarwood, Aquilaria malaccensis Lamk. is a valuable tropical rainforest product traded internationally for its distinctive fragrance. It is not only popular as incense and in perfumery, but also favored in traditional medicine due to its sedative, carminative, cardioprotective and analgesic effects. The current study addresses the chemical differences and similarities between gaharu samples of different grades, obtained commercially, using 1H-NMR-based metabolomics. Two classification models: partial least squares-discriminant analysis (PLS-DA and Random Forests were developed to classify the gaharu samples on the basis of their chemical constituents. The gaharu samples could be reclassified into a ‘high grade’ group (samples A, B and D, characterized by high contents of kusunol, jinkohol, and 10-epi-γ-eudesmol; an ‘intermediate grade’ group (samples C, F and G, dominated by fatty acid and vanillic acid; and a ‘low grade’ group (sample E and H, which had higher contents of aquilarone derivatives and phenylethyl chromones. The results showed that 1H- NMR-based metabolomics can be a potential method to grade the quality of gaharu samples on the basis of their chemical constituents.

  16. Regression models of reactor diagnostic signals

    International Nuclear Information System (INIS)

    Vavrin, J.

    1989-01-01

    The application is described of an autoregression model as the simplest regression model of diagnostic signals in experimental analysis of diagnostic systems, in in-service monitoring of normal and anomalous conditions and their diagnostics. The method of diagnostics is described using a regression type diagnostic data base and regression spectral diagnostics. The diagnostics is described of neutron noise signals from anomalous modes in the experimental fuel assembly of a reactor. (author)

  17. riskRegression

    DEFF Research Database (Denmark)

    Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas

    2017-01-01

    In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface......-product we obtain fast access to the baseline hazards (compared to survival::basehaz()) and predictions of survival probabilities, their confidence intervals and confidence bands. Confidence intervals and confidence bands are based on point-wise asymptotic expansions of the corresponding statistical...

  18. Adaptive metric kernel regression

    DEFF Research Database (Denmark)

    Goutte, Cyril; Larsen, Jan

    2000-01-01

    Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate...... regression by minimising a cross-validation estimate of the generalisation error. This allows to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms...

  19. Adaptive Metric Kernel Regression

    DEFF Research Database (Denmark)

    Goutte, Cyril; Larsen, Jan

    1998-01-01

    Kernel smoothing is a widely used nonparametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this paper, we propose an algorithm that adapts the input metric used in multivariate regression...... by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard...

  20. Regression of environmental noise in LIGO data

    International Nuclear Information System (INIS)

    Tiwari, V; Klimenko, S; Mitselmakher, G; Necula, V; Drago, M; Prodi, G; Frolov, V; Yakushin, I; Re, V; Salemi, F; Vedovato, G

    2015-01-01

    We address the problem of noise regression in the output of gravitational-wave (GW) interferometers, using data from the physical environmental monitors (PEM). The objective of the regression analysis is to predict environmental noise in the GW channel from the PEM measurements. One of the most promising regression methods is based on the construction of Wiener–Kolmogorov (WK) filters. Using this method, the seismic noise cancellation from the LIGO GW channel has already been performed. In the presented approach the WK method has been extended, incorporating banks of Wiener filters in the time–frequency domain, multi-channel analysis and regulation schemes, which greatly enhance the versatility of the regression analysis. Also we present the first results on regression of the bi-coherent noise in the LIGO data. (paper)