WorldWideScience

Sample records for chemometric variable selection

  1. Effect of genetic algorithm as a variable selection method on different chemometric models applied for the analysis of binary mixture of amoxicillin and flucloxacillin: A comparative study

    Science.gov (United States)

    Attia, Khalid A. M.; Nassar, Mohammed W. I.; El-Zeiny, Mohamed B.; Serag, Ahmed

    2016-03-01

    Different chemometric models were applied for the quantitative analysis of amoxicillin (AMX) and flucloxacillin (FLX) in their binary mixtures, namely partial least squares (PLS), spectral residual augmented classical least squares (SRACLS), concentration residual augmented classical least squares (CRACLS) and artificial neural networks (ANNs). All methods were applied with and without a variable selection procedure (genetic algorithm, GA). The methods were used for the quantitative analysis of the drugs in laboratory-prepared mixtures and in a real market sample by processing their UV spectral data. Applying GA yielded simpler and more robust models. The proposed methods were found to be rapid and simple and to require no preliminary separation steps.
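    The abstract does not report the GA settings. The code below is a rough, generic sketch of GA-based variable selection. It evolves binary masks over the spectral variables (1 = wavelength kept) using truncation selection, one-point crossover and bit-flip mutation. The population size, mutation rate and toy fitness function are hypothetical. In the study, fitness would instead come from the prediction error of the PLS, SRACLS, CRACLS or ANN model built on the selected wavelengths.

```python
import random

def ga_select(n_vars, fitness, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Evolve binary masks (1 = variable kept) that maximise `fitness`.

    Returns the best mask found and the best fitness seen in each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 2]              # truncation selection (elitist)
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_vars)        # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, history

INFORMATIVE = {2, 5, 7}   # hypothetical "true" wavelengths for this toy demo

def toy_fitness(mask):
    """Hypothetical fitness: reward informative variables, penalise the rest."""
    kept = {i for i, g in enumerate(mask) if g}
    return len(kept & INFORMATIVE) - 0.2 * len(kept - INFORMATIVE)

best_mask, history = ga_select(10, toy_fitness)
```

    The better half of each generation is carried over unchanged. As a result, the best fitness recorded in `history` can never decrease from one generation to the next.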

  2. Non-targeted detection of chemical contamination in carbonated soft drinks using NMR spectroscopy, variable selection and chemometrics

    Energy Technology Data Exchange (ETDEWEB)

    Charlton, Adrian J. [Department for Environment, Food and Rural Affairs, Central Science Laboratory, Sand Hutton, York YO41 1LZ (United Kingdom)], E-mail: adrian.charlton@csl.gov.uk; Robb, Paul; Donarski, James A.; Godward, John [Department for Environment, Food and Rural Affairs, Central Science Laboratory, Sand Hutton, York YO41 1LZ (United Kingdom)]

    2008-06-23

    An efficient method for detecting malicious and accidental contamination of foods has been developed using a combined ¹H nuclear magnetic resonance (NMR) and chemometrics approach. The method was demonstrated on a commercially available carbonated soft drink and shown to be capable of identifying atypical products and contaminant resonances. Soft independent modelling of class analogy (SIMCA) was used to compare ¹H NMR profiles of genuine products (obtained from the manufacturer) against retail products spiked in the laboratory with impurities. The benefits of using feature selection for extracting contaminant NMR frequencies were also assessed. Using example impurities (paraquat, p-cresol and glyphosate), NMR spectra were analysed with multivariate methods. The resulting detection limits were approximately 0.075, 0.2 and 0.06 mM for p-cresol, paraquat and glyphosate, respectively. These detection limits are shown to be approximately 100-fold lower than the minimum lethal dose of paraquat. The methodology presented here is used to assess the composition of complex matrices for the presence of contaminating molecules without a priori knowledge of the nature of potential contaminants. The ability to detect whether a sample deviates from the expected profile, without recourse to multiple targeted analyses, is a valuable tool for incident detection and forensic applications.

  3. Variable and subset selection in PLS regression

    DEFF Research Database (Denmark)

    Høskuldsson, Agnar

    2001-01-01

    The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present here methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion … is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than…
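    The abstract does not say which selection criteria are proposed. The sketch below therefore uses a common, swapped-in example of PLS-based variable screening, which is not necessarily one of Høskuldsson's methods. It computes variable importance in projection (VIP) scores for a single-component PLS1 model and keeps variables with VIP > 1, a widely used rule of thumb. The toy data are hypothetical. A real analysis would use several components and cross-validation.

```python
def vip_one_component(X, y):
    """VIP scores for a one-component PLS1 model; X is a list of rows."""
    n, p = len(y), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]   # mean-centre X
    yc = [v - ybar for v in y]                                  # mean-centre y
    # First PLS weight vector: w is proportional to Xc^T yc.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # With a single component the explained-variance terms cancel,
    # so VIP_j = sqrt(p) * |w_j| and the squared VIPs sum to p.
    return [p ** 0.5 * abs(v) for v in w]

# Hypothetical toy data: y = x0 + x1; after centring, x2 is orthogonal to both.
X = [[1, 1, 1], [2, -1, -3], [3, -1, 3], [4, 1, -1]]
y = [row[0] + row[1] for row in X]
vip = vip_one_component(X, y)
selected = [j for j, v in enumerate(vip) if v > 1.0]   # VIP > 1 rule of thumb
```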

  4. Selection of quantum chemical descriptors by chemometric methods in the study of antioxidant activity of flavonoid compounds

    Science.gov (United States)

    Weber, K. C.; Honório, K. M.; da Silva, S. L.; Mercadante, R.; da Silva, A. B. F.

    In the present study, the aim was to select the electronic properties responsible for the free radical scavenging ability of a set of 25 flavonoid compounds using chemometric methods. Electronic parameters were calculated with the AM1 semiempirical method. Chemometric methods (principal component analysis, hierarchical cluster analysis and k-nearest neighbor) were then used to build models relating these electronic features to the antioxidant activity of the compounds studied. According to these models, four electronic variables can be considered important for discriminating between more and less active antioxidant flavonoid compounds: polarizability (α), charge at carbon 3 (QC3), total charge at substituent 5 (QS5) and total charge at substituent 3' (QS3'). The features found to be responsible for the antioxidant activity of the flavonoid compounds studied are consistent with previous results in the literature. The results may also help in the search for better antioxidant flavonoid compounds.

  5. Fast Selection of Spectral Variables with B-Spline Compression

    CERN Document Server

    Rossi, Fabrice; Wertz, Vincent; Meurens, Marc; Verleysen, Michel

    2007-01-01

    The large number of spectral variables in most data sets encountered in spectral chemometrics often makes the prediction of a dependent variable difficult. The number of variables can be reduced by using either projection techniques or selection methods; the latter allow the selected variables to be interpreted. Testing all possible subsets of variables with the prediction model would be optimal but is intractable. An incremental selection approach using a nonparametric statistic is therefore a good option, as it avoids the computationally intensive use of the model itself. It has two drawbacks, however: the number of groups of variables to test is still huge, and collinearities can make the results unstable. To overcome these limitations, this paper presents a method to select groups of spectral variables. It consists of a forward-backward procedure applied to the coefficients of a B-spline representation of the spectra. The criterion used in the forward-backward procedure is the mutual infor...

  6. Variable selection through CART

    CERN Document Server

    Sauvé, Marie

    2011-01-01

    This paper deals with variable selection in the regression and binary classification frameworks. It proposes an automatic and exhaustive procedure that relies on the CART algorithm and on model selection via penalization. This work, of a theoretical nature, aims at determining adequate penalties, i.e. penalties that yield oracle-type inequalities justifying the performance of the proposed procedure. Since the exhaustive procedure cannot be executed when the number of variables is too large, a more practical procedure is also proposed and still theoretically validated. A simulation study complements the theoretical results.

  7. Chemometric Amylose Modeling and Sample Selection for Global Calibration Using Artificial Neural Networks

    Institute of Scientific and Technical Information of China (English)

    SHIMIZU N; OKADOME H; WADA D; KIMURA T; OHTSUBO K

    2008-01-01

    Chemometric amylose modeling for global calibration, using whole-grain near-infrared transmittance spectra and sample selection, was performed with an artificial neural network (ANN). The aim was to assess the global and local models generated, based on samples of newly bred Indica, Japonica and rice. Global sample sets had a wide range of variation in amylose content (0 to 25.9%). The local sample set, consisting of Japonica samples, had relatively low amylose content and a narrow range of variation (amylose, 12.3% to 21.0%). For sample selection, the CENTER algorithm was applied to generate calibration, validation and stop sample sets. Spectral preprocessing was found to reduce the optimum number of partial least squares (PLS) components for amylose content and thus to enhance the robustness of the local calibration. The best model was an ANN global calibration with spectral preprocessing; the next best was a PLS global calibration using standard spectra. These results raise the question of whether an ANN algorithm with spectral preprocessing could be developed for global and local calibration models, or whether PLS without spectral preprocessing should be developed for global calibration models. We suggest that global calibration models incorporating an ANN may be used as a universal calibration model.

  8. Full spectrum and selected spectrum based chemometric methods for the simultaneous determination of Cinnarizine and Dimenhydrinate in laboratory prepared mixtures and pharmaceutical dosage form

    Science.gov (United States)

    Tawakkol, Shereen M.; El-Zeiny, Mohamed B.; Hemdan, A.

    2017-02-01

    Three chemometric methods, namely concentration residual augmented classical least squares (CRACLS), spectral residual augmented classical least squares (SRACLS) and partial least squares (PLS), were applied for the simultaneous quantitative determination of Cinnarizine and Dimenhydrinate in their binary mixtures. All techniques were applied with and without variable selection using a genetic algorithm (GA), resulting in six models (CRACLS, GA-CRACLS, SRACLS, GA-SRACLS, PLS, GA-PLS). These models were applied for the simultaneous determination of the drugs in laboratory-prepared mixtures and in a pharmaceutical dosage form by processing their UV spectral data. The GA-based models were found to be simpler and more robust than those built with the full spectral data. The proposed models were also found to be simple and fast and to require no preliminary separation steps. They can therefore be used for routine analysis of this binary mixture in quality control laboratories.

  9. Benchmarking Variable Selection in QSAR.

    Science.gov (United States)

    Eklund, Martin; Norinder, Ulf; Boyer, Scott; Carlsson, Lars

    2012-02-01

    Variable selection is important in QSAR modeling since it can improve model performance and transparency, as well as reduce the computational cost of model fitting and prediction. Which variable selection methods perform well in QSAR settings is largely unknown. To address this question, we ran a total of 1728 benchmarking experiments. In them, we rigorously investigated how eight variable selection methods affect the predictive performance and transparency of random forest models fitted to seven QSAR datasets. The datasets cover different endpoints, descriptor sets, types of response variables and numbers of chemical compounds. The results show that univariate variable selection methods are suboptimal. With multivariate adaptive regression splines (MARS) and forward selection, the number of variables in the benchmarked datasets can be reduced by about 60% without significant loss in model performance.

  10. Adaptive Robust Variable Selection

    CERN Document Server

    Fan, Jianqing; Barut, Emre

    2012-01-01

    Heavy-tailed high-dimensional data are commonly encountered in various scientific fields and pose great challenges to modern statistical analysis. A natural procedure to address this problem is to use the penalized least absolute deviation (LAD) method with a weighted $L_1$-penalty, called the weighted robust Lasso (WR-Lasso). In this method, weights are introduced to ameliorate the bias problem induced by the $L_1$-penalty. In the ultra-high-dimensional setting, where the dimensionality can grow exponentially with the sample size, we investigate the model selection oracle property and establish the asymptotic normality of the WR-Lasso. We show that only mild conditions on the model error distribution are needed. Our theoretical results also reveal that an adaptive choice of the weight vector is essential for the WR-Lasso to enjoy these asymptotic properties. To make the WR-Lasso practically feasible, we propose a two-step procedure, called the adaptive robust Lasso (AR-Lasso), in which the weight vector in the second step is c...

  11. Variable Selection in Discriminant Analysis.

    Science.gov (United States)

    Huberty, Carl J.; Mourad, Salah A.

    Methods for ordering and selecting variables for discriminant analysis in multiple-group comparison or group prediction studies include univariate F statistics, stepwise analysis, linear discriminant function (LDF) variable correlations, communalities, LDF standardized coefficients and weighted standardized coefficients. Five indices based on distance,…

  13. Variable selection in QSAR (Seleção de variáveis em QSAR)

    Directory of Open Access Journals (Sweden)

    Márcia Miguel Castro Ferreira

    2002-05-01

    The process of building mathematical models in quantitative structure-activity relationship (QSAR) studies is generally limited by the size of the dataset from which variables are selected. For huge datasets, the task of selecting a given number of variables that produces the best linear model can be enormous, if not infeasible. In this case, some methods can be used to separate good parameter combinations from bad ones. In this paper three methodologies are analyzed: systematic search, genetic algorithm and chemometric methods. These methods are presented and discussed through practical examples.

  14. Boosting model performance and interpretation by entangling preprocessing selection and variable selection.

    Science.gov (United States)

    Gerretzen, Jan; Szymańska, Ewa; Bart, Jacob; Davies, Antony N; van Manen, Henk-Jan; van den Heuvel, Edwin R; Jansen, Jeroen J; Buydens, Lutgarde M C

    2016-09-28

    The aim of data preprocessing is to remove data artifacts, such as a baseline, scatter effects or noise, and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but it is difficult to decide which method or combination of methods should be used for the specific data being analyzed. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames. In that approach, the focus was solely on improving the predictive performance of the chemometric model. This is, however, only one of the two relevant criteria in modeling: interpretation of the model results can be just as important. Variable selection is often used to achieve such interpretation. Data artifacts, however, may hamper proper variable selection by masking the truly relevant variables. The choice of preprocessing therefore has a huge impact on the outcome of variable selection methods and may thus hamper an objective interpretation of the final model. To enhance such objective interpretation, we here integrate variable selection into the DoE-based preprocessing selection approach. We show that entangling preprocessing selection and variable selection improves not only the interpretation but also the predictive performance of the model. This is shown by analyzing several experimental data sets for which the truly relevant variables are available as prior knowledge. The resulting selection of variables agrees more closely with the truly informative variables than when both model aspects are optimized individually. Importantly, the approach presented in this work is generic. Different types of models (e.g. PCR, PLS, …) can be incorporated into it, as well as different variable selection methods and different preprocessing methods, according to the taste and experience of

  15. Stepwise Variable Selection in Factor Analysis.

    Science.gov (United States)

    Kano, Yutaka; Harada, Akira

    2000-01-01

    Takes several goodness-of-fit statistics as measures of variable selection and develops backward elimination and forward selection procedures in exploratory factor analysis. A newly developed variable selection program, SEFA, can print several fit measures for a current model and models obtained by removing an internal variable or adding an…

  16. Implementing Variable Selection Techniques in Regression.

    Science.gov (United States)

    Thayer, Jerome D.

    Variable selection techniques in stepwise regression analysis are discussed. In stepwise regression, variables are added to or deleted from a model in sequence to produce a final "good" or "best" predictive model. Stepwise computer programs are discussed and four different variable selection strategies are described. These…
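    The abstract does not list Thayer's four strategies. The sketch below illustrates one common strategy, forward selection. At each step it adds the variable that most reduces the residual sum of squares (RSS). It stops when the reduction falls below a hypothetical tolerance `tol` times the total sum of squares. The normal equations are solved directly, which is only adequate for small, well-conditioned toy problems.

```python
def ols_rss(X, y, cols):
    """RSS of an ordinary least-squares fit of y on an intercept plus columns `cols`."""
    n = len(y)
    A = [[1.0] + [row[j] for j in cols] for row in X]      # design matrix
    k = len(A[0])
    # Augmented normal equations [A^T A | A^T y], solved by Gauss-Jordan elimination.
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)]
         + [sum(A[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[piv] = M[piv], M[c]
        lead = M[c][c]
        M[c] = [v / lead for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    b = [M[i][k] for i in range(k)]
    return sum((y[r] - sum(b[i] * A[r][i] for i in range(k))) ** 2 for r in range(n))

def forward_select(X, y, tol=1e-6):
    """Greedy forward selection: repeatedly add the variable that most reduces the RSS."""
    remaining = list(range(len(X[0])))
    chosen = []
    current = tss = ols_rss(X, y, [])          # intercept-only model
    while remaining:
        best_rss, best_j = min((ols_rss(X, y, chosen + [j]), j) for j in remaining)
        if current - best_rss < tol * tss:     # gain too small relative to total SS
            break
        chosen.append(best_j)
        remaining.remove(best_j)
        current = best_rss
    return chosen

# Hypothetical toy data: y = 1 + 2*x0 - 3*x2 exactly; x1 and x3 are irrelevant.
X = [[i, (i * i) % 7, (3 * i) % 5, (5 * i) % 11] for i in range(10)]
y = [1 + 2 * row[0] - 3 * row[2] for row in X]
chosen = forward_select(X, y)
```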

  17. "Turn-off" fluorescent data array sensor based on double quantum dots coupled with chemometrics for highly sensitive and selective detection of multicomponent pesticides.

    Science.gov (United States)

    Fan, Yao; Liu, Li; Sun, Donglei; Lan, Hanyue; Fu, Haiyan; Yang, Tianming; She, Yuanbin; Ni, Chuang

    2016-04-15

    As a popular detection model, the fluorescence "turn-off" sensor based on quantum dots (QDs) has already been successfully employed to detect many materials, especially in studies of the interactions between pesticides. However, previous studies have mainly focused on a simple single track or on comparisons based on similar drug concentrations. In this work, a new detection method for various pesticides is established, based on the fluorescence "turn-off" model with water-soluble ZnCdSe and CdSe QDs simultaneously serving as fluorescent probes. The fluorescence of the two QDs is quenched to varying degrees by different pesticides, which leads to differences in the positions and intensities of the two peaks. Combined with chemometric methods, the system allows all the pesticides to be identified and quantified, even in real samples, with a limit of detection of 2 × 10⁻⁸ mol L⁻¹ and a recognition rate of 100%. To the best of our knowledge, this is the first report on the detection of pesticides based on the fluorescence quenching of double quantum dots combined with chemometric methods. Moreover, the excellent selectivity of the system has been verified in different media, such as mixed-ion interference, wastewater, tea and aqueous drug extracts.

  18. Chemometrics-assisted excitation-emission fluorescence analytical data for rapid and selective determination of optical brighteners in the presence of uncalibrated interferences

    Science.gov (United States)

    Gholami, Ali; Masoum, Saeed; Mohsenikia, Atefeh; Abbasi, Saleheh

    2016-01-01

    This study describes a novel approach for the simultaneous determination of CBS-X and CXT, widely used optical brighteners in household detergents. It combines the high sensitivity of molecular fluorescence with the selectivity of second-order chemometric methods. The second-order analyses employ PARAFAC, SWATLD and APTLD, which allow CBS-X and CXT to be determined in laundry powders and environmental samples through the unique decomposition of the three-way data array. The proposed method can extract the relative concentrations of the analytes as well as their spectral profiles. This approach achieves the second-order advantage and in principle can overcome uncalibrated spectral interferences in the determination of CBS-X and CXT at the ng g⁻¹ level. The accuracy of the proposed methods was validated by spiking known concentrations of these compounds into real samples and calculating the recoveries of the spiked values. High recoveries (90.00%-113.33%) for the spiked laundry powders and real environmental samples indicate that the method successfully meets this complex challenge. No separation or preconcentration steps are needed in the analysis of environmental contamination.

  19. Variable Selection Strategies in Discriminate Analysis.

    Science.gov (United States)

    Tanguma, Jesus

    This paper presents three variable selection strategies in discriminant analysis (all variables in the model, stepwise methods, and all possible subsets). All three methods are illustrated with examples. Although the all-variables and stepwise methods are the most widely used, B. Thompson (1996) and C. Huberty (1994)…

  20. A novel storage method for near infrared spectroscopy chemometric models.

    Science.gov (United States)

    Zhang, Zhi-Min; Chen, Shan; Liang, Yi-Zeng

    2010-06-04

    Chemometric Modeling Markup Language (CMML) was developed by us to store chemometric models within a single document and so solve the interoperability issue in sharing them. Binary data are converted into strings with base64 encoding/decoding algorithms. CMML provides basic functionality for storing sampling, variable selection, pretreatment, outlier and modeling parameters and data. By transforming binary data into base64-encoded strings, CMML balances usability against file size. Owing to the advantages of Extensible Markup Language (XML), models stored in CMML can easily be reused in other software and programming languages, as long as the language has an XML parsing library. The XML Path Language (XPath) can also be used to query the desired data from a CMML file efficiently. The application of this language to near-infrared spectroscopy model storage is implemented as a C++ class and is available as open-source software (http://code.google.com/p/cmml). Implementations in other languages, such as MATLAB and R, are in progress.
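    The abstract does not describe the CMML schema, so the element and attribute names below (`model`, `selectedVariables`, `matrix`, `rows`, `cols`, `encoding`) are hypothetical. The sketch only illustrates the general idea behind the format. A matrix of doubles is packed into bytes and stored as a base64 string inside XML. It is then retrieved with the limited XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import base64
import struct
import xml.etree.ElementTree as ET

def matrix_to_element(name, rows):
    """Pack a row-major matrix of doubles as little-endian bytes, base64-encoded."""
    flat = [v for row in rows for v in row]
    raw = struct.pack("<%dd" % len(flat), *flat)
    el = ET.Element("matrix", name=name, rows=str(len(rows)),
                    cols=str(len(rows[0])), encoding="base64")
    el.text = base64.b64encode(raw).decode("ascii")
    return el

def element_to_matrix(el):
    """Decode a <matrix> element written by matrix_to_element back into lists."""
    n_rows, n_cols = int(el.get("rows")), int(el.get("cols"))
    flat = struct.unpack("<%dd" % (n_rows * n_cols), base64.b64decode(el.text))
    return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]

# Build a hypothetical model document containing selected variables and coefficients.
model = ET.Element("model", type="PLS")
ET.SubElement(model, "selectedVariables").text = "12 13 14 57 58"
model.append(matrix_to_element("coefficients", [[0.5, -1.25], [2.0, 3.75]]))
doc = ET.tostring(model, encoding="unicode")

# Parse it back and query with an XPath-style attribute predicate.
root = ET.fromstring(doc)
coefficients = element_to_matrix(root.find(".//matrix[@name='coefficients']"))
```

    Little-endian packing (`<d`) is an assumption made here so that the bytes decode identically on any platform. A real format would need to fix its byte order in its specification.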

  1. Robust procedures in chemometrics

    DEFF Research Database (Denmark)

    Kotwa, Ewelina

    … applying a multivariate and multi-way data analytical framework in fields where less sophisticated data analysis methods are currently used, and 2. developing new, more robust alternatives to already existing multivariate tools. The first part of the study was realised by applying two- and three-way chemometric methods, such as PCA and PARAFAC models, to analyse spatial and depth profiles of sea water samples, defined by three data modes: depth, variables and geographical location. Emphasis was also put on predicting fluorescence values, as a natural measure of biological activity, by applying … and comparing the Partial Least Squares (PLS) regression technique with its multi-way alternative, N-PLS. Results of the analysis indicated the superiority of the three-way framework, potentially constituting a novel assessment of the sea water measurements. Particularly in the case of regression models…

  2. Variable selection by lasso-type methods

    Directory of Open Access Journals (Sweden)

    Sohail Chand

    2011-09-01

    Variable selection is an important property of shrinkage methods. The adaptive lasso is an oracle procedure and can perform consistent variable selection. In this paper, we explain how the use of adaptive weights makes it possible for the adaptive lasso to satisfy the necessary and almost sufficient condition for consistent variable selection. We suggest a novel algorithm and give an important result: if the predictors are normalised after the adaptive weights are introduced, the adaptive lasso performs identically to the lasso.
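    The normalisation result can be checked numerically. A common way to compute the adaptive lasso is to run an ordinary lasso on columns rescaled by |β̂_init,j| and then rescale the coefficients back. If the rescaled columns are standardised again, the weights cancel. The sketch below uses a plain coordinate-descent lasso, which is a swapped-in solver and not the algorithm proposed in the paper. The data, initial estimates `init` and penalty `lam` are hypothetical.

```python
def standardize(cols):
    """Centre each column and scale it to unit mean square (columns given as lists)."""
    out = []
    for c in cols:
        n = len(c)
        m = sum(c) / n
        s = (sum((v - m) ** 2 for v in c) / n) ** 0.5
        out.append([(v - m) / s for v in c])
    return out

def soft(z, t):
    """Soft-thresholding operator."""
    return z - t if z > t else z + t if z < -t else 0.0

def lasso_cd(cols, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.

    Columns are assumed centred; y is centred internally (intercept not returned).
    """
    n = len(y)
    ybar = sum(y) / n
    r = [v - ybar for v in y]              # residual for b = 0
    b = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, x in enumerate(cols):
            sq = sum(v * v for v in x) / n
            rho = sum(xi * ri for xi, ri in zip(x, r)) / n + sq * b[j]
            new = soft(rho, lam) / sq
            if new != b[j]:
                r = [ri - xi * (new - b[j]) for xi, ri in zip(x, r)]
                b[j] = new
    return b

# Hypothetical data (column-wise), initial estimates and penalty.
cols = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [2.0, 1.0, 0.0, 1.0, 2.0, 3.0],
        [5.0, 3.0, 4.0, 1.0, 2.0, 0.0]]
y = [3.1, 4.0, 6.2, 6.9, 10.1, 11.0]
init = [1.5, -0.3, 0.02]
lam = 0.1

weighted = [[abs(w) * v for v in c] for w, c in zip(init, cols)]  # adaptive rescaling
b_lasso = lasso_cd(standardize(cols), y, lam)
b_adaptive_renormalised = lasso_cd(standardize(weighted), y, lam)
```

    In an actual adaptive lasso fit, the weighted columns are only centred, not rescaled to unit variance. The coefficients on the original scale are then recovered as |β̂_init,j|·b_j.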

  3. Variable selection with error control: Another look at Stability Selection

    CERN Document Server

    Shah, Rajen

    2011-01-01

    Stability Selection was recently introduced by Meinshausen and Bühlmann (2010) as a very general technique designed to improve the performance of a variable selection algorithm. It is based on aggregating the results of applying a selection procedure to subsamples of the data. We introduce a variant, called Complementary Pairs Stability Selection (CPSS), and derive two bounds. The first bounds the expected number of variables included by CPSS that have low selection probability under the original procedure. The second bounds the expected number of high-selection-probability variables that are excluded. These results require no assumptions (e.g. exchangeability) on the underlying model or on the quality of the original selection procedure. Under reasonable shape restrictions, the bounds can be further tightened, yielding improved error control and therefore increasing the applicability of the methodology.
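    The following is a minimal sketch of the complementary-pairs idea, assuming a user-supplied base selector. Each of B random splits divides the sample indices into two disjoint halves, and both halves are passed to the selector. A variable's selection frequency is its count over the 2B subsamples. The base selector (top-k absolute correlation), B = 25 and the 0.6 threshold are hypothetical illustration choices. The paper's contribution is the error bounds, which this sketch does not compute.

```python
import random

def top_k_corr(X, y, k):
    """Hypothetical base selector: indices of the k columns most correlated with y."""
    n, p = len(y), len(X[0])
    ybar = sum(y) / n
    syy = sum((v - ybar) ** 2 for v in y)
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        cbar = sum(col) / n
        sxx = sum((c - cbar) ** 2 for c in col)
        sxy = sum((c - cbar) * (v - ybar) for c, v in zip(col, y))
        scores.append(abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0)
    return set(sorted(range(p), key=lambda j: -scores[j])[:k])

def cpss(X, y, selector, B=25, threshold=0.6, seed=0):
    """Complementary pairs stability selection: returns (selected set, frequencies)."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    counts = [0] * p
    half = n // 2
    for _ in range(B):
        idx = list(range(n))
        rng.shuffle(idx)
        # The two disjoint halves of one random split form a complementary pair.
        for part in (idx[:half], idx[half:2 * half]):
            for j in selector([X[i] for i in part], [y[i] for i in part]):
                counts[j] += 1
    freq = [c / (2 * B) for c in counts]
    return {j for j in range(p) if freq[j] >= threshold}, freq

# Hypothetical toy data: 20 samples, 5 variables, response driven only by variable 0.
gen = random.Random(1)
X = [[gen.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]
y = [2.0 * row[0] + 1.0 for row in X]
selected, freq = cpss(X, y, lambda Xs, ys: top_k_corr(Xs, ys, k=1))
```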

  4. “Turn-off” fluorescent data array sensor based on double quantum dots coupled with chemometrics for highly sensitive and selective detection of multicomponent pesticides

    Energy Technology Data Exchange (ETDEWEB)

    Fan, Yao; Liu, Li; Sun, Donglei; Lan, Hanyue [The Modernization Engineering Technology Research Center of Ethnic Minority Medicine of Hubei Province, College of Pharmacy, South-Central University for Nationalities, Wuhan 430074 (China); Fu, Haiyan, E-mail: fuhaiyan@mail.scuec.edu.cn [The Modernization Engineering Technology Research Center of Ethnic Minority Medicine of Hubei Province, College of Pharmacy, South-Central University for Nationalities, Wuhan 430074 (China); Yang, Tianming, E-mail: tmyang@mail.scuec.edu.cn [The Modernization Engineering Technology Research Center of Ethnic Minority Medicine of Hubei Province, College of Pharmacy, South-Central University for Nationalities, Wuhan 430074 (China); She, Yuanbin, E-mail: sheyb@zjut.edu.cn [State Key Laboratory Breeding Base of Green Chemistry-Synthesis Technology, College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310032 (China); Ni, Chuang [The Modernization Engineering Technology Research Center of Ethnic Minority Medicine of Hubei Province, College of Pharmacy, South-Central University for Nationalities, Wuhan 430074 (China)

    2016-04-15

    As a popular detection model, the fluorescence "turn-off" sensor based on quantum dots (QDs) has already been successfully employed to detect many materials, especially in studies of the interactions between pesticides. However, previous studies have mainly focused on a simple single track or on comparisons based on similar drug concentrations. In this work, a new detection method for various pesticides is established, based on the fluorescence "turn-off" model with water-soluble ZnCdSe and CdSe QDs simultaneously serving as fluorescent probes. The fluorescence of the two QDs is quenched to varying degrees by different pesticides, which leads to differences in the positions and intensities of the two peaks. Combined with chemometric methods, the system allows all the pesticides to be identified and quantified, even in real samples, with a limit of detection of 2 × 10⁻⁸ mol L⁻¹ and a recognition rate of 100%. To the best of our knowledge, this is the first report on the detection of pesticides based on the fluorescence quenching of double quantum dots combined with chemometric methods. Moreover, the excellent selectivity of the system has been verified in different media, such as mixed-ion interference, wastewater, tea and aqueous drug extracts.
    Highlights:
    • A new model based on double QDs is established for pesticide residue detection.
    • The fluorescent data array sensor is coupled with chemometric methods.
    • The sensor enables highly sensitive and selective detection in actual samples.

  5. Machine learning techniques to select variable stars

    Science.gov (United States)

    García-Varela, Alejandro; Pérez, Muriel; Sabogal, Beatriz; Quiroz, Adolfo

    2017-09-01

    In order to perform a supervised classification of variable stars, we propose and evaluate a set of six features extracted from the magnitude density of the light curves. They are used to train automatic classification systems using state-of-the-art classifiers implemented in the R statistical computing environment. We find that random forests is the most successful method for selecting variable stars.

  6. Sensor combination and chemometric variable selection for online monitoring of Streptomyces coelicolor fed-batch cultivations

    DEFF Research Database (Denmark)

    Ödman, Peter; Johansen, C.L.; Olsson, L.

    2010-01-01

    Fed-batch cultivations of Streptomyces coelicolor, producing the antibiotic actinorhodin, were monitored online by multiwavelength fluorescence spectroscopy and off-gas analysis. Partial least squares (PLS), locally weighted regression, and multilinear PLS (N-PLS) models were built for prediction...

  7. Variable Selection in Logistic Regression Model

    Institute of Scientific and Technical Information of China (English)

    ZHANG Shangli; ZHANG Lili; QIU Kuanmin; LU Ying; CAI Baigen

    2015-01-01

    Variable selection is one of the most important problems in pattern recognition. For the linear regression model, many methods can solve this problem, such as the least absolute shrinkage and selection operator (LASSO) and many improved LASSO methods, but there are few variable selection methods for generalized linear models. We study the variable selection problem in the logistic regression model. We propose a new variable selection method, the logistic elastic net, and prove that it has a grouping effect, which means that strongly correlated predictors tend to be in or out of the model together. The logistic elastic net is particularly useful when the number of predictors (p) is much larger than the number of observations (n). By contrast, the LASSO is not a very satisfactory variable selection method when p is much larger than n. The advantages and effectiveness of this method are demonstrated with real leukemia data and a simulation study.
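    The abstract does not state the objective. Assuming the usual elastic-net form applied to the logistic log-likelihood, the estimator solves the problem below; the authors' exact scaling of the penalties may differ. Here responses are $y_i \in \{0,1\}$ and penalty parameters are $\lambda_1, \lambda_2 \ge 0$.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  + \lambda_1 \lVert \beta \rVert_1 + \lambda_2 \lVert \beta \rVert_2^2
```

    The $\ell_1$ term performs the selection. The $\ell_2$ term is what lets strongly correlated predictors enter or leave the model together (the grouping effect). With $\lambda_2 = 0$ the problem reduces to the logistic LASSO.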

  8. Bayesian Variable Selection with Related Predictors

    CERN Document Server

    Chipman, Hugh

    2008-01-01

    In data sets with many predictors, algorithms for identifying a good subset of predictors are often used. Most such algorithms do not account for any relationships between predictors. For example, stepwise regression might select a model containing an interaction AB but neither main effect A nor B. This paper develops mathematical representations of this and other relations between predictors, which may then be incorporated in a model selection procedure. A Bayesian approach that goes beyond the standard independence prior for variable selection is adopted, and preference for certain models is interpreted as prior information. Priors relevant to arbitrary interactions and polynomials, dummy variables for categorical factors, competing predictors, and restrictions on the size of the models are developed. Since the relations developed are for priors, they may be incorporated in any Bayesian variable selection algorithm for any type of linear model. The application of the methods here is illustrated via the Stoch...
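    One way to encode the "no AB without A and B" relation as prior information is to make the inclusion probability of the interaction depend on its parents. The sketch below follows that idea for two main effects and one interaction. The probabilities are hypothetical values chosen for illustration, not values from the paper: 0.5 for each main effect, plus the `p_int` tables for strong and weak heredity.

```python
from itertools import product

def model_prior(delta_a, delta_b, delta_ab, p_main=0.5, p_int=None):
    """Prior probability of one (A, B, AB) inclusion pattern (1 = included).

    Main effects are independent with inclusion probability p_main; the
    interaction's inclusion probability depends on its parents via p_int,
    keyed by (delta_a, delta_b). All numbers here are illustrative.
    """
    if p_int is None:   # strong heredity: AB may enter only if both A and B are in
        p_int = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
    pa = p_main if delta_a else 1 - p_main
    pb = p_main if delta_b else 1 - p_main
    q = p_int[(delta_a, delta_b)]
    pab = q if delta_ab else 1 - q
    return pa * pb * pab

# Strong heredity over all 8 inclusion patterns.
priors = {m: model_prior(*m) for m in product((0, 1), repeat=3)}

# Weak heredity: AB needs at least one parent in the model.
weak = {(0, 0): 0.0, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.5}
weak_priors = {m: model_prior(*m, p_int=weak) for m in product((0, 1), repeat=3)}
```

    Under strong heredity, the model "AB without A or B" from the abstract receives zero prior mass. The prior still sums to one over all inclusion patterns.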

  9. Machine learning techniques to select variable stars

    Directory of Open Access Journals (Sweden)

    García-Varela Alejandro

    2017-01-01

    In order to perform a supervised classification of variable stars, we propose and evaluate a set of six features extracted from the magnitude density of the light curves. They are used to train automatic classification systems using state-of-the-art classifiers implemented in the R statistical computing environment. We find that random forests is the most successful method for selecting variable stars.

  10. Bayesian variable selection with spherically symmetric priors

    OpenAIRE

    De Kock, M. B.; Eggers, H. C.

    2014-01-01

    We propose that Bayesian variable selection for linear parametrisations with Gaussian iid likelihoods be based on the spherical symmetry of the diagonalised parameter space. Our r-prior yields closed forms for the evidence in four examples, including the hyper-g prior and the Zellner-Siow prior, which are shown to be special cases. Scenarios of a single variable dispersion parameter and of fixed dispersion are studied, and asymptotic forms comparable to the traditional information criter...
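
As a reference point for the closed-form evidence mentioned above, the sketch below uses the standard fixed-g Zellner prior, not the authors' r-prior. Under that prior, the Bayes factor of a p-predictor model with coefficient of determination R^2 against the intercept-only model is (1 + g)^((n - 1 - p)/2) * [1 + g(1 - R^2)]^(-(n - 1)/2). The function name and the example values of n, p, g and R^2 are assumptions.

```python
import math


def log_bf_g_prior(r2, n, p, g):
    """Log Bayes factor of a p-predictor linear model versus the intercept-only
    model under Zellner's g-prior with fixed g."""
    return 0.5 * (n - 1 - p) * math.log1p(g) - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2))


# Unit-information choice g = n is one common default (an assumption here).
print(log_bf_g_prior(r2=0.8, n=50, p=3, g=50))
print(log_bf_g_prior(r2=0.0, n=20, p=2, g=9))  # R^2 = 0 gives (1 + g)^(-p/2), i.e. -log(10)
```

The hyper-g and Zellner-Siow priors named in the abstract place a distribution on g instead of fixing it.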

  11. Estimation and variable selection with exponential weights

    OpenAIRE

    Arias-Castro, Ery; Lounici, Karim

    2014-01-01

    In the context of a linear model with a sparse coefficient vector, exponential weights methods have been shown to achieve oracle inequalities for denoising/prediction. We show that such methods also succeed at variable selection and estimation under a near-minimum condition on the design matrix, instead of the much stronger assumptions required by other methods such as the Lasso or the Dantzig Selector. The same analysis yields consistency results for Bayesian methods and BIC-type variable s...

  12. Bayesian variable selection for latent class models.

    Science.gov (United States)

    Ghosh, Joyee; Herring, Amy H; Siega-Riz, Anna Maria

    2011-09-01

    In this article, we develop a latent class model with class probabilities that depend on subject-specific covariates. One of our major goals is to identify important predictors of latent classes. We consider methodology that allows estimation of latent classes while allowing for variable selection uncertainty. We propose a Bayesian variable selection approach and implement a stochastic search Gibbs sampler for posterior computation to obtain model-averaged estimates of quantities of interest such as marginal inclusion probabilities of predictors. Our methods are illustrated through simulation studies and application to data on weight gain during pregnancy, where it is of interest to identify important predictors of latent weight gain classes.
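
Marginal inclusion probabilities from a stochastic search sampler are usually estimated by averaging the sampled 0/1 inclusion indicators. The minimal sketch below assumes the sampler's draws are already available as a list of inclusion vectors. The draws and the function name are hypothetical, and the latent class model itself is not reproduced.

```python
def marginal_inclusion_probs(samples, burn_in=0):
    """Average 0/1 inclusion indicators over post-burn-in sampler draws.

    samples: list of inclusion vectors gamma^(s), one per MCMC iteration.
    Returns the estimated P(gamma_j = 1 | data) for each predictor j."""
    kept = samples[burn_in:]
    p = len(kept[0])
    return [sum(draw[j] for draw in kept) / len(kept) for j in range(p)]


# Hypothetical draws for three predictors (the first draw is discarded as burn-in).
draws = [[0, 0, 0], [1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
print(marginal_inclusion_probs(draws, burn_in=1))  # [0.75, 0.25, 0.5]
```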

  13. Robust cluster analysis and variable selection

    CERN Document Server

    Ritter, Gunter

    2014-01-01

    Clustering remains a vibrant area of research in statistics. Although there are many books on this topic, there are relatively few that are well founded in the theoretical aspects. In Robust Cluster Analysis and Variable Selection, Gunter Ritter presents an overview of the theory and applications of probabilistic clustering and variable selection, synthesizing the key research results of the last 50 years. The author focuses on the robust clustering methods he found to be the most useful on simulated data and real-time applications. The book provides clear guidance for the varying needs of bot

  14. Random Effect and Latent Variable Model Selection

    CERN Document Server

    Dunson, David B

    2008-01-01

    Presents various methods for accommodating model uncertainty in random effects and latent variable models. This book focuses on frequentist likelihood ratio and score tests for zero variance components. It also focuses on Bayesian methods for random effects selection in linear mixed effects and generalized linear mixed models.

  15. Quasar Selection Based on Photometric Variability

    CERN Document Server

    MacLeod, C L; Ivezic, Z; Kochanek, C S; Gibson, R; Meisner, A; Kozlowski, S; Sesar, B; Becker, A C; de Vries, W

    2010-01-01

    We develop a method for separating quasars from other variable point sources using SDSS Stripe 82 light curve data for ~10,000 variable objects. To statistically describe quasar variability, we use a damped random walk model parametrized by a damping time scale, tau, and an asymptotic amplitude (structure function), SF_inf. With the aid of an SDSS spectroscopically confirmed quasar sample, we demonstrate that variability selection in typical extragalactic fields with low stellar density can deliver complete samples with reasonable purity (or efficiency, E). Compared to a selection method based solely on the slope of the structure function, the inclusion of the tau information boosts E from 60% to 75% while maintaining a highly complete sample (98%) even in the absence of color information. For a completeness of C=90%, E is boosted from 80% to 85%. Conversely, C improves from 90% to 97% while maintaining E=80% when imposing a lower limit on tau. With the aid of color selection, the purity can be further booste...
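
The damped random walk model is commonly written with the structure function SF(dt) = SF_inf * sqrt(1 - exp(-|dt|/tau)). The sketch below evaluates that form and computes completeness C and efficiency E for a set of selections. The parameter values, the labels and the function names are hypothetical. The sketch does not reproduce the paper's selection cuts.

```python
import math


def drw_structure_function(dt, sf_inf, tau):
    """Damped-random-walk structure function:
    SF(dt) = SF_inf * sqrt(1 - exp(-|dt| / tau))."""
    return sf_inf * math.sqrt(1.0 - math.exp(-abs(dt) / tau))


def completeness_efficiency(selected, is_quasar):
    """Completeness C = selected true quasars / all true quasars;
    efficiency (purity) E = selected true quasars / all selected objects."""
    hits = sum(1 for s, q in zip(selected, is_quasar) if s and q)
    return hits / sum(is_quasar), hits / sum(selected)


# SF rises roughly as sqrt(dt) for dt << tau and saturates at SF_inf for dt >> tau.
for dt in (1.0, 100.0, 10000.0):
    print(dt, drw_structure_function(dt, sf_inf=0.2, tau=200.0))
print(completeness_efficiency([1, 1, 0, 1], [1, 0, 1, 1]))  # hypothetical labels
```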

  16. Determination of Nicotine in Tobacco by Chemometric Optimization and Cation-Selective Exhaustive Injection in Combination with Sweeping-Micellar Electrokinetic Chromatography

    Directory of Open Access Journals (Sweden)

    Yi-Hui Lin

    2015-01-01

    Nicotine is a potent chemical that excites the central nervous system and refreshes people. It is also physically addictive and causes dependence. To reduce the harm of tobacco products to smokers, a law was introduced that requires tobacco product containers to be marked with the amounts of nicotine and tar. In this paper, an online stacking capillary electrophoresis (CE) method, cation-selective exhaustive injection sweeping-micellar electrokinetic chromatography (CSEI-sweeping-MEKC), is proposed for the optimized analysis of nicotine in tobacco. A higher-conductivity buffer zone (160 mM phosphate buffer, pH 3) was injected into the capillary, allowing the analytes to be electrokinetically injected at 15 kV for 15 min. Using 50 mM sodium dodecyl sulfate and 25% methanol in the sweeping buffer, nicotine was detected with high sensitivity. The optimized conditions derived from a chemometric approach provided a 6000-fold increase in nicotine detection sensitivity with the CSEI-sweeping-MEKC method compared with normal CZE. The limit of detection for nicotine was 0.5 nM. The stacking method combined with direct injection, in which matrix components do not interfere with assay performance, was successfully applied to the detection of nicotine in tobacco samples.

  17. Bayesian Variable Selection via Particle Stochastic Search.

    Science.gov (United States)

    Shi, Minghui; Dunson, David B

    2011-02-01

    We focus on Bayesian variable selection in regression models. One challenge is to search the huge model space adequately, while identifying high posterior probability regions. In the past decades, the main focus has been on the use of Markov chain Monte Carlo (MCMC) algorithms for these purposes. In this article, we propose a new computational approach based on sequential Monte Carlo (SMC), which we refer to as particle stochastic search (PSS). We illustrate PSS through applications to linear regression and probit models.

  18. Free variable selection QSPR study to predict (19)F chemical shifts of some fluorinated organic compounds using Random Forest and RBF-PLS methods.

    Science.gov (United States)

    Goudarzi, Nasser

    2016-04-05

    In this work, two new and powerful chemometric methods are applied to the modeling and prediction of the (19)F chemical shift values of some fluorinated organic compounds. Radial basis function-partial least squares (RBF-PLS) and random forest (RF) are employed to construct models that predict the (19)F chemical shifts. No separate variable selection method was used in this study; RF can serve as both a variable selection and a modeling technique. The effects of important parameters on the predictive power of RF, such as the number of trees (nt) and the number of randomly selected variables used to split each node (m), were investigated. The root-mean-square errors of prediction (RMSEP) for the training set and the prediction set for the RBF-PLS and RF models were 44.70, 23.86, 29.77, and 23.69, respectively. The correlation coefficients of the prediction set for the RBF-PLS and RF models were 0.8684 and 0.9313, respectively. The results reveal that the RF model can be used as a powerful chemometric tool for quantitative structure-property relationship (QSPR) studies.
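
The two figures of merit quoted above, RMSEP and the correlation coefficient, can be computed as in the sketch below. The shift values are hypothetical and are not data from the study. The function names are illustrative.

```python
import math


def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction over a set of samples."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observed and predicted shifts (ppm), not values from the study.
observed = [-120.0, -75.0, -60.0, -140.0]
predicted = [-110.0, -80.0, -58.0, -150.0]
print(rmsep(observed, predicted), pearson_r(observed, predicted))
```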

  19. Variables influencing victim selection in genocide.

    Science.gov (United States)

    Komar, Debra A

    2008-01-01

    While victims of racially motivated violence may be identified through observation of morphological features, those targeted because of their ethnic, religious, or national identity are not easily recognized. This study examines how perpetrators of genocide recognize their victims. Court documents, including indictments, witness statements, and testimony from the International Criminal Tribunals for Rwanda and the former Yugoslavia (FY) detail the interactions between victim and assailant. A total of 6012 decedents were included in the study; only 20.8% had been positively identified. Variables influencing victim selection in Rwanda included location, segregation, incitement, and prior relationship, while significant factors in FY were segregation, location, age/gender, and social data. Additional contributing factors in both countries included self-identification, victim behavior, linguistic or clothing evidence, and morphological features. Understanding the system of recognition used by perpetrators aids investigators tasked with establishing victim identity in such prosecutions.

  20. Functional Data Analysis Applied in Chemometrics

    DEFF Research Database (Denmark)

    Muller, Martha

    In this thesis we explore the use of functional data analysis as a method to analyse chemometric data, more specifically spectral data in metabolomics. Functional data analysis is a vibrant field in statistics. It has been rapidly expanding in both methodology and applications since it was made well...... known by Ramsay & Silverman's monograph in 1997. In functional data analysis, the data are curves instead of data points. Each curve is measured at discrete points along a continuum, for example, time or frequency. It is assumed that the underlying process generating the curves is smooth......, but it is not assumed that the adjacent points measured along the continuum are independent. Standard chemometric methods originate from the field of multivariate analysis, where variables are often assumed to be independent. Typically these methods do not explore the rich functional nature of spectral data. Metabolomics...

  1. Variable Selection of Partially Linear Single-index Models

    Institute of Scientific and Technical Information of China (English)

    LU Yi-qiang; HU Bin

    2014-01-01

    In this article, we study variable selection for the partially linear single-index model (PLSIM). Based on minimum average variance estimation (MAVE), variable selection for the PLSIM is performed by minimizing the average variance with an adaptive l1 penalty. An implementation algorithm is given. Under some regularity conditions, we demonstrate the oracle properties of the adaptive LASSO (aLASSO) procedure for the PLSIM. Simulations are used to investigate the effectiveness of the proposed method for variable selection in the PLSIM.
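
The adaptive l1 penalty scales each coefficient's penalty by 1/|initial estimate|^gamma. Large effects are therefore shrunk less, and variables whose initial estimate is zero are excluded. The sketch below shows only that idea in an ordinary linear model, fitted by coordinate descent. It does not implement MAVE or the PLSIM procedure of the paper. The function names, the orthogonal toy design and lam = 0.3 are assumptions.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (returns 0.0 when |z| <= t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso_cd(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b with b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            # correlation of column j with the partial residual (b_j added back)
            rho = sum(c * r for c, r in zip(col, resid)) / n + z * b[j]
            new = soft_threshold(rho, lam * weights[j]) / z
            delta = new - b[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
            b[j] = new
    return b


def adaptive_lasso(X, y, lam, gamma=1.0):
    """Unpenalized least-squares start, then a reweighted l1 fit."""
    init = weighted_lasso_cd(X, y, 0.0, [1.0] * len(X[0]))
    weights = [float("inf") if bj == 0.0 else 1.0 / abs(bj) ** gamma for bj in init]
    return weighted_lasso_cd(X, y, lam, weights)


# Orthogonal +/-1 design; y = 3*x1 - 2*x2 and x3 is irrelevant.
X = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
y = [1.0, 5.0, -5.0, -1.0]
print(adaptive_lasso(X, y, lam=0.3))  # roughly [2.9, -1.85, 0.0]
```

With this orthogonal design, the larger coefficient is shrunk by 0.1 and the smaller one by 0.15. A plain LASSO with the same lam would shrink both by 0.3.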

  2. The Properties of Model Selection when Retaining Theory Variables

    DEFF Research Database (Denmark)

    Hendry, David F.; Johansen, Søren

    Economic theories are often fitted directly to data to avoid possible model selection biases. We show that embedding a theory model that specifies the correct set of m relevant exogenous variables, x_t, within the larger set of m+k candidate variables, (x_t, w_t), then selection over the second...

  3. A Bayesian variable selection procedure for ranking overlapping gene sets

    DEFF Research Database (Denmark)

    Skarman, Axel; Mahdi Shariati, Mohammad; Janss, Luc

    2012-01-01

    described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian...... variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows. Results We used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our...... data to study how the variable selection method was affected by overlaps among the pathways. In addition, we compared our approach to another that ignores the overlaps, and studied the differences in the prioritization. The variable selection method was robust to a change in prior probability...

  4. THE TIME DOMAIN SPECTROSCOPIC SURVEY: VARIABLE SELECTION AND ANTICIPATED RESULTS

    Energy Technology Data Exchange (ETDEWEB)

    Morganson, Eric; Green, Paul J. [Harvard Smithsonian Center for Astrophysics, 60 Garden St, Cambridge, MA 02138 (United States); Anderson, Scott F.; Ruan, John J. [Department of Astronomy, University of Washington, Box 351580, Seattle, WA 98195 (United States); Myers, Adam D. [Department of Physics and Astronomy, University of Wyoming, Laramie, WY 82071 (United States); Eracleous, Michael; Brandt, William Nielsen [Department of Astronomy and Astrophysics, 525 Davey Laboratory, The Pennsylvania State University, University Park, PA 16802 (United States); Kelly, Brandon [Department of Physics, Broida Hall, University of California, Santa Barbara, CA 93106-9530 (United States); Badenes, Carlos [Department of Physics and Astronomy and Pittsburgh Particle Physics, Astrophysics and Cosmology Center (PITT PACC), University of Pittsburgh, 3941 O’Hara St, Pittsburgh, PA 15260 (United States); Bañados, Eduardo [Max-Planck-Institut für Astronomie, Königstuhl 17, D-69117 Heidelberg (Germany); Blanton, Michael R. [Center for Cosmology and Particle Physics, Department of Physics, New York University, 4 Washington Place, New York, NY 10003 (United States); Bershady, Matthew A. [Department of Astronomy, University of Wisconsin, 475 N. Charter St., Madison, WI 53706 (United States); Borissova, Jura [Instituto de Física y Astronomía, Universidad de Valparaíso, Av. Gran Bretaña 1111, Playa Ancha, Casilla 5030, and Millennium Institute of Astrophysics (MAS), Santiago (Chile); Burgett, William S. [GMTO Corp, Suite 300, 251 S. Lake Ave, Pasadena, CA 91101 (United States); Chambers, Kenneth, E-mail: emorganson@cfa.harvard.edu [Institute for Astronomy, University of Hawaii at Manoa, Honolulu, HI 96822 (United States); and others

    2015-06-20

    We present the selection algorithm and anticipated results for the Time Domain Spectroscopic Survey (TDSS). TDSS is a Sloan Digital Sky Survey (SDSS)-IV Extended Baryon Oscillation Spectroscopic Survey (eBOSS) subproject that will provide initial identification spectra of approximately 220,000 luminosity-variable objects (variable stars and active galactic nuclei) across 7500 deg² selected from a combination of SDSS and multi-epoch Pan-STARRS1 photometry. TDSS will be the largest spectroscopic survey to explicitly target variable objects, avoiding pre-selection on the basis of colors or detailed modeling of specific variability characteristics. Kernel Density Estimate analysis of our target population performed on SDSS Stripe 82 data suggests our target sample will be 95% pure (meaning 95% of objects we select have genuine luminosity variability of a few magnitudes or more). Our final spectroscopic sample will contain roughly 135,000 quasars and 85,000 stellar variables, approximately 4000 of which will be RR Lyrae stars, which may be used as outer Milky Way probes. The variability-selected quasar population has a smoother redshift distribution than a color-selected sample, and variability measurements similar to those we develop here may be used to make more uniform quasar samples in large surveys. The stellar variable targets are distributed fairly uniformly across color space, indicating that TDSS will obtain spectra for a wide variety of stellar variables, including pulsating variables, stars with significant chromospheric activity, cataclysmic variables, and eclipsing binaries. TDSS will serve as a pathfinder mission to identify and characterize the multitude of variable objects that will be detected photometrically in even larger variability surveys such as the Large Synoptic Survey Telescope.

  5. The Time Domain Spectroscopic Survey: Variable Selection and Anticipated Results

    Science.gov (United States)

    Morganson, Eric; Green, Paul J.; Anderson, Scott F.; Ruan, John J.; Myers, Adam D.; Eracleous, Michael; Kelly, Brandon; Badenes, Carlos; Bañados, Eduardo; Blanton, Michael R.; Bershady, Matthew A.; Borissova, Jura; Brandt, William Nielsen; Burgett, William S.; Chambers, Kenneth; Draper, Peter W.; Davenport, James R. A.; Flewelling, Heather; Garnavich, Peter; Hawley, Suzanne L.; Hodapp, Klaus W.; Isler, Jedidah C.; Kaiser, Nick; Kinemuchi, Karen; Kudritzki, Rolf P.; Metcalfe, Nigel; Morgan, Jeffrey S.; Pâris, Isabelle; Parvizi, Mahmoud; Poleski, Radosław; Price, Paul A.; Salvato, Mara; Shanks, Tom; Schlafly, Eddie F.; Schneider, Donald P.; Shen, Yue; Stassun, Keivan; Tonry, John T.; Walter, Fabian; Waters, Chris Z.

    2015-06-01

  6. Chemometrics review for chemical sensor development, task 7 report

    Energy Technology Data Exchange (ETDEWEB)

    1994-05-01

    This report, the seventh in a series on the evaluation of several chemical sensors for use in the U.S. Department of Energy's (DOE's) site characterization and monitoring programs, concentrates on the potential use of chemometrics techniques in the analysis of sensor data. Chemometrics is the chemical discipline that uses mathematical, statistical, and other methods that employ formal logic to design or select optimal measurement procedures and experiments, and to provide maximum relevant chemical information by analyzing chemical data. The report emphasizes the latter aspect. Formally, chemometrics applications to analytical chemistry problems have two distinct phases: (1) the exploratory data analysis phase and (2) the calibration and prediction phase. For real-world problems, it is wise to add a third phase: independent validation and verification. In practical applications such as the ERWM work, the most difficult tasks in chemometrics are, in order of decreasing difficulty: establishing the necessary infrastructure (to manage sampling records, data handling, data storage, and related aspects), exploratory data analysis, and solving calibration problems, especially for nonlinear models. Chemometrics techniques differ for what are called zeroth-, first-, and second-order systems, and the details depend on the form of the assumed functional relationship between the measured response and the concentrations of components in mixtures. In general, linear relationships can be handled relatively easily, but nonlinear relationships can be difficult.
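
A zeroth-order system gives one scalar response per sample. The simplest linear case is a straight-line calibration against standards, followed by inversion for an unknown. The sketch below illustrates that calibration and prediction phase with hypothetical standards. The function names are illustrative, and the sketch does not show the report's own procedures.

```python
def fit_line(conc, resp):
    """Ordinary least-squares fit of resp = a + b * conc; returns (a, b)."""
    n = len(conc)
    mc, mr = sum(conc) / n, sum(resp) / n
    sxx = sum((c - mc) ** 2 for c in conc)
    sxy = sum((c - mc) * (r - mr) for c, r in zip(conc, resp))
    b = sxy / sxx
    return mr - b * mc, b


def predict_concentration(response, a, b):
    """Invert the calibration line to estimate an unknown's concentration."""
    return (response - a) / b


# Hypothetical standards whose sensor response is exactly 0.5 + 2 * concentration.
standards = [0.0, 1.0, 2.0, 3.0]
responses = [0.5, 2.5, 4.5, 6.5]
a, b = fit_line(standards, responses)
print(predict_concentration(4.5, a, b))  # 2.0
```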

  7. Chemometrics in multispectral imaging for quality inspection of postharvest products

    NARCIS (Netherlands)

    Noordam, Jan Corstiaan

    2005-01-01

    This thesis describes different novel chemometric techniques applied to multispectral images for quality inspection of agricultural food products. These images not only have a huge number of spectral bands, which makes training set selection a challenging task, but also contain classes with small

  8. Bayesian Variable Selection in Cost-Effectiveness Analysis

    Directory of Open Access Journals (Sweden)

    Miguel A. Negrín

    2010-04-01

    Linear regression models are often used to represent the cost and effectiveness of medical treatment. The covariates used may include sociodemographic variables, such as age, gender or race; clinical variables, such as initial health status, years of treatment or the existence of concomitant illnesses; and a binary variable indicating the treatment received. However, most studies estimate only one model, which usually includes all the covariates. This procedure ignores uncertainty in model selection. In this paper, we examine four previously proposed alternative Bayesian variable selection methods. In this analysis, we estimate the inclusion probability of each covariate in the true model conditional on the data. Variable selection can be useful for estimating incremental effectiveness and incremental cost, through Bayesian model averaging, as well as for subgroup analysis.
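
A simple way to obtain covariate inclusion probabilities is to enumerate all subsets and weight each model by exp(-BIC/2) under equal prior model probabilities. The sketch below does this for a linear model. It is a stand-in for illustration, not one of the four methods the paper examines. The function names and the toy data are hypothetical. In the toy data, y depends on x1, and x2 is constructed to carry no extra information.

```python
import math
from itertools import combinations


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    k = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][c] * x[c] for c in range(r + 1, k))) / M[r][r]
    return x


def rss(cols, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus cols."""
    design = [[1.0] + [col[i] for col in cols] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[c] for row in design) for c in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def bic_inclusion_probs(predictors, y):
    """Posterior inclusion probability of each predictor, using exp(-BIC/2) as an
    approximate model weight and equal prior probability for every subset."""
    n, p = len(y), len(predictors)
    models, bics = [], []
    for size in range(p + 1):
        for subset in combinations(range(p), size):
            r = rss([predictors[j] for j in subset], y)
            bics.append(n * math.log(r / n) + (size + 1) * math.log(n))
            models.append(subset)
    best = min(bics)
    weights = [math.exp(-(v - best) / 2) for v in bics]
    total = sum(weights)
    return [sum(w for m, w in zip(models, weights) if j in m) / total for j in range(p)]


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
noise = [0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5]
y = [2.0 * a + e for a, e in zip(x1, noise)]
print(bic_inclusion_probs([x1, x2], y))
```

Averaging any quantity of interest over the same model weights gives its Bayesian model-averaged estimate.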

  9. Research on Some Questions About Selection of Independent Variables

    Institute of Scientific and Technical Information of China (English)

    TAO Jing-xuan

    2002-01-01

    The paper studies four methods for selecting independent variables in multivariate analysis. In general, forward selection and backward elimination may fail to obtain the best subset of independent variables, possibly because the result is affected by the order of the variables or by associations among them. When the explanatory variables are multicollinear (an abnormal state), stepwise regression and best-subset selection, although widely used, are not effective. For this case, the paper proposes a new method that combines deleting variables with component analysis and has been used in research and scientific practice. An important characteristic of this paper is that it gives examples to support each conclusion.
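
One standard diagnostic for the multicollinearity discussed above is the variance inflation factor (VIF). With only two predictors, it reduces to 1 / (1 - r^2), where r is their correlation. The sketch below covers just that two-predictor case. It is not the paper's proposed method. The function names and example vectors are hypothetical, and the often-quoted "VIF > 10" warning level is a rule of thumb rather than a fixed cutoff.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF of either predictor in a two-predictor model: 1 / (1 - r^2)."""
    r2 = pearson_r(x1, x2) ** 2
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


print(vif_two_predictors([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]))  # moderate
print(vif_two_predictors([1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 2.9, 4.1]))  # nearly collinear
```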

  10. Improving Cluster Analysis with Automatic Variable Selection Based on Trees

    Science.gov (United States)

    2014-12-01

    Master's thesis by Anton D. Orr, December 2014. Thesis advisor: Samuel E. Buttrey. ...2006 based on classification and regression trees to address problems with determining dissimilarity. Current algorithms do not simultaneously address

  11. On the Variable Selection Problem in Multiple Group Discriminant Analysis.

    Science.gov (United States)

    Huberty, Carl J.

    This study was concerned with various schemes for reducing the number of variables in a multivariate analysis. Two sets of illustrative data were used; the numbers of criterion groups were 3 and 5. The proportion of correct classifications was employed as an index of discriminatory power of each subset of variables selected. Of the four procedures…

  12. A Variable-Selection Heuristic for K-Means Clustering.

    Science.gov (United States)

    Brusco, Michael J.; Cradit, J. Dennis

    2001-01-01

    Presents a variable selection heuristic for nonhierarchical (K-means) cluster analysis based on the adjusted Rand index for measuring cluster recovery. Subjected the heuristic to Monte Carlo testing across more than 2,200 datasets. Results indicate that the heuristic is extremely effective at eliminating masking variables.
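
The adjusted Rand index on which the heuristic is based compares two partitions of the same objects. It corrects the raw pair-agreement count for chance, so identical partitions score 1 and random ones score about 0. The sketch below computes only the index, in its usual Hubert-Arabie form. The heuristic's search over variables is not reproduced, and returning 1.0 in the degenerate case is an assumed convention.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same objects."""
    n = len(labels_a)
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, relabelled
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # -0.5: worse than chance
```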

  13. Model and Variable Selection Procedures for Semiparametric Time Series Regression

    Directory of Open Access Journals (Sweden)

    Risa Kato

    2009-01-01

    Semiparametric regression models are very useful for time series analysis. They facilitate the detection of features resulting from external interventions. The complexity of semiparametric models poses new challenges for issues of nonparametric and parametric inference and model selection that frequently arise from time series data analysis. In this paper, we propose penalized least squares estimators which can simultaneously select significant variables and estimate unknown parameters. An innovative class of variable selection procedures is proposed to select significant variables and basis functions in a semiparametric model. The asymptotic normality of the resulting estimators is established. Information criteria for model selection are also proposed. We illustrate the effectiveness of the proposed procedures with numerical simulations.

  14. Nearly unbiased variable selection under minimax concave penalty

    CERN Document Server

    Zhang, Cun-Hui

    2010-01-01

    We propose MC+, a fast, continuous, nearly unbiased and accurate method of penalized variable selection in high-dimensional linear regression. The LASSO is fast and continuous, but biased. The bias of the LASSO may prevent consistent variable selection. Subset selection is unbiased but computationally costly. The MC+ has two elements: a minimax concave penalty (MCP) and a penalized linear unbiased selection (PLUS) algorithm. The MCP provides the convexity of the penalized loss in sparse regions to the greatest extent given certain thresholds for variable selection and unbiasedness. The PLUS computes multiple exact local minimizers of a possibly nonconvex penalized loss function in a certain main branch of the graph of critical points of the penalized loss. Its output is a continuous piecewise linear path running from the origin for infinite penalty to a least squares solution for zero penalty. We prove that at a universal penalty level, the MC+ has high probability of matching the signs of the unknowns, ...
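
The "nearly unbiased" behaviour comes from the shape of the MCP. For gamma > 1, the penalty is lam*|t| - t^2/(2*gamma) when |t| <= gamma*lam, and it is flat at gamma*lam^2/2 beyond that. With an orthonormal design, the univariate MCP solution leaves large coefficients untouched, whereas the LASSO always subtracts lam. The sketch below shows only this single-coefficient case. It does not implement the PLUS path algorithm, and the function names and values are illustrative.

```python
def soft_threshold(z, lam):
    """LASSO solution for one coefficient (orthonormal design)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_penalty(t, lam, gamma):
    """Minimax concave penalty: quadratic taper up to gamma*lam, then constant."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2 * gamma)
    return gamma * lam * lam / 2


def mcp_threshold(z, lam, gamma):
    """Minimizer of 0.5*(z - b)^2 + mcp_penalty(b) for gamma > 1."""
    if abs(z) > gamma * lam:
        return z  # no shrinkage for large signals
    return soft_threshold(z, lam) / (1 - 1 / gamma)


for z in (0.5, 2.0, 5.0):
    print(z, soft_threshold(z, 1.0), mcp_threshold(z, lam=1.0, gamma=3.0))
# the LASSO shrinks 5.0 to 4.0; MCP leaves it at 5.0
```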

  15. A Bayesian variable selection procedure to rank overlapping gene sets

    Directory of Open Access Journals (Sweden)

    Skarman Axel

    2012-05-01

    Abstract Background Genome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits. Different methods to prioritize gene sets, such as the genes in a given molecular pathway, have been described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows. Results We used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our data to study how the variable selection method was affected by overlaps among the pathways. In addition, we compared our approach to another that ignores the overlaps, and studied the differences in the prioritization. The variable selection method was robust to a change in prior probability and stable given a limited number of observations. Conclusions Bayesian variable selection is a useful way to prioritize gene sets while considering their overlaps. Ignoring the overlaps gives different and possibly misleading results. Additional procedures may be needed in cases of highly overlapping pathways that are hard to prioritize.

  16. Selecting minimum dataset soil variables using PLSR as a regressive multivariate method

    Science.gov (United States)

    Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.

    2017-04-01

    ) statistics were used to quantitatively assess the predictors most relevant for response variable estimation and then for variable selection (Andersen and Bro, 2010). PCA and SDA returned TOC and RFC as influential variables both in the sets of chemical and physical data analyzed separately and in the whole dataset (Stellacci et al., 2016). Other highly weighted variables in PCA were TEC, followed by K, and AC, followed by Pmac and BD, in the first PC (41.2% of total variance); Olsen P and HA-FA in the second PC (12.6%); and Ca in the third (10.6%). The variables enabling maximum discrimination among treatments in SDA were WEOC on the whole dataset, and humic substances, followed by Olsen P, EC, and clay, in the separate data analyses. The highest PLS-VIP statistics were recorded for Olsen P and Pmac, followed by TOC, TEC, pH, and Mg among the chemical variables, and clay, RFC, and AC among the physical variables. The results show that different methods may rank the selected variables differently and that the presence of a response variable in regressive techniques may affect variable selection. Further investigation with different response variables and multi-year datasets would help to better define the advantages and limits of single or combined approaches. Acknowledgment: The work was supported by the projects "BIOTILLAGE, approcci innovative per il miglioramento delle performances ambientali e produttive dei sistemi cerealicoli no-tillage", financed by PSR-Basilicata 2007-2013, and "DESERT, Low-cost water desalination and sensor technology compact module", financed by ERANET-WATERWORKS 2014. References: Andersen C.M. and Bro R., 2010. Variable selection in regression - a tutorial. Journal of Chemometrics, 24:728-737. Armenise et al., 2013. Developing a soil quality index to compare soil fitness for agricultural use under different managements in the Mediterranean environment. Soil and Tillage Research, 130:91-98. de Paul Obade et al., 2016. A standardized soil quality index
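
For reference, the PLS-VIP statistic is usually computed from a fitted PLS model's weight vectors w_a and the response variance SS_a explained by each component a. The formula is VIP_j = sqrt(p * sum_a SS_a (w_ja / ||w_a||)^2 / sum_a SS_a). The sketch below takes W and SS as hypothetical inputs, so it assumes the PLS model has already been fitted elsewhere. The "VIP > 1" screen shown is a commonly used rule of thumb, not a threshold taken from the study.

```python
import math


def vip_scores(W, ss):
    """Variable importance in projection.

    W: p x A nested list of PLS weights (W[j][a] = weight of variable j on
    component a); ss: response sum of squares explained by each component."""
    p, A = len(W), len(ss)
    norms = [math.sqrt(sum(W[j][a] ** 2 for j in range(p))) for a in range(A)]
    total = sum(ss)
    return [math.sqrt(p * sum(ss[a] * (W[j][a] / norms[a]) ** 2 for a in range(A)) / total)
            for j in range(p)]


# Hypothetical two-component weights for three predictors.
W = [[0.8, 0.1], [0.6, -0.3], [0.0, 0.95]]
ss = [5.0, 1.0]
vip = vip_scores(W, ss)
print(vip, [j for j, v in enumerate(vip) if v > 1.0])
```

By construction, the squared VIP scores sum to p, so their average is 1. This is why 1 is a natural reference level.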

  17. Financial applications of a Tabu search variable selection model

    Directory of Open Access Journals (Sweden)

    Zvi Drezner

    2001-01-01

    We illustrate how a comparatively new technique, a Tabu search variable selection model [Drezner, Marcoulides and Salhi (1999)], can be applied efficiently within finance when the researcher must select a subset of variables from among the whole set of explanatory variables under consideration. Several types of problems in finance, including corporate and personal bankruptcy prediction, mortgage and credit scoring, and the selection of variables for the Arbitrage Pricing Model, require the researcher to select a subset of variables from a larger set. To demonstrate the usefulness of the Tabu search variable selection model, we (1) illustrate its efficiency in comparison with the main alternative search procedures, such as stepwise regression and the Maximum R² procedure, and (2) show how a version of the Tabu search procedure may be implemented when attempting to predict corporate bankruptcy. We accomplish (2) by indicating that a Tabu search procedure increases the predictability of corporate bankruptcy by up to 10 percentage points compared with Altman's (1968) Z-Score model.
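
A generic Tabu search over variable subsets works by flipping one variable in or out per move. It briefly forbids reversing recent moves, and it keeps the best subset seen. The sketch below shows that idea. It is not the Drezner, Marcoulides and Salhi implementation. The score function stands in for a criterion such as adjusted R², and the toy objective, tenure and iteration count are assumptions.

```python
def tabu_search(score, p, n_iter=50, tenure=3, start=None):
    """Maximize score(subset) over 0/1 inclusion vectors of length p.

    Each move flips one variable; a flipped variable is tabu for `tenure`
    iterations unless flipping it would beat the best score so far."""
    current = list(start) if start else [0] * p
    best, best_val = current[:], score(current)
    tabu = {}  # variable index -> last iteration at which it is still tabu
    for it in range(n_iter):
        candidates = []
        for j in range(p):
            neigh = current[:]
            neigh[j] = 1 - neigh[j]
            val = score(neigh)
            # aspiration: a tabu move is allowed if it improves on the best
            if tabu.get(j, -1) < it or val > best_val:
                candidates.append((val, j, neigh))
        if not candidates:
            continue
        val, j, neigh = max(candidates)  # best admissible neighbour, even if worse
        current = neigh
        tabu[j] = it + tenure
        if val > best_val:
            best, best_val = neigh[:], val
    return best, best_val


target = [1, 0, 1, 0, 0]


def toy_score(subset):
    # hypothetical objective: negative Hamming distance to a known "best" subset
    return -sum(a != b for a, b in zip(subset, target))


best, best_val = tabu_search(toy_score, p=5, n_iter=20)
print(best, best_val)  # [1, 0, 1, 0, 0] 0
```

Accepting the best admissible neighbour even when it is worse lets the search leave local optima. Stepwise regression does not have this ability.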

  18. Variable selection and estimation for longitudinal survey data

    KAUST Repository

    Wang, Li

    2014-09-01

    There is wide interest in studying longitudinal surveys, in which sample subjects are observed successively over time. Longitudinal surveys are used today in many areas, for example in the health and social sciences, to explore relationships or to identify significant variables in regression settings. This paper develops a general strategy for the model selection problem in longitudinal sample surveys. A survey-weighted penalized estimating equation approach is proposed to select significant variables and estimate the coefficients simultaneously. The proposed estimators are design consistent and perform as well as the oracle procedure that knows the correct submodel. The estimating function bootstrap is applied to obtain the standard errors of the estimated parameters with good accuracy. A fast and efficient variable selection algorithm is developed to identify significant variables for complex longitudinal survey data. Simulated examples show the usefulness of the proposed methodology under various model settings and sampling designs. © 2014 Elsevier Inc.

  19. A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration.

    Science.gov (United States)

    Yun, Yong-Huan; Wang, Wei-Ting; Tan, Min-Li; Liang, Yi-Zeng; Li, Hong-Dong; Cao, Dong-Sheng; Lu, Hong-Mei; Xu, Qing-Song

    2014-01-07

    With today's high-dimensional datasets, creating effective methods that can select an optimal variable subset is a great challenge. In this study, a strategy that considers the possible interaction effects among variables through random combinations was proposed, called iteratively retaining informative variables (IRIV). The variables are classified into four categories: strongly informative, weakly informative, uninformative and interfering variables. On this basis, IRIV retains both the strongly and weakly informative variables in every iterative round until no uninformative or interfering variables remain. Three datasets were employed to investigate the performance of IRIV coupled with partial least squares (PLS). The results show that IRIV is a good alternative variable selection strategy when compared with three outstanding and frequently used variable selection methods: genetic algorithm-PLS, Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS) and competitive adaptive reweighted sampling (CARS). The MATLAB source code of IRIV can be freely downloaded for academic research at the website: http://code.google.com/p/multivariate-calibration/downloads/list.
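
    A simplified sketch of one IRIV classification round. It assumes ordinary least squares with 5-fold cross-validation in place of the PLS models used in the paper. For each variable j, every random combination is evaluated twice, once with j forced in and once with j forced out. The two RMSECV distributions are then compared by their mean difference and a Mann-Whitney U test (scipy.stats.mannwhitneyu). The 0.05 level, the 100 random combinations and the helper names rmsecv_ols and iriv_round are hypothetical. The authors' MATLAB code remains the reference implementation.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def rmsecv_ols(X, y, cols, n_folds=5):
    """K-fold RMSECV of an OLS model on `cols` (a stand-in for the paper's PLS)."""
    n = len(y)
    folds = np.arange(n) % n_folds
    sq_err = 0.0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        A = np.column_stack([np.ones(train.sum()), X[train][:, cols]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        B = np.column_stack([np.ones(test.sum()), X[test][:, cols]])
        sq_err += np.sum((y[test] - B @ beta) ** 2)
    return np.sqrt(sq_err / n)


def iriv_round(X, y, n_models=100, alpha=0.05, seed=0):
    """Classify each variable from RMSECV with it forced in vs. forced out."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    design = rng.integers(0, 2, size=(n_models, p)).astype(bool)  # random combinations
    labels = []
    for j in range(p):
        with_j, without_j = design.copy(), design.copy()
        with_j[:, j], without_j[:, j] = True, False
        m_in = np.array([rmsecv_ols(X, y, np.flatnonzero(r)) for r in with_j])
        m_out = np.array([rmsecv_ols(X, y, np.flatnonzero(r)) for r in without_j])
        d_mean = m_in.mean() - m_out.mean()          # < 0 means including j helps
        p_val = mannwhitneyu(m_in, m_out).pvalue
        if d_mean < 0:
            labels.append("strong" if p_val < alpha else "weak")
        else:
            labels.append("interfering" if p_val < alpha else "uninformative")
    return labels


# Hypothetical data: only the first of six variables is informative.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 6))
y = 3.0 * X[:, 0] + 0.5 * rng.normal(size=60)
labels = iriv_round(X, y)
print(labels)
```

    In the full strategy, the variables labelled "strong" or "weak" would be kept and the round repeated on them until only informative variables remain.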

  20. Improvement of near infrared spectroscopic (NIRS) analysis of caffeine in roasted Arabica coffee by variable selection method of stability competitive adaptive reweighted sampling (SCARS)

    Science.gov (United States)

    Zhang, Xuan; Li, Wei; Yin, Bin; Chen, Weizhong; Kelly, Declan P.; Wang, Xiaoxin; Zheng, Kaiyi; Du, Yiping

    2013-10-01

    Coffee is the most heavily consumed beverage in the world after water, and its quality is a key consideration in commercial trade. Caffeine content, which has a significant effect on the final quality of coffee products, therefore needs to be determined quickly and reliably by new analytical techniques. The main purpose of this work was to establish a powerful and practical analytical method based on near infrared spectroscopy (NIRS) and chemometrics for the quantitative determination of caffeine content in roasted Arabica coffees. Ground coffee samples covering a wide range of roast levels were analyzed by NIR, and their caffeine contents were quantitatively determined by the most commonly used HPLC-UV method to serve as reference values. Calibration models based on chemometric analyses of the NIR spectral data and the reference concentrations of the coffee samples were then developed. Partial least squares (PLS) regression was used to construct the models. Furthermore, various spectral pretreatment and variable selection techniques were applied in order to obtain robust and reliable reduced-spectrum regression models. Among the models constructed, second derivative pretreatment combined with stability competitive adaptive reweighted sampling (SCARS) variable selection provided a notably improved regression model, with a root mean square error of cross validation (RMSECV) of 0.375 mg/g and a correlation coefficient (R) of 0.918 with 7 PLS factors. The model was assessed on an independent test set, giving a root mean square error of prediction (RMSEP) of 0.378 mg/g, a mean relative error of 1.976% and a mean relative standard deviation (RSD) of 1.707%. Thus, the results provided by the high-quality calibration model revealed the feasibility of NIR spectroscopy for at-line application to predict the caffeine content of unknown roasted coffee samples, thanks to the short analysis time of a few seconds and non
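
    The figures of merit quoted above can be computed as in the minimal sketch below. RMSECV uses cross-validated predictions and RMSEP uses predictions for the independent test set. The mean relative error is taken here as the mean of |y_pred - y_ref| / y_ref in percent, which is an assumed definition. The helper name figures_of_merit and the numbers in the usage line are hypothetical.

```python
import numpy as np


def figures_of_merit(y_ref, y_pred):
    """Return (RMSE, mean relative error in %, correlation coefficient R)."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_ref
    rmse = np.sqrt(np.mean(err ** 2))           # RMSECV or RMSEP, depending on y_pred
    mre = 100.0 * np.mean(np.abs(err) / y_ref)  # assumes strictly positive references
    r = np.corrcoef(y_ref, y_pred)[0, 1]
    return rmse, mre, r


# Hypothetical reference (HPLC) and predicted (NIR) caffeine values in mg/g.
rmse, mre, r = figures_of_merit([10.0, 12.0, 14.0], [11.0, 12.0, 13.0])
print(round(rmse, 4), round(mre, 3), round(r, 3))  # 0.8165 5.714 1.0
```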

  1. Variable Selection in the Partially Linear Errors-in-Variables Models for Longitudinal Data

    Institute of Scientific and Technical Information of China (English)

    Yi-ping YANG; Liu-gen XUE; Wei-hu CHENG

    2012-01-01

    This paper proposes a new approach for variable selection in partially linear errors-in-variables (EV) models for longitudinal data by penalizing appropriate estimating functions. We apply the SCAD penalty to simultaneously select significant variables and estimate unknown parameters. The rate of convergence and the asymptotic normality of the resulting estimators are established. Furthermore, with proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure. A new algorithm is proposed for solving the penalized estimating equations. The asymptotic results are augmented by a simulation study.
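
    For reference, the SCAD (smoothly clipped absolute deviation) penalty, as defined by Fan and Li, is λ|θ| for |θ| ≤ λ, (2aλ|θ| - θ^2 - λ^2)/(2(a - 1)) for λ < |θ| ≤ aλ, and (a + 1)λ^2/2 beyond aλ, with a = 3.7 a commonly used default. The sketch below evaluates the penalty and its closed-form thresholding rule for penalized least squares with an orthonormal design. It illustrates the penalty only; it is not the penalized estimating-equation algorithm proposed in the paper.

```python
import numpy as np


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty value, evaluated elementwise."""
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t                                          # |theta| <= lam
    quadratic = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    constant = (a + 1) * lam ** 2 / 2                         # |theta| > a * lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))


def scad_threshold(z, lam, a=3.7):
    """Minimizer of 0.5 * (z - theta)^2 + SCAD(theta): the SCAD thresholding rule."""
    z = np.asarray(z, dtype=float)
    az = np.abs(z)
    soft = np.sign(z) * np.maximum(az - lam, 0.0)             # lasso-like zone
    middle = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)   # transition zone
    return np.where(az <= 2 * lam, soft, np.where(az <= a * lam, middle, z))


print(scad_penalty([0.5, 2.0, 5.0], lam=1.0))
print(scad_threshold([0.5, 1.5, 3.0, 5.0], lam=1.0))
```

    Small inputs are set to zero (selection), while inputs beyond aλ are left untouched. This lack of bias for large coefficients underlies the oracle property claimed above.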

  2. Variable Selection with Exponential Weights and $l_0$-Penalization

    OpenAIRE

    Arias-Castro, Ery; Lounici, Karim

    2012-01-01

    In the context of a linear model with a sparse coefficient vector, exponential weights methods have been shown to achieve oracle inequalities for prediction. We show that such methods also succeed at variable selection and estimation under the necessary identifiability condition on the design matrix, instead of the much stronger assumptions required by other methods such as the Lasso or the Dantzig Selector. The same analysis yields consistency results for Bayesian methods and BIC-type variabl...

  3. Chemometric evaluation of brompheniramine-tannate complexes.

    Science.gov (United States)

    Zidan, Ahmed S; Rahman, Ziyaur; Khan, Mansoor A

    2012-04-01

    The objective of the current study was to evaluate the performance of Raman and near-infrared (NIR) techniques combined with chemometrics in characterizing the critical quality attributes of brompheniramine (BP)-tannate complexes. Seven complexes were prepared and evaluated for chemical interactions, solubilities, dissolutions, and spatial distributions by NIR chemical imaging (CI). Principal component analysis (PCA) was applied before either partial least squares regression (PLSR) or principal component regression (PCR) models were developed. Complexation was confirmed by Fourier transform IR analysis, yielding complexes with lower drug solubilities and sustained-release characteristics in alkaline media. PCA results showed better discrimination ability by NIR than by Raman spectroscopy. Compared with PCR, the PLSR prediction errors, calculated from the Raman and NIR data with second-derivative pretreatment, were lower, with values of 2.68, 0.37, 1.79, and 5.60 and 0.58, 0.25, 0.93, and 0.58 for complex solubilities in acidic and alkaline media and percentages dissolved after 1 and 20 h, respectively. In addition, good correlation (>0.95) was obtained for predicting the drug concentration using PLSR score images, supporting the validity of the NIR-CI model for spatial quantitation of BP within its tannate complexes. In conclusion, the chemometric analysis of NIR and/or Raman spectra represented an innovative approach to determine the tannate complexation variability. Copyright © 2011 Wiley Periodicals, Inc.

  4. The Properties of Model Selection when Retaining Theory Variables

    DEFF Research Database (Denmark)

    Hendry, David F.; Johansen, Søren

    Economic theories are often fitted directly to data to avoid possible model selection biases. We show that embedding a theory model that specifies the correct set of m relevant exogenous variables, x{t}, within the larger set of m+k candidate variables, (x{t},w{t}), then selection over the second...... set by their statistical significance can be undertaken without affecting the estimator distribution of the theory parameters. This strategy returns the theory-parameter estimates when the theory is correct, yet protects against the theory being under-specified because some w{t} are relevant....

  5. Solvent effect modelling of isocyanuric products synthesis by chemometric methods

    OpenAIRE

    Havet, Jean-Louis; Billiau-Loreau, Myriam; Porte, Catherine; Delacroix, Alain

    2002-01-01

    Chemometric tools were used to model solvent effects on the N-alkylation of an isocyanuric acid salt. The method proceeded from a central composite design applied to the Carlson solvent classification using principal components analysis. The selectivity of the reaction was studied from the production of different substituted isocyanuric derivatives. Response graphs were obtained for each compound and used to devise a strategy for solvent selection. The prediction models wer...

  6. A New Statistic for Variable Selection in Questionnaire Analysis

    Institute of Scientific and Technical Information of China (English)

    ZHANG Jun-hua; FANG Wei-wu

    2001-01-01

    In this paper, a new statistic is proposed for variable selection, which is one of the important problems in the analysis of questionnaire data. In contrast to other methods, the approach introduced here can be used not only for two groups of samples but can also be easily generalized to the multi-group case.

  7. An Approximation Technique For Variable Selection Using Cost Criteria

    Science.gov (United States)

    Ryan, Thomas P.

    1978-01-01

    The problem of selecting regression variables using cost criteria is considered. A method is presented which approximates the optimal solution of one of several criterion functions which might be employed. Examples are given and the results are compared with the results of other methods. (Author/JKS)

  8. Loneliness as a Function of Selected Personality Variables.

    Science.gov (United States)

    Hojat, Mohammadreza

    1982-01-01

    Hypothesized that selected personality variables could positively predict loneliness, and that self-esteem and extraversion could negatively predict loneliness scores. Studied two groups of subjects: Iranian college students in American colleges and Iranian students in Iranian universities. Results confirmed the directions stated in the research…

  9. Characterizing the Optical Variability of Bright Blazars: Variability-based Selection of Fermi Active Galactic Nuclei

    Science.gov (United States)

    Ruan, John J.; Anderson, Scott F.; MacLeod, Chelsea L.; Becker, Andrew C.; Burnett, T. H.; Davenport, James R. A.; Ivezić, Željko; Kochanek, Christopher S.; Plotkin, Richard M.; Sesar, Branimir; Stuart, J. Scott

    2012-11-01

    We investigate the use of optical photometric variability to select and identify blazars in large-scale time-domain surveys, in part to aid in the identification of blazar counterparts to the ~30% of γ-ray sources in the Fermi 2FGL catalog still lacking reliable associations. Using data from the optical LINEAR asteroid survey, we characterize the optical variability of blazars by fitting a damped random walk model to individual light curves with two main model parameters, the characteristic timescales of variability τ, and driving amplitudes on short timescales σ̂. Imposing cuts on minimum τ and σ̂ allows for blazar selection with high efficiency E and completeness C. To test the efficacy of this approach, we apply this method to optically variable LINEAR objects that fall within the several-arcminute error ellipses of γ-ray sources in the Fermi 2FGL catalog. Despite the extreme stellar contamination at the shallow depth of the LINEAR survey, we are able to recover previously associated optical counterparts to Fermi active galactic nuclei with E ≥ 88% and C = 88% in Fermi 95% confidence error ellipses having semimajor axis r < 8'. We find that the suggested radio counterpart to Fermi source 2FGL J1649.6+5238 has optical variability consistent with other γ-ray blazars and is likely to be the γ-ray source. Our results suggest that the variability of the non-thermal jet emission in blazars is stochastic in nature, with unique variability properties due to the effects of relativistic beaming. After correcting for beaming, we estimate that the characteristic timescale of blazar variability is ~3 years in the rest frame of the jet, in contrast with the ~320 day disk flux timescale observed in quasars. The variability-based selection method presented will be useful for blazar identification in time-domain optical surveys and is also a probe of jet physics.

  10. CHARACTERIZING THE OPTICAL VARIABILITY OF BRIGHT BLAZARS: VARIABILITY-BASED SELECTION OF FERMI ACTIVE GALACTIC NUCLEI

    Energy Technology Data Exchange (ETDEWEB)

    Ruan, John J.; Anderson, Scott F.; MacLeod, Chelsea L.; Becker, Andrew C.; Davenport, James R. A.; Ivezic, Zeljko [Department of Astronomy, University of Washington, Box 351580, Seattle, WA 98195 (United States); Burnett, T. H. [Department of Physics, University of Washington, Seattle, WA 98195-1560 (United States); Kochanek, Christopher S. [Department of Astronomy, Ohio State University, 140 West 18th Avenue, Columbus, OH 43210 (United States); Plotkin, Richard M. [Department of Astronomy, University of Michigan, 500 Church Street, Ann Arbor, MI 48109 (United States); Sesar, Branimir [Division of Physics, Mathematics and Astronomy, Caltech, Pasadena, CA 91125 (United States); Stuart, J. Scott, E-mail: jruan@astro.washington.edu [Lincoln Laboratory, Massachusetts Institute of Technology, 244 Wood Street, Lexington, MA 02420-9108 (United States)

    2012-11-20

    We investigate the use of optical photometric variability to select and identify blazars in large-scale time-domain surveys, in part to aid in the identification of blazar counterparts to the ~30% of γ-ray sources in the Fermi 2FGL catalog still lacking reliable associations. Using data from the optical LINEAR asteroid survey, we characterize the optical variability of blazars by fitting a damped random walk model to individual light curves with two main model parameters, the characteristic timescales of variability τ, and driving amplitudes on short timescales σ̂. Imposing cuts on minimum τ and σ̂ allows for blazar selection with high efficiency E and completeness C. To test the efficacy of this approach, we apply this method to optically variable LINEAR objects that fall within the several-arcminute error ellipses of γ-ray sources in the Fermi 2FGL catalog. Despite the extreme stellar contamination at the shallow depth of the LINEAR survey, we are able to recover previously associated optical counterparts to Fermi active galactic nuclei with E ≥ 88% and C = 88% in Fermi 95% confidence error ellipses having semimajor axis r < 8'. We find that the suggested radio counterpart to Fermi source 2FGL J1649.6+5238 has optical variability consistent with other γ-ray blazars and is likely to be the γ-ray source. Our results suggest that the variability of the non-thermal jet emission in blazars is stochastic in nature, with unique variability properties due to the effects of relativistic beaming. After correcting for beaming, we estimate that the characteristic timescale of blazar variability is ~3 years in the rest frame of the jet, in contrast with the ~320 day disk flux timescale observed in quasars. The variability-based selection method presented will be useful for blazar identification in time-domain optical surveys and is also a probe of jet physics.
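
    A damped random walk is an Ornstein-Uhlenbeck process, so a light curve can be simulated exactly on a regular time grid. This sketch assumes the common parameterization in which the long-term variance is τσ̂^2/2 and the lag-Δt autocorrelation is exp(-Δt/τ). The function name simulate_drw, the grid spacing and the parameter values are hypothetical. Fitting τ and σ̂ to real LINEAR light curves, as done in the paper, is not shown.

```python
import numpy as np


def simulate_drw(n, dt, tau, sigma_hat, mean=0.0, seed=0):
    """Exact simulation of a damped random walk (Ornstein-Uhlenbeck process)
    on a regular time grid with spacing dt."""
    rng = np.random.default_rng(seed)
    decay = np.exp(-dt / tau)                     # lag-dt autocorrelation
    stationary_var = 0.5 * tau * sigma_hat ** 2   # long-term variance tau*sigma_hat^2/2
    step_sd = np.sqrt(stationary_var * (1.0 - decay ** 2))
    x = np.empty(n)
    x[0] = mean + np.sqrt(stationary_var) * rng.normal()  # start in the stationary state
    for i in range(1, n):
        x[i] = mean + (x[i - 1] - mean) * decay + step_sd * rng.normal()
    return x


# Hypothetical parameters: tau = 10 days, sigma_hat = 0.2 mag/sqrt(day), daily sampling.
x = simulate_drw(100_000, dt=1.0, tau=10.0, sigma_hat=0.2)
print(x.var(), np.corrcoef(x[:-1], x[1:])[0, 1])  # should be close to 0.2 and exp(-0.1)
```

    On timescales much shorter than τ the process behaves like a random walk driven by σ̂, and on timescales much longer than τ it behaves like white noise. The two parameters therefore capture the short-timescale amplitude and the characteristic timescale used for the selection cuts.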

  11. Meta-analysis based variable selection for gene expression data.

    Science.gov (United States)

    Li, Quefeng; Wang, Sijian; Huang, Chiang-Ching; Yu, Menggang; Shao, Jun

    2014-12-01

    Recent advances in biotechnology and their wide application have led to the generation of many high-dimensional gene expression data sets that can be used to address similar biological questions. Meta-analysis plays an important role in summarizing and synthesizing scientific evidence from multiple studies. When the dimensions of datasets are high, it is desirable to incorporate variable selection into meta-analysis to improve model interpretation and prediction. To our knowledge, all existing methods conduct variable selection with meta-analyzed data in an "all-in-or-all-out" fashion, that is, a gene is either selected in all studies or not selected in any study. However, due to the data heterogeneity that commonly exists in meta-analyzed data, including choices of biospecimens, study population, and measurement sensitivity, a gene may be important in some studies but unimportant in others. In this article, we propose a novel method called meta-lasso for variable selection with high-dimensional meta-analyzed data. Through a hierarchical decomposition of the regression coefficients, our method not only borrows strength across multiple data sets to boost the power to identify important genes, but also keeps the selection flexible among data sets to take data heterogeneity into account. We show that our method possesses gene selection consistency, that is, when the sample size of each data set is large, with high probability our method can identify all important genes and remove all unimportant genes. Simulation studies demonstrate good performance of our method. We applied our meta-lasso method to a meta-analysis of five cardiovascular studies. The analysis results are clinically meaningful.

  12. Variable selection based cotton bollworm odor spectroscopic detection

    Science.gov (United States)

    Lü, Chengxu; Gai, Shasha; Luo, Min; Zhao, Bo

    2016-10-01

    Aiming at rapid, automatic pest detection to support efficient, targeted pesticide application, and at overcoming the problem of the reflectance spectral signal being covered and attenuated by the solid plant, the feasibility of near infrared spectroscopy (NIRS) detection of cotton bollworm odor was studied. Three cotton bollworm odor samples and 3 blank air gas samples were prepared. Different concentrations of cotton bollworm odor were prepared by mixing the above gas samples, resulting in a calibration group of 62 samples and a validation group of 31 samples. The spectral collection system included a light source, optical fiber, sample chamber and spectrometer. Spectra were pretreated by baseline correction, modeled with partial least squares (PLS), and optimized by genetic algorithm (GA) and competitive adaptive reweighted sampling (CARS). Only minor differences in counts were found among spectra of different cotton bollworm odor concentrations. A PLS model built on all variables gave an RMSEV of 14 and an RV2 of 0.89. Its theoretical basis is that insects release specific volatile odors, including pheromones and allelochemicals, which are used for intra-specific and inter-specific communication and can be detected by NIR spectroscopy. GA selected 28 sensitive variables, giving a model with an RMSEV of 14 and an RV2 of 0.90. By comparison, CARS selected 8 sensitive variables, giving a model with an RMSEV of 13 and an RV2 of 0.92. The CARS model uses only 1.5% of the variables yet yields a smaller error than the full-variable model. The odor-gas-based NIR technique shows potential for cotton bollworm detection.
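
    CARS shrinks the variable set over N sampling runs using an exponentially decreasing function (EDF). The minimal sketch below covers that schedule only. It assumes the usual form r_i = a·exp(-k·i), with a and k chosen so that all p variables are kept in the first run and only 2 remain in the last. The adaptive reweighted sampling by PLS coefficient magnitude and the RMSECV-based choice of the final subset are omitted. The function name is hypothetical.

```python
import numpy as np


def cars_retention_schedule(n_vars, n_runs):
    """Exponentially decreasing retained fraction, with r_1 = 1 and r_N = 2 / n_vars."""
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    k = np.log(n_vars / 2) / (n_runs - 1)
    i = np.arange(1, n_runs + 1)
    ratios = a * np.exp(-k * i)                                     # fraction kept in run i
    n_kept = np.maximum(np.round(ratios * n_vars).astype(int), 2)   # variables kept
    return ratios, n_kept


# Hypothetical spectrum with 500 wavelength variables and 50 sampling runs.
ratios, n_kept = cars_retention_schedule(500, 50)
print(n_kept[:5], n_kept[-5:])
```

    The fast early shrinkage discards clearly irrelevant wavelengths quickly, while the slower late shrinkage gives the few competitive variables more runs to prove themselves. This is how a handful of variables, such as the 8 reported above, can end up being selected.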

  13. Portfolio Selection Based on Distance between Fuzzy Variables

    Directory of Open Access Journals (Sweden)

    Weiyi Qian

    2014-01-01

    This paper studies the portfolio selection problem in a fuzzy environment. We introduce a new, simple method in which the distance between fuzzy variables is used to measure the divergence of the fuzzy investment return from a prior one. First, two new mathematical models are proposed by expressing divergence as distance, investment return as expected value, and risk as variance and semivariance, respectively. Second, the crisp forms of the new models are provided for different types of fuzzy variables. Finally, several numerical examples are given to illustrate the effectiveness of the proposed approach.

  14. Unbiased split variable selection for random survival forests using maximally selected rank statistics.

    Science.gov (United States)

    Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas

    2017-04-15

    The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.
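
    A minimal sketch of a maximally selected rank statistic for one candidate splitting variable. It assumes an uncensored continuous outcome without ties and uses Wilcoxon rank scores. The paper works with survival outcomes (log-rank scores) and converts the maximum to a p-value with approximations, and neither step is shown here. At every admissible cutpoint the sketch standardizes the rank sum of the left daughter node and keeps the largest value. The minimal node fraction min_prop is an assumed parameter, and the function name is hypothetical.

```python
import numpy as np


def max_selected_rank_stat(x, y, min_prop=0.1):
    """Maximally selected (Wilcoxon-type) rank statistic for splitting on x.

    Assumes y has no ties; returns (max standardized statistic, cutpoint).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    ranks = np.argsort(np.argsort(y)) + 1.0     # ranks 1..n (no ties assumed)
    order = np.argsort(x, kind="stable")
    xs, rs = x[order], ranks[order]
    best_stat, best_cut = -np.inf, None
    for n_left in range(1, n):
        if xs[n_left] == xs[n_left - 1]:
            continue                            # only cut between distinct x values
        if min(n_left, n - n_left) < min_prop * n:
            continue                            # enforce a minimal node size
        s = rs[:n_left].sum()                   # rank sum of the left node
        mean = n_left * (n + 1) / 2.0
        var = n_left * (n - n_left) * (n + 1) / 12.0
        stat = abs(s - mean) / np.sqrt(var)
        if stat > best_stat:
            best_stat, best_cut = stat, (xs[n_left - 1] + xs[n_left]) / 2.0
    return best_stat, best_cut


# Hypothetical data with a step change in the outcome at x = 60.
rng = np.random.default_rng(3)
x = np.arange(100, dtype=float)
y = (x >= 60).astype(float) + 0.01 * rng.normal(size=100)
stat, cut = max_selected_rank_stat(x, y)
print(round(stat, 3), cut)
```

    Because each candidate split is standardized by its own null variance, variables with many possible cutpoints are not automatically favoured. However, the maximum over many cutpoints still needs a p-value correction, which is why the paper compares splitting variables on the p-value scale.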

  15. Effect of Instructions on Selected Jump Squat Variables.

    Science.gov (United States)

    Talpey, Scott W; Young, Warren B; Beseler, Bradley

    2016-09-01

    Talpey, SW, Young, WB, and Beseler, B. Effect of instructions on selected jump squat variables. J Strength Cond Res 30(9): 2508-2513, 2016. The purpose of this study was to compare 2 instructions on the performance of selected variables in a jump squat (JS) exercise. The second purpose was to determine the relationships between JS variables and sprint performance. Eighteen male subjects with resistance training experience performed 2 sets of 4 JS with no extra load under instructions to concentrate on (a) jumping for maximum height and (b) extending the legs as fast as possible to maximize explosive force. Sprint performance was assessed at 0- to 10-m and 10- to 20-m distances. From the JS, jump height, peak power, relative peak power, peak force, peak velocity, and countermovement distance were measured with a force platform and position transducer system. The JS variables under the 2 instructions were compared with paired t-tests, and the relationships between these variables and sprint performance were determined with Pearson's correlations. The jump height instruction produced greater mean jump height and peak velocity (p 0.05). Jump height was the variable that correlated most strongly with 10-m time and 10- to 20-m time under both instructions. The height instruction produced a stronger correlation with 10-m time (r = -0.455), but the fast leg extension JS produced a stronger correlation with 10- to 20-m time (r = -0.545). The results indicate that instructions have a meaningful influence on JS variables and therefore need to be taken into consideration when assessing or training athletes.

  16. Characterizing the Optical Variability of Bright Blazars: Variability-based Selection of Fermi Active Galactic Nuclei

    NARCIS (Netherlands)

    Ruan, J.J.; Anderson, S.F.; MacLeod, C.L.; Becker, A.C.; Burnett, T.H.; Davenport, J.R.A.; Ivezić, Z.; Kochanek, C.S.; Plotkin, R.M.; Sesar, B.; Stuart, J.C.

    2012-01-01

    We investigate the use of optical photometric variability to select and identify blazars in large-scale time-domain surveys, in part to aid in the identification of blazar counterparts to the ~30% of γ-ray sources in the Fermi 2FGL catalog still lacking reliable associations. Using data from the opt

  17. Variable selection with stepwise and best subset approaches.

    Science.gov (United States)

    Zhang, Zhongheng

    2016-04-01

    While purposeful selection is performed partly by software and partly by hand, the stepwise and best subset approaches are performed automatically by software. Two R functions, stepAIC() and bestglm(), are well designed for stepwise and best subset regression, respectively. The stepAIC() function begins with a full or null model, and the method for stepwise regression can be specified in the direction argument with the character values "forward", "backward" and "both". The bestglm() function begins with a data frame containing the explanatory variables and the response variable. The response variable should be in the last column. A variety of goodness-of-fit criteria can be specified in the IC argument. The Bayesian information criterion (BIC) usually results in a more parsimonious model than the Akaike information criterion.
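
    To illustrate what an exhaustive best-subset search does, here is a minimal Python sketch for a Gaussian linear model. It scores every subset by AIC = n·ln(RSS/n) + 2k or BIC = n·ln(RSS/n) + k·ln(n), where k counts the intercept. It is not bestglm's implementation, and the function names are hypothetical. Exhaustive search is feasible only for a small number of candidate variables, since there are 2^p subsets.

```python
import itertools

import numpy as np


def gaussian_ic(X, y, cols, criterion):
    """AIC or BIC (up to an additive constant) of an OLS fit on `cols` plus intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X[:, np.array(cols, dtype=int)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    k = A.shape[1]                               # intercept + slopes
    penalty = 2 * k if criterion == "AIC" else k * np.log(n)
    return n * np.log(rss / n) + penalty


def best_subset(X, y, criterion="BIC"):
    """Exhaustively search all subsets of columns, including the empty one."""
    p = X.shape[1]
    subsets = (c for size in range(p + 1) for c in itertools.combinations(range(p), size))
    return min(subsets, key=lambda c: gaussian_ic(X, y, c, criterion))


# Hypothetical data: columns 0 and 2 matter, the other three do not.
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=200)
bic_subset = best_subset(X, y, "BIC")
aic_subset = best_subset(X, y, "AIC")
print("BIC:", bic_subset, "AIC:", aic_subset)
```

    Because ln(n) > 2 whenever n ≥ 8, the BIC penalty per parameter is the larger of the two. The BIC-best subset therefore can never contain more variables than the AIC-best one, which is the parsimony noted above.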

  18. The group exponential lasso for bi-level variable selection.

    Science.gov (United States)

    Breheny, Patrick

    2015-09-01

    In many applications, covariates possess a grouping structure that can be incorporated into the analysis to select important groups as well as important members of those groups. One important example arises in genetic association studies, where genes may have several variants capable of contributing to disease. An ideal penalized regression approach would select variables by balancing both the direct evidence of a feature's importance as well as the indirect evidence offered by the grouping structure. This work proposes a new approach we call the group exponential lasso (GEL) which features a decay parameter controlling the degree to which feature selection is coupled together within groups. We demonstrate that the GEL has a number of statistical and computational advantages over previously proposed group penalties such as the group lasso, group bridge, and composite MCP. Finally, we apply these methods to the problem of detecting rare variants in a genetic association study.

  19. Two-step variable selection in quantile regression models

    Directory of Open Access Journals (Sweden)

    FAN Yali

    2015-06-01

    We propose a two-step variable selection procedure for high-dimensional quantile regressions, in which the dimension of the covariates, p_n, is much larger than the sample size n. In the first step, we apply an l1 penalty, and we demonstrate that the first-step penalized estimator with the LASSO penalty can reduce the model from an ultra-high dimensional one to a model whose size has the same order as that of the true model, and that the selected model can cover the true model. The second step excludes the remaining irrelevant covariates by applying the adaptive LASSO penalty to the reduced model obtained from the first step. Under some regularity conditions, we show that our procedure enjoys model selection consistency. We conduct a simulation study and a real data analysis to evaluate the finite sample performance of the proposed approach.
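
    A minimal sketch of the two-step idea. It assumes the penalized quantile check loss is minimized as a linear program with scipy.optimize.linprog (HiGHS solver). Step 1 fits an l1-penalized (LASSO) quantile regression and keeps the nonzero coefficients. Step 2 refits the kept covariates with adaptive-LASSO weights 1/|β̂_j| taken from step 1. The penalty levels lam1 and lam2, the zero threshold eps and the function names are hypothetical choices, not the tuning used in the paper. The demo also uses n > p for brevity rather than the p_n ≫ n setting studied there.

```python
import numpy as np
from scipy.optimize import linprog


def weighted_l1_quantile_reg(X, y, tau, lam, weights):
    """Minimize sum rho_tau(y - b0 - X b) + lam * sum_j weights_j |b_j| as an LP."""
    n, p = X.shape
    # Variable order: [b0, b_plus (p), b_minus (p), u_plus (n), u_minus (n)].
    c = np.concatenate([[0.0], lam * weights, lam * weights,
                        np.full(n, tau), np.full(n, 1.0 - tau)])
    # b0 + X b_plus - X b_minus + u_plus - u_minus = y (residual split into parts).
    A_eq = np.hstack([np.ones((n, 1)), X, -X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] + [(0, None)] * (2 * p + 2 * n)   # intercept is free
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[0], res.x[1:p + 1] - res.x[p + 1:2 * p + 1]


def two_step_quantile_selection(X, y, tau=0.5, lam1=5.0, lam2=5.0, eps=1e-6):
    p = X.shape[1]
    # Step 1: LASSO-penalized quantile regression screens the covariates.
    _, beta1 = weighted_l1_quantile_reg(X, y, tau, lam1, np.ones(p))
    keep = np.flatnonzero(np.abs(beta1) > eps)
    beta = np.zeros(p)
    if keep.size == 0:
        return beta
    # Step 2: adaptive LASSO on the reduced model, weights 1 / |step-1 estimate|.
    weights = 1.0 / np.abs(beta1[keep])
    _, beta2 = weighted_l1_quantile_reg(X[:, keep], y, tau, lam2, weights)
    beta[keep] = np.where(np.abs(beta2) > eps, beta2, 0.0)
    return beta


# Hypothetical data: 10 covariates, only the first two are active.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.normal(size=100)
beta = two_step_quantile_selection(X, y, tau=0.5)
print(np.flatnonzero(beta), np.round(beta[:2], 2))
```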

  20. Isoenzymatic variability in tropical maize populations under reciprocal recurrent selection

    Directory of Open Access Journals (Sweden)

    Pinto Luciana Rossini

    2003-01-01

    Maize (Zea mays L.) is one of the crops in which genetic variability has been extensively studied at isoenzymatic loci. The genetic variability of the maize populations BR-105 and BR-106, and of the synthetics IG-3 and IG-4, obtained after one cycle of high-intensity reciprocal recurrent selection (RRS), was investigated at seven isoenzymatic loci. A total of twenty alleles were identified, and most of the private alleles were found in the BR-106 population. One cycle of RRS caused a 12% reduction in the number of alleles in both populations. Changes in allele frequencies were also observed between populations and synthetics, mainly for the Est 2 locus. The populations presented similar values for the number of alleles per locus, percentage of polymorphic loci, and observed and expected heterozygosities. A decrease in genetic variation was observed for the synthetics as a consequence of genetic drift and reduced effective population sizes. The distribution of genetic diversity within and between populations revealed that most of the diversity was maintained within them, i.e. BR-105 x BR-106 (G_ST = 3.5%) and IG-3 x IG-4 (G_ST = 4.0%). The genetic distances between populations and synthetics increased by approximately 21%. An increase in the genetic divergence between the populations occurred without limiting new selection procedures.
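
    Nei's G_ST compares total and within-population gene diversity: G_ST = (H_T - H_S)/H_T. Here H_S is the mean expected heterozygosity 1 - Σp_i^2 within populations, and H_T is the same quantity computed from pooled allele frequencies. The minimal sketch below handles one locus and assumes equal population weights and no sample-size correction, because the paper's exact estimator is not stated. For several loci, H_T and H_S are usually averaged across loci before taking the ratio. The function name is hypothetical.

```python
import numpy as np


def nei_gst(freqs):
    """G_ST at one locus; freqs is (n_pops, n_alleles) with rows summing to 1."""
    freqs = np.asarray(freqs, dtype=float)
    h_s = np.mean(1.0 - np.sum(freqs ** 2, axis=1))  # mean within-population diversity
    p_bar = freqs.mean(axis=0)                       # pooled allele frequencies
    h_t = 1.0 - np.sum(p_bar ** 2)                   # total diversity
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies for two populations at a biallelic locus.
print(nei_gst([[0.6, 0.4], [0.4, 0.6]]))
```

    Since H_S/H_T = 1 - G_ST, a G_ST of 3.5% means that about 96.5% of the diversity lies within the populations.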

  1. Bayesian nonparametric centered random effects models with variable selection.

    Science.gov (United States)

    Yang, Mingan

    2013-03-01

    In a linear mixed effects model, it is common practice to assume that the random effects follow a parametric distribution such as a normal distribution with mean zero. However, in the case of variable selection, substantial violation of the normality assumption can potentially impact the subset selection and result in poor interpretation and even incorrect results. In nonparametric random effects models, the random effects generally have a nonzero mean, which causes an identifiability problem for the fixed effects that are paired with the random effects. In this article, we focus on a Bayesian method for variable selection. We characterize the subject-specific random effects nonparametrically with a Dirichlet process and resolve the bias simultaneously. In particular, we propose flexible modeling of the conditional distribution of the random effects with changes across the predictor space. The approach is implemented using a stochastic search Gibbs sampler to identify subsets of fixed effects and random effects to be included in the model. Simulations are provided to evaluate and compare the performance of our approach to the existing ones. We then apply the new approach to a real data example, cross-country and interlaboratory rodent uterotrophic bioassay.

  2. Variable Selection for Road Segmentation in Aerial Images

    Science.gov (United States)

    Warnke, S.; Bulatov, D.

    2017-05-01

    For extraction of road pixels from combined image and elevation data, Wegner et al. (2015) proposed classifying superpixels into road and non-road, followed by a refinement of the classification results using minimum cost paths and non-local optimization methods. We believed that the variable set used for classification was to a certain extent suboptimal, because many variables were redundant while several features known to be useful in Photogrammetry and Remote Sensing were missing. This motivated us to implement a variable selection approach which builds a classification model using portions of the training data and subsets of features, evaluates this model, updates the feature set, and terminates when a stopping criterion is satisfied. The choice of classifier is flexible; we tested the approach with Logistic Regression and Random Forests, and tailored the evaluation module to the chosen classifier. To guarantee a fair comparison, we kept the segment-based approach and most of the variables from the related work, but extended them with additional, mostly higher-level features. Applying these superior features, removing the redundant ones, and using more accurately acquired 3D data made it possible to keep the misclassification error stable or even reduce it on a challenging dataset.

  3. Robust nonlinear variable selective control for networked systems

    Science.gov (United States)

    Rahmani, Behrooz

    2016-10-01

    This paper is concerned with the networked control of a class of uncertain nonlinear systems. To this end, Takagi-Sugeno (T-S) fuzzy modelling is used to extend the previously proposed variable selective control (VSC) methodology to nonlinear systems. This extension is based on decomposing the nonlinear system into a set of fuzzy-blended, locally linearised subsystems and then applying the VSC methodology to each subsystem. To increase the applicability of the T-S approach to uncertain nonlinear networked control systems, this study considers asynchronous premise variables in the plant and the controller, and then introduces a robust stability analysis and control synthesis. The resulting optimal switching-fuzzy controller provides a minimum guaranteed cost on an H2 performance index. Simulation studies on three nonlinear benchmark problems demonstrate the effectiveness of the proposed method.
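    The "fuzzy-blended locally linearised subsystems" of a T-S model can be written as dx/dt = sum_i h_i(z) (A_i x + B_i u), where the h_i are normalized membership grades. The sketch below evaluates that blend for given membership grades. The local matrices and grades in the usage line are hypothetical, and the VSC switching logic and networked (asynchronous) aspects of the paper are not modeled.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def ts_fuzzy_derivative(
    memberships: Sequence[float],
    local_models: Sequence[Tuple[Matrix, Matrix]],
    x: Sequence[float],
    u: Sequence[float],
) -> List[float]:
    """Blend local linear models: dx/dt = sum_i h_i * (A_i x + B_i u)."""
    total = sum(memberships)
    if total <= 0.0:
        raise ValueError("membership grades must have a positive sum")
    h = [m / total for m in memberships]  # normalized firing strengths
    dx = [0.0] * len(x)
    for h_i, (A, B) in zip(h, local_models):
        for r in range(len(x)):
            ax = sum(A[r][c] * x[c] for c in range(len(x)))
            bu = sum(B[r][c] * u[c] for c in range(len(u)))
            dx[r] += h_i * (ax + bu)
    return dx


if __name__ == "__main__":
    # Two scalar rules (hypothetical): dx = -x + u and dx = -3x.
    models = [([[-1.0]], [[1.0]]), ([[-3.0]], [[0.0]])]
    print(ts_fuzzy_derivative([1.0, 3.0], models, [2.0], [4.0]))
```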

  4. Estimation and variable selection for generalized additive partial linear models

    KAUST Repository

    Wang, Li

    2011-08-01

    We study generalized additive partial linear models, proposing the use of polynomial spline smoothing for estimation of nonparametric functions, and deriving quasi-likelihood based estimators for the linear parameters. We establish asymptotic normality for the estimators of the parametric components. The procedure avoids solving large systems of equations as in kernel-based procedures and thus results in gains in computational simplicity. We further develop a class of variable selection procedures for the linear parameters by employing a nonconcave penalized quasi-likelihood, which is shown to have an asymptotic oracle property. Monte Carlo simulations and an empirical example are presented for illustration. © Institute of Mathematical Statistics, 2011.

  5. Characterizing the Optical Variability of Bright Blazars: Variability-Based Selection of Fermi AGN

    CERN Document Server

    Ruan, John J; MacLeod, Chelsea L; Becker, Andrew C; Burnett, T H; Davenport, James R A; Ivezic, Zeljko; Kochanek, Christopher S; Plotkin, Richard M; Sesar, Branimir; Stuart, J Scott

    2012-01-01

    We investigate the use of optical photometric variability to select and identify blazars in large-scale time-domain surveys, in part to aid in the identification of blazar counterparts to the ~30% of gamma-ray sources in the Fermi 2FGL catalog still lacking reliable associations. Using data from the optical LINEAR asteroid survey, we characterize the optical variability of blazars by fitting a damped random walk model to individual light curves with two main model parameters, the characteristic timescales of variability (tau), and driving amplitudes on short timescales (sigma). Imposing cuts on minimum tau and sigma allows for blazar selection with high efficiency E and completeness C. To test the efficacy of this approach, we apply this method to optically variable LINEAR objects that fall within the several-arcminute error ellipses of gamma-ray sources in the Fermi 2FGL catalog. Despite the extreme stellar contamination at the shallow depth of the LINEAR survey, we are able to recover previously-associated ...
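    Selecting blazars by imposing minimum cuts on the fitted damped-random-walk parameters, and scoring the cuts by efficiency and completeness, can be sketched as below. The sketch assumes the usual definitions (efficiency: fraction of selected objects that are blazars; completeness: fraction of known blazars that are selected) and takes already-fitted tau and sigma values as input; the DRW light-curve fitting itself is not shown, and the sample objects and cut values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DRWFit:
    name: str
    tau: float       # fitted characteristic timescale of variability
    sigma: float     # fitted driving amplitude on short timescales
    is_blazar: bool  # known label, used only to score the cuts


def apply_cuts(
    fits: Sequence[DRWFit], tau_min: float, sigma_min: float
) -> Tuple[List[DRWFit], float, float]:
    """Keep objects above both cuts; return (selected, efficiency, completeness)."""
    selected = [f for f in fits if f.tau >= tau_min and f.sigma >= sigma_min]
    n_true_selected = sum(f.is_blazar for f in selected)
    n_true = sum(f.is_blazar for f in fits)
    efficiency = n_true_selected / len(selected) if selected else 0.0
    completeness = n_true_selected / n_true if n_true else 0.0
    return selected, efficiency, completeness


if __name__ == "__main__":
    sample = [
        DRWFit("A", 100.0, 0.10, True),
        DRWFit("B", 200.0, 0.20, True),
        DRWFit("C", 5.0, 0.30, False),
        DRWFit("D", 150.0, 0.05, False),
        DRWFit("E", 20.0, 0.01, True),
    ]
    chosen, eff, comp = apply_cuts(sample, tau_min=50.0, sigma_min=0.04)
    print([f.name for f in chosen], eff, comp)
```

    Raising the cuts typically trades completeness for efficiency, which is why both numbers are reported together.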

  6. Comparison of climate envelope models developed using expert-selected variables versus statistical selection

    Science.gov (United States)

    Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romanach, Stephanie; Watling, James I.; Mazzotti, Frank J.

    2017-01-01

    Climate envelope models are widely used to describe the potential future distribution of species under different climate change scenarios. It is broadly recognized that climate envelope models have both strengths and limitations and that outcomes are sensitive to initial assumptions, inputs, and modeling methods. Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Climate variables to use as predictors are often selected using statistical approaches that develop correlations between occurrences and climate data. These approaches have been criticized because they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, as well as the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method did, and there was low overlap in the variable sets. Models had high performance metrics (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS)). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Differences in spatial overlap were even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using ...
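    The two performance metrics quoted above have standard definitions: TSS = sensitivity + specificity - 1, and AUC equals the probability that a randomly chosen presence receives a higher suitability score than a randomly chosen absence (the Mann-Whitney form, counting ties as one half). The sketch below implements those textbook definitions; the confusion-matrix counts and scores in the usage lines are hypothetical, and the study may have computed AUC from a different software routine.

```python
from typing import Sequence


def true_skill_statistic(tp: int, fn: int, tn: int, fp: int) -> float:
    """TSS = sensitivity + specificity - 1 (ranges from -1 to +1)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def auc(presence_scores: Sequence[float], absence_scores: Sequence[float]) -> float:
    """AUC as the fraction of (presence, absence) pairs ranked correctly."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5  # ties count as half
    return wins / (len(presence_scores) * len(absence_scores))


if __name__ == "__main__":
    print(true_skill_statistic(tp=40, fn=10, tn=45, fp=5))
    print(auc([0.9, 0.8], [0.1, 0.8]))
```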

  7. Penalized maximum likelihood estimation and variable selection in geostatistics

    CERN Document Server

    Chu, Tingjin; Wang, Haonan (DOI: 10.1214/11-AOS919)

    2012-01-01

    We consider the problem of selecting covariates in spatial linear models with Gaussian process errors. Penalized maximum likelihood estimation (PMLE) that enables simultaneous variable selection and parameter estimation is developed and, for ease of computation, PMLE is approximated by one-step sparse estimation (OSE). To further improve computational efficiency, particularly with large sample sizes, we propose penalized maximum covariance-tapered likelihood estimation (PMLE$_{\mathrm{T}}$) and its one-step sparse estimation (OSE$_{\mathrm{T}}$). General forms of penalty functions with an emphasis on smoothly clipped absolute deviation are used for penalized maximum likelihood. Theoretical properties of PMLE and OSE, as well as their approximations PMLE$_{\mathrm{T}}$ and OSE$_{\mathrm{T}}$ using covariance tapering, are derived, including consistency, sparsity, asymptotic normality and the oracle properties. For covariance tapering, a by-product of our theoretical results is consistency and asymptotic normal...
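    The smoothly clipped absolute deviation (SCAD) penalty emphasized above is the penalty of Fan and Li (2001). The sketch below gives the penalty and its closed-form thresholding rule for the orthonormal-design case, with the commonly used a = 3.7. It illustrates only the penalty; the spatial PMLE/OSE estimators and covariance tapering from the paper are not implemented here.

```python
import math


def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lam(|theta|): linear near 0, quadratic taper, then constant."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t ** 2 - lam ** 2) / (2.0 * (a - 1.0))
    return (a + 1.0) * lam ** 2 / 2.0


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """SCAD thresholding of a least-squares estimate z (orthonormal design)."""
    s = math.copysign(1.0, z)
    t = abs(z)
    if t <= 2.0 * lam:
        return s * max(t - lam, 0.0)  # soft thresholding, like the Lasso
    if t <= a * lam:
        return ((a - 1.0) * z - s * a * lam) / (a - 2.0)
    return z  # large coefficients are left unshrunk (unbiasedness)


if __name__ == "__main__":
    for z in (0.5, 1.5, 3.0, 10.0):
        print(z, scad_threshold(z, lam=1.0))
```

    Leaving large coefficients unpenalized is what gives SCAD its oracle-type behavior, in contrast to the constant shrinkage of the Lasso.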

  8. Prediction and variable selection with the adaptive Lasso

    CERN Document Server

    van de Geer, Sara; Zhou, Shuheng

    2010-01-01

    We revisit the adaptive Lasso in a high-dimensional linear model, and provide bounds for its prediction error and for its number of false positive selections. We compare the adaptive Lasso with an "oracle" that trades off approximation error against an l_0-penalty. Considering prediction error and false positives simultaneously is a way to study variable selection performance in settings where non-zero regression coefficients can be smaller than the detection limit. We show that an appropriate choice of the tuning parameter yields a prediction error of the same order as that of the least squares refitted initial Lasso after thresholding, while the number of false positives is small, depending on the size of the trimmed harmonic mean of the oracle coefficients.
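    The adaptive Lasso minimizes (1/2n)||y - X b||^2 + lambda * sum_j w_j |b_j| with weights w_j = 1/|b_init_j|^gamma taken from an initial Lasso fit, so coefficients zeroed by the initial Lasso receive infinite weight and stay at zero. A minimal sketch, assuming plain coordinate descent as the solver (our choice; the paper studies theoretical bounds, not an algorithm) and hypothetical tuning values:

```python
import math
from typing import List, Sequence


def soft_threshold(z: float, t: float) -> float:
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(
    X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
    weights: Sequence[float], n_iter: int = 200,
) -> List[float]:
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j w_j |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X beta for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if math.isinf(weights[j]) or col_sq[j] == 0.0:
                continue  # infinite weight: coefficient stays at zero
            # Correlation of column j with the partial residual (coordinate j added back).
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(z, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def adaptive_lasso(
    X: Sequence[Sequence[float]], y: Sequence[float],
    lam_init: float, lam: float, gamma: float = 1.0,
) -> List[float]:
    """Two stages: initial Lasso, then a Lasso reweighted by 1/|b_init|^gamma."""
    init = weighted_lasso(X, y, lam_init, [1.0] * len(X[0]))
    weights = [math.inf if b == 0.0 else 1.0 / abs(b) ** gamma for b in init]
    return weighted_lasso(X, y, lam, weights)


if __name__ == "__main__":
    X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
    y = [3.05, -3.05, 2.95, -2.95]  # 3 * column 0 + 0.05 * column 2
    print(adaptive_lasso(X, y, lam_init=0.1, lam=0.1))
```

    In this toy design the small coefficient on column 2 is removed by the initial Lasso and therefore excluded in the second stage, while the large coefficient is shrunk only slightly, illustrating the reduced bias and few false positives the abstract discusses.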

  9. Fluorescence Spectroscopy and Chemometric Modeling for Bioprocess Monitoring

    Directory of Open Access Journals (Sweden)

    Saskia M. Faassen

    2015-04-01

    On-line sensors for the detection of crucial process parameters are desirable for the monitoring, control and automation of processes in the biotechnology, food and pharma industries. Fluorescence spectroscopy is a highly developed, non-invasive technique that enables on-line measurement of substrate and product concentrations or the identification of characteristic process states. During a cultivation process, significant changes occur in the fluorescence spectra. By means of chemometric modeling, prediction models can be calculated and applied for process supervision and control, increasing the quality and productivity of bioprocesses. A range of applications for different microorganisms and analytes has been proposed in recent years. This contribution provides an overview of different analysis methods for the measured fluorescence spectra and of the model-building chemometric methods used for various microbial cultivations. Most of these processes are observed using the BioView® Sensor, owing to its robustness and insensitivity to adverse process conditions. Beyond that, PLS is the most frequently used chemometric method for calculating process models and predicting process variables.

  10. UPS Delivers Optimal Phase Diagram in High Dimensional Variable Selection

    CERN Document Server

    Ji, Pengsheng

    2010-01-01

    Consider linear regression in the so-called regime of p much larger than n. We propose the UPS as a new variable selection method. This is a Screen and Clean procedure [Wasserman and Roeder (2009)], in which we screen with the Univariate thresholding, and clean with the Penalized MLE. In many situations, the UPS possesses two important properties: Sure Screening and Separable After Screening (SAS). These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. We measure the performance of variable selection procedure by the Hamming distance. In many situations, we find that the UPS achieves the optimal rate of convergence, and also yields an optimal partition of the so-called phase diagram. In the two-dimensional phase space calibrated by the signal sparsity and signal strength, there is a three-phase diagram shared by many choices of design matrices. In the first phase, it is possible to recover all signals. In the second phase, exa...
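    The screening half of a Screen and Clean procedure, and the Hamming-distance error measure used to score it, can be sketched as follows. The sketch assumes columns of X normalized to unit length and keeps variable j when |X_j' y| reaches a threshold t; the threshold value and the toy data are hypothetical, and the cleaning step (penalized MLE on the survivors) is omitted.

```python
from typing import Iterable, Sequence, Set


def univariate_screen(X: Sequence[Sequence[float]], y: Sequence[float], t: float) -> Set[int]:
    """Keep variable j if the univariate score |X_j' y| is at least t."""
    n, p = len(X), len(X[0])
    kept: Set[int] = set()
    for j in range(p):
        score = sum(X[i][j] * y[i] for i in range(n))
        if abs(score) >= t:
            kept.add(j)
    return kept


def hamming_distance(selected: Iterable[int], true_support: Iterable[int], p: int) -> int:
    """Number of false positives plus false negatives over p variables."""
    s, t = set(selected), set(true_support)
    return sum((j in s) != (j in t) for j in range(p))


if __name__ == "__main__":
    # Orthonormal columns; y = 2 * column 0 + 0.1 * column 2 (weak signal).
    X = [[0.5, 0.5, 0.5], [-0.5, 0.5, -0.5], [0.5, -0.5, -0.5], [-0.5, -0.5, 0.5]]
    y = [1.05, -1.05, 0.95, -0.95]
    kept = univariate_screen(X, y, t=0.5)
    print(kept, hamming_distance(kept, {0, 2}, p=3))
```

    Here the weak signal on column 2 falls below the threshold, so the Hamming distance is 1; this is the kind of signal-strength effect the phase diagram describes.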

  11. The Tabu Search Procedure: An Alternative to the Variable Selection Methods

    Science.gov (United States)

    Mills, Jamie, D.; Olejnik, Stephen, F.; Marcoulides, George, A.

    2005-01-01

    The effectiveness of the Tabu variable selection algorithm in identifying predictor variables related to a criterion variable is compared with that of the stepwise variable selection method and the all-possible-regressions approach. Considering results obtained from previous research, Tabu is more successful in identifying relevant variables than the…
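    A generic tabu search over predictor subsets can be sketched as follows: each move adds or removes one variable, recently flipped variables are tabu for a few iterations, and a tabu move is allowed only if it beats the best score found so far (an aspiration rule). The `score` function (lower is better, e.g. a penalized fit criterion), the tenure, and the iteration count are illustrative assumptions, not the settings of Mills et al.

```python
from collections import deque
from typing import Callable, Deque, FrozenSet, List, Tuple


def tabu_search(
    p: int,
    score: Callable[[FrozenSet[int]], float],
    n_iter: int = 50,
    tenure: int = 3,
) -> Tuple[FrozenSet[int], float]:
    """Tabu search over subsets of {0, ..., p-1} using single add/remove moves."""
    current: FrozenSet[int] = frozenset()
    best, best_score = current, score(current)
    tabu: Deque[int] = deque(maxlen=tenure)  # recently flipped variables
    for _ in range(n_iter):
        moves: List[Tuple[float, int, FrozenSet[int]]] = []
        for j in range(p):
            cand = current ^ {j}  # add j if absent, remove it if present
            s = score(cand)
            # Tabu moves are skipped unless they improve on the best (aspiration).
            if j in tabu and s >= best_score:
                continue
            moves.append((s, j, cand))
        if not moves:
            break
        # Take the best admissible move even if it is worse than the current one.
        s, j, cand = min(moves, key=lambda m: (m[0], m[1]))
        current = cand
        tabu.append(j)
        if s < best_score:
            best, best_score = cand, s
    return best, best_score


def toy_score(subset: FrozenSet[int]) -> float:
    """Hypothetical criterion: distance to the relevant set {0, 2} plus a size cost."""
    return len(subset ^ {0, 2}) + 0.01 * len(subset)


if __name__ == "__main__":
    print(tabu_search(4, toy_score))
```

    Accepting non-improving moves while forbidding immediate reversals is what lets tabu search escape the local optima that trap stepwise selection.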

  12. Chemometric analysis of metal contents in different types of chocolates

    Directory of Open Access Journals (Sweden)

    Jevrić Lidija R.

    2014-01-01

    The relationships between the contents of various metals (Cu, Ni, Pb and Al) in 38 different milk chocolate samples were studied using a chemometric approach. The chemometric expressions were generated using a training set of 25 chocolate samples, and the predictive ability of the resulting models was evaluated against a test set of 13 chocolate samples. The chemometric analysis was based on multiple linear regression (MLR), which was performed in order to select the significant models for predicting the metal contents. MLR equations expressing the content of one metal as a function of the contents of the other metals were established. High agreement between experimental and predicted values in the validation procedure indicated the good quality of the models. This enables researchers to establish reliable relationships between the contents of various metals, which can be used to predict them in different types of chocolate prior to analysis. This can reduce the trial-and-error element and experimental costs in production. [Project of the Ministry of Science of the Republic of Serbia, nos. 31055, 172012 and 172014]

  13. Variable selection in near-infrared spectroscopy: Benchmarking of feature selection methods on biodiesel data

    Energy Technology Data Exchange (ETDEWEB)

    Balabin, Roman M., E-mail: balabin@org.chem.ethz.ch [Department of Chemistry and Applied Biosciences, ETH Zurich, 8093 Zurich (Switzerland)]; Smirnov, Sergey V. [Unimilk Joint Stock Co., 143421 Moscow Region (Russian Federation)]

    2011-04-29

    During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm⁻¹) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic

  14. Chemometrics-assisted kinetic-potentiometric methods for simultaneous determination of Fe(II), Al(III), and Zr(IV) using a fluoride ion-selective electrode.

    Science.gov (United States)

    Karimi, Mohammad Ali; Ardakani, Mohammad Mazloum; Ardakani, Reza Behjatmanesh; Mashhadizadeh, Mohammad Hossein; Monfared, Mohammad Reza Zand; Tadayon, Maryam

    2010-01-01

    Partial least-squares (PLS) and principal component regression (PCR) were used for the simple, accurate, and simultaneous determination of Fe(III), Al(III), and Zr(IV) using the kinetic data from a novel potentiometric method. The rate of the complex-forming reaction of Fe(III), Al(III), and Zr(IV) with fluoride ions was monitored by a fluoride ion-selective electrode. The experimental data demonstrated that ion-selective electrodes perform well as detectors, not only for the direct determination of fluoride ion but also for simultaneous kinetic-potentiometric analysis using the PLS and PCR methods. The methods are based on the differences observed in the rates of complexation with fluoride ions. Results have demonstrated that the simultaneous determination of Fe(III), Al(III), and Zr(IV) can be performed in concentration ranges of 0.5-18.5, 0.2-14.0, and 0.4-21.0 microg/mL, respectively. After the application of PLS, the total root mean square error of prediction (RMSEP) was found to be 0.121, 0.122, and 0.129 for the 10-sample experiment of Fe(III), Al(III), and Zr(IV), respectively. For PCR, the RMSEP was found to be 0.156, 0.162, and 0.178 for the 10-sample experiment of Fe(III), Al(III), and Zr(IV), respectively. The effects of certain foreign ions on the reaction rate were determined to assess the selectivity of the method. The proposed methods (H-point standard addition, PLS, and PCR) were evaluated using a set of synthetic sample mixtures and applied to the simultaneous determination of Fe(III), Al(III), and Zr(IV) in water samples.
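    The RMSEP values reported above follow the usual definition, RMSEP = sqrt(sum_i (c_pred,i - c_actual,i)^2 / n), computed over the n validation samples for each analyte. The sketch below implements that definition; the concentrations in the usage line are hypothetical, and how the paper aggregates a "total" RMSEP is not specified in the abstract.

```python
import math
from typing import Sequence


def rmsep(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error of prediction over a validation set."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sq_err / len(actual))


if __name__ == "__main__":
    # Hypothetical reference vs. predicted concentrations (microg/mL).
    print(rmsep([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```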

  15. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

    Directory of Open Access Journals (Sweden)

    Himmelreich Uwe

    2009-07-01

    Background: Regularized regression methods such as principal component or partial least squares regression perform well in learning tasks on high-dimensional spectral data, but cannot explicitly eliminate irrelevant features. The random forest classifier with its associated Gini feature importance, on the other hand, allows explicit feature elimination, but may not be optimally adapted to spectral data because its constituent classification trees are based on orthogonal splits in feature space. Results: We propose to combine the best of both approaches, and evaluated the joint use of a feature selection based on recursive feature elimination using the Gini importance of random forests together with regularized classification methods on spectral data sets from medical diagnostics, chemotaxonomy, biomedical analytics, food science, and synthetically modified spectral data. Here, feature selection using the Gini feature importance combined with regularized classification by discriminant partial least squares regression performed as well as or better than filtering according to different univariate statistical tests, or using regression coefficients in a backward feature elimination. It outperformed both the direct application of the random forest classifier and the direct application of the regularized classifiers on the full set of features. Conclusion: The Gini importance of the random forest provided a superior means of measuring feature relevance on spectral data, but, on an optimal subset of features, the regularized classifiers might be preferable to the random forest classifier, despite their limitation to modeling linear dependencies only. A feature selection based on Gini importance, however, may precede a regularized linear classification to identify this optimal subset of features, and to earn a double benefit of both dimensionality reduction and the elimination of noise from the classification task.
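    Gini importance accumulates, over all tree nodes that split on a feature, the weighted decrease in Gini impurity 1 - sum_k p_k^2. The sketch below shows that impurity decrease for a single split, plus a recursive feature elimination loop that repeatedly drops the least important features. The `importance` callback is a hypothetical stand-in for the Gini importances of a random forest retrained on the remaining features, and `drop_fraction` is an illustrative setting.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(x: Sequence[float], labels: Sequence[Hashable], threshold: float) -> float:
    """Impurity decrease for the split x <= threshold vs. x > threshold."""
    left = [lab for v, lab in zip(x, labels) if v <= threshold]
    right = [lab for v, lab in zip(x, labels) if v > threshold]
    n = len(labels)
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)


def recursive_feature_elimination(
    features: Sequence[str],
    importance: Callable[[List[str]], Dict[str, float]],
    n_keep: int,
    drop_fraction: float = 0.2,
) -> List[str]:
    """Repeatedly drop the least important features until n_keep remain."""
    kept = list(features)
    while len(kept) > n_keep:
        scores = importance(kept)  # e.g. Gini importances from a retrained forest
        n_drop = max(1, min(len(kept) - n_keep, int(len(kept) * drop_fraction)))
        worst = set(sorted(kept, key=lambda f: scores[f])[:n_drop])
        kept = [f for f in kept if f not in worst]  # preserve original order
    return kept


if __name__ == "__main__":
    print(gini_decrease([1, 2, 3, 4], ["a", "a", "b", "b"], threshold=2.5))
    fixed = {"a": 0.5, "b": 0.1, "c": 0.3, "d": 0.05, "e": 0.2}
    print(recursive_feature_elimination(list(fixed), lambda ks: {k: fixed[k] for k in ks}, 2))
```

    The selected subset would then be passed to a regularized classifier such as discriminant PLS, which is the combination the study found competitive.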

  16. Relationship of adolescent self-esteem to selected academic variables.

    Science.gov (United States)

    Filozof, E M; Albertin, H K; Jones, C R; Steme, S S; Myers, L; McDermott, R J

    1998-02-01

    This study investigated whether self-esteem precedes various academic behaviors and beliefs among 593 high school students (63.7% female, 60.9% African American). Measures of home and school self-esteem, grade point average, perceived academic standing and progress, and educational plans were collected by survey and archival review of grade and attendance records at the beginning (pre-test) and end of the school year (post-test). Self-esteem and academic variables differed by gender, race, and guardianship. Self-esteem related significantly to academics and absenteeism. Results suggest selected academic variables predict self-esteem even when the effects of gender, race, and guardianship are removed and pretest self-esteem scores are controlled. In conclusion, student academic performance influences subsequent academic and home self-esteem. Creation of positive academic experiences for youth may be a critical activity, since experts contend that low self-esteem is associated with subsequent behavioral problems. The markedly lower self-esteem of Native American and Hispanic youth warrants further investigation.

  17. Empirical Likelihood Based Variable Selection for Varying Coefficient Partially Linear Models with Censored Data

    Institute of Scientific and Technical Information of China (English)

    Peixin ZHAO

    2013-01-01

    In this paper, we consider variable selection for the parametric components of varying coefficient partially linear models with censored data. By ingeniously constructing a penalized auxiliary vector, we propose an empirical likelihood based variable selection procedure and show that it is consistent and satisfies the sparsity property. Simulation studies show that the proposed variable selection method is workable.

  18. ChemFlow, chemometrics using Galaxy

    OpenAIRE

    Rossard, Virginie; Boulet, Jean Claude; Gogé, Fabien; Latrille, Eric; Roger, Jean-Michel

    2016-01-01

    Infrared spectroscopy is widely used in academic research and industry as a simple, fast, cheap and safe measurement tool. Infrared data are displayed as spectra, and chemometrics is a science that aims to extract information from spectra. We are developing a comprehensive package that contains (1) a MOOC broadcast in September 2016; (2) a chemometric tool named ChemFlow, which is an application of Galaxy; and (3) a spectral database. We will focus on ChemFlow. The required specifi...

  19. Birth order and selected work-related personality variables.

    Science.gov (United States)

    Phillips, A S; Bedeian, A G; Mossholder, K W; Touliatos, J

    1988-12-01

    A possible link between birth order and various individual characteristics (e.g., intelligence, potential eminence, need for achievement, sociability) has been suggested by personality theorists such as Adler for over a century. The present study examines whether birth order is associated with selected personality variables that may be related to various work outcomes. Three of seven hypotheses were supported, and the effect sizes for these were small. Firstborns scored significantly higher than later-borns on measures of dominance, good impression, and achievement via conformity. No differences between firstborns and later-borns were found in managerial potential, work orientation, achievement via independence, and sociability. The study's sample consisted of 835 public, government, and industrial accountants responding to a national US survey of accounting professionals. The nature of the sample may have been partially responsible for the results obtained: its homogeneity may have caused any birth order effects to wash out. It can be argued that successful membership in the accountancy profession requires internalization of a set of prescribed rules and standards, so accountants as a group may be locked into a behavioral framework. Any differentiation would then result from spurious interpersonal differences, not from predictable birth-order-related characteristics. A final interpretation is that birth order effects are nonexistent or statistical artifacts. Given the present data and particularistic sample, however, the authors have insufficient information from which to draw such a conclusion.

  20. Selective IgA Deficiency and Common Variable Immunodeficiency

    Directory of Open Access Journals (Sweden)

    Kadri Kamber

    2009-09-01

    Selective IgA deficiency (sIgAD), diagnosed using 5 mg/dl of serum IgA as the upper limit together with a concomitant lack of secretory IgA, is the most common form of primary immunodeficiency. The pathogenesis of IgA deficiency is not known, although abnormalities in Ig class switching and in the cytokines involved in isotype switching have been implicated. Common Variable Immunodeficiency (CVID) is a heterogeneous group of B-cell deficiency syndromes characterized by hypogammaglobulinemia, impaired antibody production and recurrent bacterial infections. Defective T-cell activation may impair cognate T-B-cell interaction, owing to impaired expression of CD40 ligand and/or abnormalities in the production of the T-cell-derived cytokines required for fully functional B-cell activation, proliferation and/or differentiation; this could explain the impaired antibody production seen in CVID patients. Cytokines have been found to be produced at low levels because of the decreased T-cell function that results from the defect in CD40L expression in CVID patients. (Journal of Current Pediatrics 2009; 7: 90-5)

  1. Optimality of Graphlet Screening in High Dimensional Variable Selection

    CERN Document Server

    Jin, Jiashun; Zhang, Qi

    2012-01-01

    Consider a linear regression model where the design matrix X has n rows and p columns. We assume (a) p is much larger than n, (b) the coefficient vector beta is sparse in the sense that only a small fraction of its coordinates is nonzero, and (c) the Gram matrix G = X'X is sparse in the sense that each row has relatively few large coordinates (diagonals of G are normalized to 1). The sparsity in G naturally induces the sparsity of the so-called graph of strong dependence (GOSD). We find an interesting interplay between the signal sparsity and the graph sparsity, which ensures that in a broad context, the set of true signals decomposes into many different small-size components of GOSD, where different components are disconnected. We propose Graphlet Screening (GS) as a new approach to variable selection, which is a two-stage Screen and Clean method. The key methodological innovation of GS is to use GOSD to guide both the screening and cleaning. Compared to m-variate brute-force screening that has a computational...
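    The graph of strong dependence can be sketched as follows: normalize the Gram matrix to unit diagonal, connect variables i and j when |G_ij| is at least a threshold delta, and split any set of variables (for example the true signals) into connected components of the induced subgraph. The threshold value and the toy Gram matrix are hypothetical; the screening and cleaning stages of Graphlet Screening themselves are not implemented.

```python
import math
from collections import deque
from typing import Dict, Iterable, List, Sequence, Set


def normalized_gram(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """G = X'X rescaled so that every diagonal entry equals 1."""
    n, p = len(X), len(X[0])
    G = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    d = [math.sqrt(G[i][i]) for i in range(p)]
    return [[G[i][j] / (d[i] * d[j]) for j in range(p)] for i in range(p)]


def gosd(G: Sequence[Sequence[float]], delta: float) -> Dict[int, Set[int]]:
    """Adjacency of the graph of strong dependence: edge iff |G_ij| >= delta."""
    p = len(G)
    adj: Dict[int, Set[int]] = {i: set() for i in range(p)}
    for i in range(p):
        for j in range(i + 1, p):
            if abs(G[i][j]) >= delta:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def components(nodes: Iterable[int], adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Connected components of the subgraph induced by `nodes` (breadth-first)."""
    node_set = set(nodes)
    seen: Set[int] = set()
    comps: List[Set[int]] = []
    for start in sorted(node_set):
        if start in seen:
            continue
        comp: Set[int] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.add(u)
            for v in adj[u]:
                if v in node_set and v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


if __name__ == "__main__":
    G = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
    G[0][1] = G[1][0] = 0.6
    G[3][4] = G[4][3] = 0.5
    G[1][2] = G[2][1] = 0.1
    adj = gosd(G, delta=0.3)
    print(components(range(5), adj))      # whole graph
    print(components([0, 1, 3], adj))     # signals split into small graphlets
```

    Because each component is small, the Screen and Clean steps can work on many low-dimensional subproblems instead of one large one.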

  2. Variable Selection for Generalized Varying Coefficient Partially Linear Models with Diverging Number of Parameters

    Institute of Scientific and Technical Information of China (English)

    Zheng-yan Lin; Yu-ze Yuan

    2012-01-01

    Semiparametric models with a diverging number of predictors arise in many contemporary scientific areas. Variable selection for these models consists of two components: model selection for the nonparametric components and selection of significant variables for the parametric portion. In this paper, we consider a variable selection procedure that combines basis function approximation with the SCAD penalty. The proposed procedure simultaneously selects significant variables in the parametric and nonparametric components. With appropriate selection of tuning parameters, we establish the consistency and sparseness of this procedure.

  3. Chemometrics in Fingerprinting by Means of Thin Layer Chromatography

    Directory of Open Access Journals (Sweden)

    Łukasz Komsta

    2012-01-01

    The paper is written as an introductory review, summarizing current knowledge about chemometric fingerprinting in the context of TLC, since the literature has shown relatively little interest in combining TLC and chemometrics. The paper briefly covers the most important aspects of chemometric fingerprinting in general: creating the TLC fingerprints, denoising, baseline removal, warping/registering, and the chemometric processing itself. References that are good starting points are given for each topic and processing step.

  4. Variable Selection for Varying-Coefficient Models with Missing Response at Random

    Institute of Scientific and Technical Information of China (English)

    Pei Xin ZHAO; Liu Gen XUE

    2011-01-01

    In this paper, we present a variable selection procedure by combining basis function approximations with penalized estimating equations for varying-coefficient models with missing response at random. With appropriate selection of the tuning parameters, we establish the consistency of the variable selection procedure and the optimal convergence rate of the regularized estimators. A simulation study is undertaken to assess the finite sample performance of the proposed variable selection procedure.

  5. Bhargava and Ishizuka's BI-Method: A Neglected Method for Variable Selection

    Science.gov (United States)

    Leung, Shing On; Sachs, John

    2005-01-01

    Quite often in data reduction, it is more meaningful and economical to select a subset of variables instead of reducing the dimensionality of the variable space with principal components analysis. The authors present a neglected method for variable selection called the BI-method (R. P. Bhargava & T. Ishizuka, 1981). It is a direct, simple method…

  6. Variable Selection for Semiparametric Varying-Coefficient Partially Linear Models with Missing Response at Random

    Institute of Scientific and Technical Information of China (English)

    Pei Xin ZHAO; Liu Gen XUE

    2011-01-01

    In this paper, we present a variable selection procedure by combining basis function approximations with penalized estimating equations for semiparametric varying-coefficient partially linear models with missing response at random. The proposed procedure simultaneously selects significant variables in parametric components and nonparametric components. With appropriate selection of the tuning parameters, we establish the consistency of the variable selection procedure and the convergence rate of the regularized estimators. A simulation study is undertaken to assess the finite sample performance of the proposed variable selection procedure.

  7. Advanced stability indicating chemometric methods for quantitation of amlodipine and atorvastatin in their quinary mixture with acidic degradation products

    Science.gov (United States)

    Darwish, Hany W.; Hassan, Said A.; Salem, Maissa Y.; El-Zeany, Badr A.

    2016-02-01

    Two advanced, accurate and precise chemometric methods were developed for the simultaneous determination of amlodipine besylate (AML) and atorvastatin calcium (ATV) in the presence of their acidic degradation products in tablet dosage forms. The first method was Partial Least Squares (PLS-1) and the second was Artificial Neural Networks (ANN). PLS was compared to ANN models with and without a variable selection procedure (genetic algorithm (GA)). For proper analysis, a 5-factor, 5-level experimental design was established, resulting in 25 mixtures containing different ratios of the interfering species. Fifteen mixtures were used as the calibration set and the other ten as the validation set to assess the prediction ability of the suggested models. The proposed methods were successfully applied to the analysis of pharmaceutical tablets containing AML and ATV. The results demonstrate the ability of these models to resolve the highly overlapped spectra of the quinary mixture using inexpensive, easy-to-handle instruments such as the UV-VIS spectrophotometer.

  8. A chemometric approach to characterization of ionic liquids for gas chromatography.

    Science.gov (United States)

    González-Álvarez, Jaime; Mangas-Alonso, Juan José; Arias-Abrodo, Pilar; Gutiérrez-Álvarez, María Dolores

    2014-05-01

    A chemometric study was carried out to characterize three types of ionic liquids (ILs) with hexacationic imidazolium, polymeric imidazolium, and phosphonium cationic cores, using a range of counter-anions such as halogens, thiocyanate, boron anions, triflate, and bistriflimide. The solvation parameter model developed by Abraham et al., unsupervised techniques such as cluster analysis (CA), supervised techniques such as linear discriminant analysis (LDA), step-LDA and quadratic discriminant analysis (QDA), and multivariate regression techniques such as discriminant partial least squares (D-PLS) and multiple linear regression (MLR) were used to characterize the functionalized ILs. CA established two main groups of phases: those with an acidic H-bond and those with basic ones. Once the two natural groups were detected, linear and quadratic delimiters with good classification (>96 %) and prediction (>92 %) capacities were computed. The step-LDA technique allowed us to establish that the a, b, and s solvation parameters were the most discriminant variables. These variables were used for modeling purposes, and D-PLS and MLR models were constructed using a binary response. The explained variance of the categorical variable by the model validated by cross-validation was 65 %, and 94.5 % of ILs were correctly predicted. The IL characterization carried out should allow the appropriate selection of phases for gas chromatography (GC).

  9. Variability survey of brightest stars in selected OB associations

    CERN Document Server

    Laur, Jaan; Eenmäe, Tõnis; Tuvikene, Taavi; Leedjärv, Laurits

    2016-01-01

    The stellar evolution theory of massive stars remains uncalibrated with high-precision photometric observational data mainly due to a small number of luminous stars that are monitored from space. Automated all-sky surveys have revealed numerous variable stars but most of the luminous stars are often overexposed. Targeted campaigns can improve the time base of photometric data for those objects. The aim of this investigation is to study the variability of luminous stars at different timescales in young open clusters and OB associations. We monitored 22 open clusters and associations from 2011 to 2013 using a 0.25-m telescope. Variable stars were detected by comparing the overall light-curve scatter with measurement uncertainties. Variability was analysed by the light curve feature extraction tool FATS. Periods of pulsating stars were determined using the discrete Fourier transform code SigSpec. We then classified the variable stars based on their pulsation periods and available spectral information. We obtaine...

  10. Variability survey of brightest stars in selected OB associations

    Science.gov (United States)

    Laur, Jaan; Kolka, Indrek; Eenmäe, Tõnis; Tuvikene, Taavi; Leedjärv, Laurits

    2017-02-01

    Context. The stellar evolution theory of massive stars remains uncalibrated with high-precision photometric observational data mainly due to a small number of luminous stars that are monitored from space. Automated all-sky surveys have revealed numerous variable stars but most of the luminous stars are often overexposed. Targeted campaigns can improve the time base of photometric data for those objects. Aims: The aim of this investigation is to study the variability of luminous stars at different timescales in young open clusters and OB associations. Methods: We monitored 22 open clusters and associations from 2011 to 2013 using a 0.25-m telescope. Variable stars were detected by comparing the overall light-curve scatter with measurement uncertainties. Variability was analysed by the light curve feature extraction tool FATS. Periods of pulsating stars were determined using the discrete Fourier transform code SigSpec. We then classified the variable stars based on their pulsation periods and available spectral information. Results: We obtained light curves for more than 20 000 sources of which 354 were found to be variable. Amongst them we find 80 eclipsing binaries, 31 α Cyg, 13 β Cep, 62 Be, 16 slowly pulsating B, 7 Cepheid, 1 γ Doradus, 3 Wolf-Rayet and 63 late-type variable stars. Up to 55% of these stars are potential new discoveries as they are not present in the Variable Star Index (VSX) database. We find the cluster membership fraction for variable stars to be 13% with an upper limit of 35%. Variable star catalogue (Tables A.1-A.10) and light curves are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/598/A108
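    The detection and period-finding steps described in this abstract can be sketched as follows: a reduced chi-square statistic compares light-curve scatter with the photometric errors, and a classical discrete Fourier transform amplitude spectrum locates a pulsation period. This is a simplified stand-in, not FATS or SigSpec; the synthetic sampling, 0.01 mag errors, 3.7-day period and the helper names are assumptions.

```python
import numpy as np

def reduced_chi2(mag, err):
    """Scatter of a light curve relative to its photometric uncertainties."""
    w = 1.0 / err**2
    mean = np.sum(w * mag) / np.sum(w)                    # error-weighted mean magnitude
    return np.sum(((mag - mean) / err) ** 2) / (len(mag) - 1)

def dft_amplitude(t, mag, freqs):
    """Amplitude spectrum of an unevenly sampled light curve (classical DFT)."""
    y = mag - mag.mean()
    phase = np.exp(-2j * np.pi * np.outer(freqs, t))      # shape (n_freq, n_obs)
    return 2.0 / len(t) * np.abs(phase @ y)

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0, 60, 300))                      # 300 epochs over 60 days (synthetic)
err = np.full_like(t, 0.01)                               # 0.01 mag uncertainties
constant = 8.0 + rng.normal(0, 0.01, t.size)
pulsator = 8.0 + 0.05 * np.sin(2 * np.pi * t / 3.7) + rng.normal(0, 0.01, t.size)

freqs = np.linspace(0.01, 2.0, 5000)                      # trial frequencies, cycles per day
best_f = freqs[np.argmax(dft_amplitude(t, pulsator, freqs))]
print(reduced_chi2(constant, err), reduced_chi2(pulsator, err), 1 / best_f)
```

    A star would be flagged when its reduced chi-square clearly exceeds 1; the exact cut-off used in the survey is not stated here.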

  11. A New Variable Weighting and Selection Procedure for K-Means Cluster Analysis

    Science.gov (United States)

    Steinley, Douglas; Brusco, Michael J.

    2008-01-01

    A variance-to-range ratio variable weighting procedure is proposed. We show how this weighting method is theoretically grounded in the inherent variability found in data exhibiting cluster structure. In addition, a variable selection procedure is proposed to operate in conjunction with the variable weighting technique. The performances of these…

  12. Natural selection. I. Variable environments and uncertain returns on investment.

    Science.gov (United States)

    Frank, S A

    2011-11-01

    Many studies have analysed how variability in reproductive success affects fitness. However, each study tends to focus on a particular problem, leaving unclear the overall structure of variability in populations. This fractured conceptual framework often causes particular applications to be incomplete or improperly analysed. In this article, I present a concise introduction to the two key aspects of the theory. First, all measures of fitness ultimately arise from the relative comparison of the reproductive success of individuals or genotypes with the average reproductive success in the population. That relative measure creates a diminishing relation between reproductive success and fitness. Diminishing returns reduce fitness in proportion to variability in reproductive success. The relative measurement of success also induces a frequency dependence that favours rare types. Second, variability in populations has a hierarchical structure. Variable success in different traits of an individual affects that individual's variation in reproduction. Correlation between different individuals' reproduction affects variation in the aggregate success of particular alleles across the population. One must consider the hierarchical structure of variability in relation to different consequences of temporal, spatial and developmental variability. Although a complete analysis of variability has many separate parts, this simple framework allows one to see the structure of the whole and to place particular problems in their proper relation to the general theory. The biological understanding of relative success and the hierarchical structure of variability in populations may also contribute to a deeper economic theory of returns under uncertainty. © 2011 The Author. Journal of Evolutionary Biology © 2011 European Society For Evolutionary Biology.
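    A standard second-order expansion (a textbook approximation, not a formula quoted from the article) makes the diminishing-returns argument concrete. Assuming fitness is measured on a logarithmic scale, let $W$ be reproductive success with mean $\bar{W}$ and variance $\sigma^2$:

$$
\log W \approx \log \bar{W} + \frac{W - \bar{W}}{\bar{W}} - \frac{(W - \bar{W})^2}{2\bar{W}^2}
\quad\Longrightarrow\quad
\mathbb{E}[\log W] \approx \log \bar{W} - \frac{\sigma^2}{2\bar{W}^2}.
$$

    The linear term vanishes in expectation, so to this order the fitness penalty grows in proportion to the variance of reproductive success relative to the squared mean.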

  13. In-Depth Two-Year Study of Phenolic Profile Variability among Olive Oils from Autochthonous and Mediterranean Varieties in Morocco, as Revealed by a LC-MS Chemometric Profiling Approach

    Science.gov (United States)

    Bajoub, Aadil; Medina-Rodríguez, Santiago; Olmo-García, Lucía; Ajal, El Amine; Monasterio, Romina P.; Hanine, Hafida; Fernández-Gutiérrez, Alberto; Carrasco-Pancorbo, Alegría

    2016-01-01

    The phenolic fraction of olive oil contributes considerably to the sensory quality and nutritional value of this foodstuff. Herein, the phenolic fraction of 203 olive oil samples extracted from fruits of four autochthonous Moroccan cultivars (“Picholine Marocaine”, “Dahbia”, “Haouzia” and “Menara”) and nine Mediterranean varieties recently introduced in Morocco (“Arbequina”, “Arbosana”, “Cornicabra”, “Frantoio”, “Hojiblanca”, “Koroneiki”, “Manzanilla”, “Picholine de Languedoc” and “Picual”) was explored over two consecutive crop seasons (2012/2013 and 2013/2014) by liquid chromatography-mass spectrometry. A total of 32 phenolic compounds (and quinic acid), belonging to five chemical classes (secoiridoids, simple phenols, flavonoids, lignans and phenolic acids), were identified and quantified. Phenolic profiling revealed that the levels of the determined phenolic compounds were variety-dependent and, at the same time, significantly affected by the crop season. Moreover, based on the phenolic composition obtained and chemometric linear discriminant analysis, statistical models were built that allowed a very satisfactory classification and prediction of the varietal origin of the studied oils. PMID:28036024

  14. In-Depth Two-Year Study of Phenolic Profile Variability among Olive Oils from Autochthonous and Mediterranean Varieties in Morocco, as Revealed by a LC-MS Chemometric Profiling Approach

    Directory of Open Access Journals (Sweden)

    Aadil Bajoub

    2016-12-01

    The phenolic fraction of olive oil contributes considerably to the sensory quality and nutritional value of this foodstuff. Herein, the phenolic fraction of 203 olive oil samples extracted from fruits of four autochthonous Moroccan cultivars (“Picholine Marocaine”, “Dahbia”, “Haouzia” and “Menara”) and nine Mediterranean varieties recently introduced in Morocco (“Arbequina”, “Arbosana”, “Cornicabra”, “Frantoio”, “Hojiblanca”, “Koroneiki”, “Manzanilla”, “Picholine de Languedoc” and “Picual”) was explored over two consecutive crop seasons (2012/2013 and 2013/2014) by liquid chromatography-mass spectrometry. A total of 32 phenolic compounds (and quinic acid), belonging to five chemical classes (secoiridoids, simple phenols, flavonoids, lignans and phenolic acids), were identified and quantified. Phenolic profiling revealed that the levels of the determined phenolic compounds were variety-dependent and, at the same time, significantly affected by the crop season. Moreover, based on the phenolic composition obtained and chemometric linear discriminant analysis, statistical models were built that allowed a very satisfactory classification and prediction of the varietal origin of the studied oils.

  15. Selecting candidate predictor variables for the modelling of post ...

    African Journals Online (AJOL)

    more formal methods such as focus group discussions, questionnaires and ... statistics, methodology, epidemiology, computer engineering and infectious diseases. ... ed on their lack of knowledge of wealth scoring tools. Variables exhibiting ...

  16. Chemometrics applications in biotech processes: a review.

    Science.gov (United States)

    Rathore, Anurag S; Bhushan, Nitish; Hadpe, Sandip

    2011-01-01

    Biotech unit operations are often characterized by a large number of inputs (operational parameters) and outputs (performance parameters), with complex correlations among them. A typical biotech process starts with the vial of the cell bank, ends with the final product, and has anywhere from 15 to 30 such unit operations in series. These parameters can affect process performance and product quality and can also interact with each other. Chemometrics offers one effective approach to gaining process understanding from such complex data sets. The increasing use of chemometrics is fuelled by the gradual acceptance of quality by design and process analytical technology among regulators and the biotech industry, both of which require enhanced process and product understanding. In this article, we review chemometrics applications in biotech processes with a special focus on recent major developments. Case studies are used to highlight some of the significant applications.

  17. Estimation of raw material performance in mammalian cell culture using near infrared spectra combined with chemometrics approaches.

    Science.gov (United States)

    Lee, Hae Woo; Christie, Andrew; Liu, Jun Jay; Yoon, Seongkyu

    2012-01-01

    Understanding variability in raw materials and its impact on product quality is of critical importance in biopharmaceutical manufacturing processes. For this purpose, several spectroscopic techniques have been studied for raw material characterization, providing fast and nondestructive ways to measure raw material quality. However, investigations of the correlation between raw material spectra and cell culture performance have been scarce because of their complexity and uncertainty. In this study, near-infrared spectra and bioassays of multiple soy hydrolysate lots manufactured by different vendors were analyzed using chemometric approaches in order to address raw material variability as well as the correlation between raw material properties and the corresponding cell culture performance. Principal component analysis revealed that near-infrared spectra of different soy lots contain enough physicochemical information about soy hydrolysates to allow identification of lot-to-lot variability as well as vendor-to-vendor differences. The identified compositional variability was further analyzed in order to estimate cell growth and protein production of two mammalian cell lines at varying soy dosages, using partial least squares regression combined with optimal variable selection. The performance of the resulting models demonstrates the potential of near-infrared spectroscopy as a robust lot selection tool for raw materials, while providing a biological link between the chemical composition of raw materials and cell culture performance.
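    Principal component analysis of spectra, as used in this study to expose lot-to-lot and vendor-to-vendor differences, can be sketched with an SVD of the mean-centred data matrix. The two-band synthetic spectra, the two hypothetical vendors with six lots each, and the helper name `pca` are assumptions made for illustration only.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the mean-centred data (rows = samples, columns = wavelengths)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]     # sample coordinates on each PC
    explained = s**2 / np.sum(s**2)                      # fraction of variance per PC
    return scores, explained[:n_components], Vt[:n_components]

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 200)                           # hypothetical normalised wavelength axis
band1 = np.exp(-((x - 0.3) / 0.05) ** 2)
band2 = np.exp(-((x - 0.7) / 0.05) ** 2)
# two hypothetical vendors, six lots each, differing mainly in band-2 intensity
amp2 = np.r_[np.full(6, 0.5), np.full(6, 0.8)] + rng.normal(0, 0.02, 12)
X = band1 + np.outer(amp2, band2) + rng.normal(0, 0.005, (12, x.size))
scores, explained, loadings = pca(X)
print(np.round(scores[:, 0], 2), np.round(explained, 3))
```

    The sign of a principal component is arbitrary, so vendor separation should be judged by whether the groups occupy distinct ranges of PC1, not by the sign of the scores.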

  18. Level of Caregiver Burden in Jamaican Stroke Caregivers and Relationship between Selected Sociodemographic Variables

    National Research Council Canada - National Science Library

    Roopchand-Martin, S; Creary-Yan, S

    2014-01-01

    This study sought to determine the level of caregiver burden present in Jamaican stroke caregivers and to investigate the relationship between caregiver burden and selected sociodemographic variables...

  19. Does chemometrics enhance the performance of electroanalysis?

    Science.gov (United States)

    Ni, Yongnian; Kokot, Serge

    2008-09-26

    This review explores the question of whether chemometric methods enhance the performance of electroanalytical methods. Electroanalysis has long benefited from well-established techniques such as potentiometric titrations, polarography and voltammetry, and from more novel ones such as electronic tongues and noses, which have enlarged the scope of applications. Electroanalytical methods have been improved by applying chemometrics for the simultaneous quantitative prediction of analytes or the qualitative resolution of complex overlapping responses. Typical methods include partial least squares (PLS), artificial neural networks (ANNs), and multivariate curve resolution and multiway methods (MCR-ALS, N-PLS and PARAFAC). This review aims to provide the practising analyst with a broad guide to electroanalytical applications supported by chemometrics. In this context, a general consideration of the use of a number of electroanalytical techniques with the aid of chemometric methods is followed by several overviews, each focusing on an important field of application such as food, pharmaceuticals, pesticides and the environment. The growth of chemometrics in conjunction with electronic tongue and nose sensors is highlighted, followed by an overview of the use of chemometrics for the resolution of complicated profiles for the qualitative identification of analytes, especially with the MCR-ALS methodology. Finally, the performance of electroanalytical methods is compared with that of some spectrophotometric procedures on the basis of figures of merit. This comparison showed that electroanalytical methods can perform as well as spectrophotometric ones. PLS-1 appears to be the method of practical choice if a relative prediction error of approximately ±10% is acceptable.
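    The %relative prediction error used as a figure of merit can be computed in several ways; the sketch below uses one common chemometric definition, 100 × sqrt(Σ(ĉ − c)² / Σc²), which is an assumption since the review's exact formula is not quoted here. The concentrations are made-up numbers chosen so the result is easy to check by hand.

```python
import numpy as np

def relative_prediction_error(c_true, c_pred):
    """%RPE = 100 * sqrt(sum((c_pred - c_true)^2) / sum(c_true^2)), one common definition."""
    c_true = np.asarray(c_true, dtype=float)
    c_pred = np.asarray(c_pred, dtype=float)
    return 100.0 * np.sqrt(np.sum((c_pred - c_true) ** 2) / np.sum(c_true ** 2))

rpe = relative_prediction_error([1.0, 2.0, 3.0, 4.0], [1.1, 1.8, 3.3, 3.6])
print(f"%RPE = {rpe:.1f}")   # prints "%RPE = 10.0"
```

    Here the squared errors sum to 0.30 and the squared concentrations to 30, so the ratio is 0.01 and the %RPE is 10, right at the ±10% level mentioned in the review.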

  20. Does chemometrics enhance the performance of electroanalysis?

    Energy Technology Data Exchange (ETDEWEB)

    Ni Yongnian [State Key Laboratory of Food Science and Technology, Nanchang University, Nanchang, Jiangxi 330047 (China); Department of Chemistry, Nanchang University, Nanchang, Jiangxi 330047 (China)], E-mail: ynni@ncu.edu.cn; Kokot, Serge [Inorganic Materials Research Program, School of Physical and Chemical Sciences, Queensland University of Technology, Brisbane, Queensland 4001 (Australia)

    2008-09-26

    This review explores the question of whether chemometric methods enhance the performance of electroanalytical methods. Electroanalysis has long benefited from well-established techniques such as potentiometric titrations, polarography and voltammetry, and from more novel ones such as electronic tongues and noses, which have enlarged the scope of applications. Electroanalytical methods have been improved by applying chemometrics for the simultaneous quantitative prediction of analytes or the qualitative resolution of complex overlapping responses. Typical methods include partial least squares (PLS), artificial neural networks (ANNs), and multivariate curve resolution and multiway methods (MCR-ALS, N-PLS and PARAFAC). This review aims to provide the practising analyst with a broad guide to electroanalytical applications supported by chemometrics. In this context, a general consideration of the use of a number of electroanalytical techniques with the aid of chemometric methods is followed by several overviews, each focusing on an important field of application such as food, pharmaceuticals, pesticides and the environment. The growth of chemometrics in conjunction with electronic tongue and nose sensors is highlighted, followed by an overview of the use of chemometrics for the resolution of complicated profiles for the qualitative identification of analytes, especially with the MCR-ALS methodology. Finally, the performance of electroanalytical methods is compared with that of some spectrophotometric procedures on the basis of figures of merit. This comparison showed that electroanalytical methods can perform as well as spectrophotometric ones. PLS-1 appears to be the method of practical choice if a relative prediction error of approximately ±10% is acceptable.

  1. Fluctuating selection: the perpetual renewal of adaptation in variable environments.

    Science.gov (United States)

    Bell, Graham

    2010-01-12

    Darwin insisted that evolutionary change occurs very slowly over long periods of time, and this gradualist view was accepted by his supporters and incorporated into the infinitesimal model of quantitative genetics developed by R. A. Fisher and others. It dominated the first century of evolutionary biology, but has been challenged in more recent years both by field surveys demonstrating strong selection in natural populations and by quantitative trait loci and genomic studies, indicating that adaptation is often attributable to mutations in a few genes. The prevalence of strong selection seems inconsistent, however, with the high heritability often observed in natural populations, and with the claim that the amount of morphological change in contemporary and fossil lineages is independent of elapsed time. I argue that these discrepancies are resolved by realistic accounts of environmental and evolutionary changes. First, the physical and biotic environment varies on all time-scales, leading to an indefinite increase in environmental variance over time. Secondly, the intensity and direction of natural selection are also likely to fluctuate over time, leading to an indefinite increase in phenotypic variance in any given evolving lineage. Finally, detailed long-term studies of selection in natural populations demonstrate that selection often changes in direction. I conclude that the traditional gradualist scheme of weak selection acting on polygenic variation should be supplemented by the view that adaptation is often based on oligogenic variation exposed to commonplace, strong, fluctuating natural selection.

  2. Chemometric modelling based on 2D-fluorescence spectra without a calibration measurement.

    Science.gov (United States)

    Solle, D; Geissler, D; Stärk, E; Scheper, T; Hitzmann, B

    2003-01-22

    2D fluorescence spectra provide information about intracellular compounds. Fluorophores such as tryptophan, tyrosine and phenylalanine, as well as NADH and flavins, make the corresponding measurement systems very important for bioprocess supervision and control. The evaluation is usually based on chemometric models that are calibrated with off-line measurements of the desired process variables. Because of this data-driven approach, many off-line measurements are required. Here a methodology is presented that makes it possible to calibrate chemometric models without any additional measurements. The information needed for calibration is provided by a priori knowledge about the process, i.e. a mathematical model whose parameters are estimated during the calibration procedure, together with the fact that the substrate should be consumed by the end of the process run. The new methodology for chemometric calibration is applied to a batch cultivation of S. cerevisiae grown aerobically on glucose in Schatzmann medium. The chemometric models determined by this method can be used for prediction during new process runs. The MATLAB routine is freely available from the authors on request.

  3. Cross Validation of Selection of Variables in Multiple Regression.

    Science.gov (United States)

    1979-12-01

    ... Bomber (BOMNAV), Navigation-Cargo (CARNAV), Sensory-Fighter (FGTSEN), Sensory-Bomber (BOMSEN), Communication-Fighter (FGTCOM) ... Recode of variables: FGTNAV, BOMNAV, CARNAV, FGTSEN: 0 = less than 1, 1 = 1 or over ... This value was lowered to 3 in the

  4. IMMAN: free software for information theory-based chemometric analysis.

    Science.gov (United States)

    Urias, Ricardo W Pino; Barigye, Stephen J; Marrero-Ponce, Yovani; García-Jacas, César R; Valdes-Martiní, José R; Perez-Gimenez, Facundo

    2015-05-01

    The features and theoretical background of a new, free computational program for chemometric analysis called IMMAN (acronym for Information theory-based CheMoMetrics ANalysis) are presented. This multi-platform software, developed in the Java programming language and designed with a user-friendly graphical interface, computes a collection of information-theoretic functions adapted for rank-based unsupervised and supervised feature selection tasks. A total of 20 feature selection parameters are presented, with the unsupervised and supervised frameworks represented by 10 approaches each. Several information-theoretic parameters traditionally used as molecular descriptors (MDs) are adapted for use as unsupervised rank-based feature selection methods. In addition, a generalization scheme for the previously defined differential Shannon entropy is discussed, and the Jeffreys information measure is introduced for supervised feature selection. Moreover, well-known information-theoretic feature selection parameters, such as information gain, gain ratio, and symmetrical uncertainty, are incorporated into the IMMAN software (http://mobiosd-hub.com/imman-soft/), following an equal-interval discretization approach. IMMAN offers data pre-processing functionalities, such as missing value processing, dataset partitioning, and browsing. Single-parameter or ensemble (multi-criteria) ranking options are also provided. Consequently, this software is suitable for tasks such as dimensionality reduction, feature ranking, and comparative diversity analysis of data matrices. Simple examples of applications performed with this program are presented. A comparative study between IMMAN and WEKA feature selection tools using the Arcene dataset was performed, demonstrating similar behavior. In addition, it is revealed that the use of IMMAN unsupervised feature selection methods improves the performance of both IMMAN and WEKA
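    To illustrate the kind of supervised ranking criteria mentioned here, the sketch below computes information gain and symmetrical uncertainty after equal-interval discretization. It is a generic NumPy implementation, not IMMAN's code; the choice of five bins, the synthetic features and the helper names are assumptions.

```python
import numpy as np

def equal_interval_bins(x, n_bins):
    """Assign each value of a continuous feature to one of n_bins equal-width intervals."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # digitize on the interior edges only, so x.min() -> bin 0 and x.max() -> bin n_bins - 1
    return np.digitize(x, edges[1:-1])

def entropy(labels):
    """Shannon entropy (bits) of a discrete variable."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(x, y, n_bins=5):
    """IG = H(Y) - H(Y | X_binned) for a continuous feature x and class labels y."""
    xb = equal_interval_bins(x, n_bins)
    h_cond = sum((xb == b).mean() * entropy(y[xb == b]) for b in np.unique(xb))
    return entropy(y) - h_cond

def symmetrical_uncertainty(x, y, n_bins=5):
    """SU = 2 * IG / (H(X_binned) + H(Y)), which lies in [0, 1]."""
    xb = equal_interval_bins(x, n_bins)
    return 2.0 * information_gain(x, y, n_bins) / (entropy(xb) + entropy(y))

rng = np.random.default_rng(5)
y = rng.integers(0, 2, 200)                         # synthetic binary class labels
features = {
    "informative": y + rng.normal(0, 0.3, 200),     # shifted by class membership
    "noise": rng.normal(0, 1, 200),                 # unrelated to the class
}
ranking = sorted(features, key=lambda k: information_gain(features[k], y), reverse=True)
print(ranking)
```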

  5. Variable selection in multiple linear regression: The influence of ...

    African Journals Online (AJOL)

    Akaike's information criterion, influential data cases, Mallows' Cp criterion, multiple ... In this paper we introduce two new measures of the selection influence of an ... [1] Akaike H, 1973, Information theory and an extension of the maximum ...
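    Since the snippet mentions Akaike's information criterion and Mallows' Cp, a short all-subsets sketch may help show how the two criteria are computed for multiple linear regression. It is a generic illustration on synthetic data under the usual Gaussian-error assumptions (AIC written up to an additive constant); it does not implement the selection-influence measures the paper introduces, and `subset_criteria` is a hypothetical helper name.

```python
import itertools
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def subset_criteria(X, y):
    """AIC and Mallows' Cp for every non-empty subset of predictors (intercept always included)."""
    n, p = X.shape
    ones = np.ones((n, 1))
    sigma2_full = rss(np.hstack([ones, X]), y) / (n - p - 1)   # error variance from the full model
    results = []
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            r = rss(np.hstack([ones, X[:, list(cols)]]), y)
            n_par = k + 1                                       # slopes + intercept
            aic = n * np.log(r / n) + 2 * n_par                # Gaussian AIC up to a constant
            cp = r / sigma2_full - n + 2 * n_par
            results.append((cols, aic, cp))
    return results

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(0, 1.0, 50)     # only predictors 0 and 2 matter
res = subset_criteria(X, y)
best_aic = min(res, key=lambda r: r[1])
print(best_aic[0])
```

    For the full model Cp always equals its number of parameters (here 5), which is a convenient sanity check; a well-fitting smaller subset should likewise have Cp close to its own parameter count.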

  6. Using of laser spectroscopy and chemometrics methods for identification of patients with lung cancer, patients with COPD and healthy people from absorption spectra of exhaled air

    Science.gov (United States)

    Bukreeva, Ekaterina B.; Bulanova, Anna A.; Kistenev, Yury V.; Kuzmin, Dmitry A.; Nikiforova, Olga Yu.; Ponomarev, Yurii N.; Tuzikov, Sergei A.; Yumov, Evgeny L.

    2014-11-01

    Results of the joint use of laser photoacoustic spectroscopy and chemometric methods for gas analysis of the exhaled air of patients with chronic respiratory diseases (chronic obstructive pulmonary disease and lung cancer) are presented. The absorption spectra of exhaled breath from members of the target groups and from healthy volunteers were measured, and chemometric methods were used to select the absorption coefficients in the scanned spectra that are most informative for separating the investigated nosologies.

  7. Recursive Feature Selection with Significant Variables of Support Vectors

    Directory of Open Access Journals (Sweden)

    Chen-An Tsai

    2012-01-01

    The development of DNA microarrays enables researchers to screen thousands of genes simultaneously and helps determine high- and low-expression genes in normal and diseased tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and choose an arbitrary threshold for selecting genes. However, this parameter setting may not be compatible with the selected classification algorithm. In this paper, we propose a new gene selection method (SVM-t) based on t-statistics embedded in a support vector machine. We compared its performance with two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared using extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and is capable of attaining good classification performance when the variations of informative and noninformative genes differ. In the analysis of two microarray datasets, the proposed method performs better, identifying fewer genes with good prediction accuracy compared with SVMRFE and RSVM.
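    For contrast with SVM-t, the univariate baseline the abstract criticizes (rank genes by a t-statistic, then cut at an arbitrary threshold) can be sketched in a few lines. This is not SVM-t, SVMRFE or RSVM; the Welch form of the t-statistic, the synthetic expression matrix and the top-10 cut-off are assumptions.

```python
import numpy as np

def welch_t(X, labels):
    """Per-gene Welch t-statistic between two classes (rows = samples, columns = genes)."""
    a, b = X[labels == 0], X[labels == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (b.mean(axis=0) - a.mean(axis=0)) / se

rng = np.random.default_rng(2)
labels = np.repeat([0, 1], 20)                 # 20 samples per class (synthetic)
X = rng.normal(size=(40, 1000))                # 1000 synthetic "genes"
X[labels == 1, :10] += 2.5                     # genes 0-9 are differentially expressed
t = welch_t(X, labels)
top = np.argsort(-np.abs(t))[:10]              # rank by |t|; keeping 10 is an arbitrary cut-off
print(sorted(top.tolist()))
```

    The fixed cut-off is exactly the weakness the abstract points to: nothing in the ranking tells the classifier how many genes it actually needs.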

  8. Simultaneous spectrophotometric determination of paracetamol, ibuprofen and caffeine in pharmaceuticals by chemometric methods

    Science.gov (United States)

    Khoshayand, M. R.; Abdollahi, H.; Shariatpanahi, M.; Saadatfard, A.; Mohammadi, A.

    2008-08-01

    In this study, the simultaneous determination of paracetamol, ibuprofen and caffeine in pharmaceuticals by chemometric approaches using UV spectrophotometry is reported as a simple alternative to using separate models for each component. Spectra of paracetamol, ibuprofen and caffeine were recorded at several concentrations within their linear ranges and were used to compute the calibration mixture between 200 and 400 nm at 1 nm intervals in methanol:0.1 HCl (3:1). Partial least squares regression (PLS), genetic algorithm coupled with PLS (GA-PLS), and principal component-artificial neural network (PC-ANN) were used for chemometric analysis of the data, and the parameters of the chemometric procedures were optimized. The analytical performance of these chemometric methods was characterized by relative prediction errors and recoveries (%), and the methods were compared with each other. GA-PLS outperformed the other multivariate methods owing to wavelength selection in the PLS calibration by a genetic algorithm, without loss of prediction capacity. Although the components show a considerable degree of spectral overlap, they were determined simultaneously and rapidly, with no separation step required. The three methods were successfully applied to a pharmaceutical capsule formulation, with no interference from excipients, as indicated by the recovery study results. The proposed methods are simple and rapid and can easily be used as alternative tools in the quality control of drugs.
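    The idea behind GA-PLS, a genetic algorithm searching over wavelength subsets scored by cross-validated calibration error, can be sketched as below. To stay short, the sketch swaps in ordinary least squares for PLS as the inner calibration model; the three synthetic Gaussian bands, the GA settings (population 30, 40 generations, 2% mutation) and the helper names are assumptions, not the parameters used in the study.

```python
import numpy as np

def cv_rmse(X, y, mask, k=5):
    """k-fold cross-validated RMSE of least squares (with intercept) on the selected wavelengths."""
    if not mask.any():
        return np.inf
    n = len(y)
    folds = np.arange(n) % k
    Xs = np.column_stack([np.ones(n), X[:, mask]])
    sq_err = 0.0
    for f in range(k):
        train, test = folds != f, folds == f
        beta, *_ = np.linalg.lstsq(Xs[train], y[train], rcond=None)
        sq_err += np.sum((Xs[test] @ beta - y[test]) ** 2)
    return np.sqrt(sq_err / n)

def ga_select(X, y, pop_size=30, n_gen=40, p_mut=0.02, seed=0):
    """Binary-mask GA: truncation selection, one-point crossover, bit-flip mutation."""
    rng = np.random.default_rng(seed)
    n_wl = X.shape[1]
    pop = rng.random((pop_size, n_wl)) < 0.1                 # sparse random initial masks
    for _ in range(n_gen):
        fitness = np.array([cv_rmse(X, y, m) for m in pop])
        parents = pop[np.argsort(fitness)[: pop_size // 2]]   # keep the best half (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_wl)                       # one-point crossover position
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_wl) < p_mut                 # flip a few bits (mutation)
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    fitness = np.array([cv_rmse(X, y, m) for m in pop])
    return pop[np.argmin(fitness)], fitness.min()

rng = np.random.default_rng(7)
wl = np.linspace(200, 400, 60)                                # hypothetical wavelength grid (nm)
S = np.array([np.exp(-((wl - c) / 30.0) ** 2) for c in (250, 280, 310)])  # 3 overlapping bands
C = rng.uniform(0.5, 2.0, size=(40, 3))                       # 40 synthetic ternary mixtures
X = C @ S + rng.normal(0, 1e-3, size=(40, wl.size))
best_mask, best_rmse = ga_select(X, C[:, 0])
print(wl[best_mask], best_rmse)
```

    Keeping the best half of each generation guarantees that the best cross-validated error never increases between generations; replacing `cv_rmse` with a PLS-based score would bring the sketch closer to GA-PLS.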

  9. Anthropogenic environments exert variable selection on cranial capacity in mammals.

    Science.gov (United States)

    Snell-Rood, Emilie C; Wick, Naomi

    2013-10-22

    It is thought that behaviourally flexible species will be able to cope with novel and rapidly changing environments associated with human activity. However, it is unclear whether such environments are selecting for increases in behavioural plasticity, and whether some species show more pronounced evolutionary changes in plasticity. To test whether anthropogenic environments are selecting for increased behavioural plasticity within species, we measured variation in relative cranial capacity over time and space in 10 species of mammals. We predicted that urban populations would show greater cranial capacity than rural populations and that cranial capacity would increase over time in urban populations. Based on relevant theory, we also predicted that species capable of rapid population growth would show more pronounced evolutionary responses. We found that urban populations of two small mammal species had significantly greater cranial capacity than rural populations. In addition, species with higher fecundity showed more pronounced differentiation between urban and rural populations. Contrary to expectations, we found no increases in cranial capacity over time in urban populations; indeed, two species tended to show a decrease in cranial capacity over time in urban populations. Furthermore, rural populations of all insectivorous species measured showed significant increases in relative cranial capacity over time. Our results provide partial support for the hypothesis that urban environments select for increased behavioural plasticity, although this selection may be most pronounced early in the urban colonization process. These data also suggest that behavioural plasticity may be simultaneously favoured in rural environments, which are also changing because of human activity.

  10. Implementations of tests on the exogeneity of selected variables and their performance in practice

    NARCIS (Netherlands)

    Pleus, M.

    2015-01-01

    In order to consistently estimate a causal economic relationship at least as many exogenous non-explanatory instrumental variables are required as there are endogenous explanatory variables. This thesis studies various techniques that can be used to classify selected variables as either exogenous or

  11. VARIABLE SELECTION BY PSEUDO WAVELETS IN HETEROSCEDASTIC REGRESSION MODELS INVOLVING TIME SERIES

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    A simple but efficient method has been proposed to select variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those corresponding to the nonsignificant ones, and on this basis a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.

  12. Variable environmental effects on a multicomponent sexually selected trait.

    Science.gov (United States)

    Cole, Gemma L; Endler, John A

    2015-04-01

    Multicomponent signals are made up of interacting elements that generate a functional signaling unit. The interactions between signal components and their effects on individual fitness are not well understood, and the effect of environment is even less so. It is usually assumed that color patterns appear the same in all light environments and that the effects of each color are additive. Using guppies, Poecilia reticulata, we investigated the effect of water color on the interactions between components of sexually selected male coloration. Through behavioral mate choice trials in four different water colors, we estimated the attractiveness of male color patterns, using multivariate fitness estimates and overall signal contrast. Our results show that females exhibit preferences that favor groups of colors rather than individual colors independently and that each environment favors different color combinations. We found that these effects are consistent with female guppies selecting entire color patterns on the basis of overall visual contrast. This suggests that both individuals and populations inhabiting different light environments will be subject to divergent, multivariate selection. Although the appearance of color patterns changes with light environment, achromatic components change little, suggesting that these could function in species recognition or other aspects of communication that must work across environments. Consequently, we predict different phylogenetic patterns between chromatic and achromatic signals within the same clades.

  13. Iron-induced oxidative stress in a macrophyte: a chemometric approach.

    Science.gov (United States)

    Sinha, Sarita; Basant, Ankita; Malik, Amrita; Singh, Kunwar P

    2009-02-01

    Iron-induced oxidative stress in plants of Bacopa monnieri L., a macrophyte with medicinal value, was investigated using a chemometric approach. Cluster analysis (CA) yielded two distinct clusters, corresponding to roots and shoots. Discriminant analysis (DA) identified non-protein thiol (NP-SH) and ascorbate peroxidase (APX) as the variables discriminating between root and shoot tissues. Principal component analysis (PCA) results suggested that protein, superoxide dismutase (SOD), ascorbic acid, proline, and Fe uptake dominate in root tissues, whereas malondialdehyde (MDA), guaiacol peroxidase (POD), cysteine, and NP-SH dominate in the shoots of the stressed plant. Discriminant partial least squares (DPLS) results further confirmed that SOD and ascorbic acid contents dominated in root tissues, while NP-SH, cysteine, POD, APX, and MDA dominated in shoots. MDA and NP-SH were identified as the most pronounced variables in the plant at the longest exposure time. The chemometric approach allowed the interpretation of the biochemical changes induced in plant tissues exposed to iron.

  14. Thresholded Lasso for high dimensional variable selection and statistical estimation

    CERN Document Server

    Zhou, Shuheng

    2010-01-01

    Given $n$ noisy samples with $p$ dimensions, where $n \ll p$, we show that the multi-step thresholding procedure based on the Lasso, which we call the Thresholded Lasso, can accurately estimate a sparse vector $\beta \in \mathbb{R}^p$ in a linear model $Y = X \beta + \epsilon$, where $X$ is an $n \times p$ design matrix and $\epsilon \sim N(0, \sigma^2 I_n)$. We show that under the restricted eigenvalue (RE) condition (Bickel-Ritov-Tsybakov 09), it is possible to achieve the $\ell_2$ loss within a logarithmic factor of the ideal mean square error one would achieve with an oracle while selecting a sufficiently sparse model, hence achieving sparse oracle inequalities; the oracle would supply perfect information about which coordinates are non-zero and which are above the noise level. In some sense, the Thresholded Lasso recovers the choices that would have been made by the $\ell_0$ penalized least squares estimators, in that it selects a sufficiently sparse model without sacrificing the accuracy in ...
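
The three-step idea behind the Thresholded Lasso (fit a Lasso, discard small coefficients, refit on what remains) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the paper's estimator: it uses cyclic coordinate descent for the Lasso objective (1/2n)||y - Xb||^2 + lam*||b||_1, no intercept, an ordinary least-squares refit on the kept support, and hand-picked values for the hypothetical parameters `lam` and `tau`; the paper's theory-driven choices of penalty and threshold are not reproduced.

```python
def soft_threshold(z, t):
    """Scalar soft-thresholding operator S(z, t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # x_j^T times the partial residual that excludes coordinate j
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, n * lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    k = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for cc in range(c, k + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][cc] * x[cc] for cc in range(r + 1, k))) / M[r][r]
    return x


def thresholded_lasso(X, y, lam, tau):
    """Lasso -> keep |beta_j| > tau -> ordinary least-squares refit on the kept support."""
    beta0 = lasso_cd(X, y, lam)
    support = [j for j, b in enumerate(beta0) if abs(b) > tau]
    if not support:
        return support, []
    # Normal equations restricted to the selected columns
    A = [[sum(row[a] * row[c] for row in X) for c in support] for a in support]
    rhs = [sum(row[a] * yi for row, yi in zip(X, y)) for a in support]
    return support, solve(A, rhs)
```

For example, with 50 standard-normal samples on 10 predictors and a response built from columns 0 and 2 plus small noise, `thresholded_lasso(X, y, lam=0.1, tau=0.5)` should keep only those two columns; in practice `lam` and `tau` would have to be tuned, for instance by cross-validation.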

  15. Chemometrics Methods for Specificity, Authenticity and Traceability Analysis of Olive Oils: Principles, Classifications and Applications

    Science.gov (United States)

    Messai, Habib; Farman, Muhammad; Sarraj-Laabidi, Abir; Hammami-Semmar, Asma; Semmar, Nabil

    2016-01-01

    Background. Olive oils (OOs) show high chemical variability due to several factors of genetic, environmental and anthropic types. Genetic and environmental factors are responsible for natural compositions and polymorphic diversification resulting in different varietal patterns and phenotypes. Anthropic factors, however, are at the origin of different blends’ preparation leading to normative, labelled or adulterated commercial products. Control of complex OO samples requires their (i) characterization by specific markers; (ii) authentication by fingerprint patterns; and (iii) monitoring by traceability analysis. Methods. These quality control and management aims require the use of several multivariate statistical tools: specificity highlighting requires ordination methods; authentication checking calls for classification and pattern recognition methods; traceability analysis implies the use of network-based approaches able to separate or extract mixed information and memorized signals from complex matrices. Results. This chapter presents a review of different chemometrics methods applied for the control of OO variability from metabolic and physical-chemical measured characteristics. The different chemometrics methods are illustrated by different study cases on monovarietal and blended OO originated from different countries. Conclusion. Chemometrics tools offer multiple ways for quantitative evaluations and qualitative control of complex chemical variability of OO in relation to several intrinsic and extrinsic factors. PMID:28231172

  16. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology.

    Science.gov (United States)

    Fox, Eric W; Hill, Ryan A; Leibowitz, Scott G; Olsen, Anthony R; Thornbrugh, Darren J; Weber, Marc H

    2017-07-01

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables is used or stepwise procedures are employed which iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Streams Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize results from the analysis of real data. A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in ...

  17. Variability-based Active Galactic Nucleus Selection Using Image Subtraction in the SDSS and LSST Era

    Science.gov (United States)

    Choi, Yumi; Gibson, Robert R.; Becker, Andrew C.; Ivezić, Željko; Connolly, Andrew J.; MacLeod, Chelsea L.; Ruan, John J.; Anderson, Scott F.

    2014-02-01

    With upcoming all-sky surveys such as LSST poised to generate a deep digital movie of the optical sky, variability-based active galactic nucleus (AGN) selection will enable the construction of highly complete catalogs with minimum contamination. In this study, we generate g-band difference images and construct light curves (LCs) for QSO/AGN candidates listed in Sloan Digital Sky Survey Stripe 82 public catalogs compiled from different methods, including spectroscopy, optical colors, variability, and X-ray detection. Image differencing excels at identifying variable sources embedded in complex or blended emission regions such as Type II AGNs and other low-luminosity AGNs that may be omitted from traditional photometric or spectroscopic catalogs. To separate QSOs/AGNs from other sources using our difference image LCs, we explore several LC statistics and parameterize optical variability by the characteristic damping timescale (τ) and variability amplitude. By virtue of distinguishable variability parameters of AGNs, we are able to select them with high completeness of 93.4% and efficiency (i.e., purity) of 71.3%. Based on optical variability, we also select highly variable blazar candidates, whose infrared colors are consistent with known blazars. One-third of them are also radio detected. With the X-ray selected AGN candidates, we probe the optical variability of X-ray detected optically extended sources using their difference image LCs for the first time. A combination of optical variability and X-ray detection enables us to select various types of host-dominated AGNs. Contrary to the AGN unification model prediction, two Type II AGN candidates (out of six) show detectable variability on long-term timescales like typical Type I AGNs. This study will provide a baseline for future optical variability studies of extended sources.

  18. Variable selection for multiply-imputed data with application to dioxin exposure study.

    Science.gov (United States)

    Chen, Qixuan; Wang, Sijian

    2013-09-20

    Multiple imputation (MI) is a commonly used technique for handling missing data in large-scale medical and public health studies. However, variable selection on multiply-imputed data remains an important and longstanding statistical problem. If a variable selection method is applied to each imputed dataset separately, it may select different variables for different imputed datasets, which makes it difficult to interpret the final model or draw scientific conclusions. In this paper, we propose a novel multiple imputation-least absolute shrinkage and selection operator (MI-LASSO) variable selection method as an extension of the least absolute shrinkage and selection operator (LASSO) method to multiply-imputed data. The MI-LASSO method treats the estimated regression coefficients of the same variable across all imputed datasets as a group and applies the group LASSO penalty to yield a consistent variable selection across multiply-imputed datasets. We use a simulation study to demonstrate the advantage of the MI-LASSO method compared with the alternatives. We also apply the MI-LASSO method to the University of Michigan Dioxin Exposure Study to identify important circumstances and exposure factors that are associated with human serum dioxin concentration in Midland, Michigan.
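
The key property of the group penalty is that each variable's coefficients across the M imputed datasets form one group, so the variable is kept or dropped in all imputations at once. The sketch below shows only the proximal (group soft-thresholding) step for a penalty assumed to have the form lam * sum_j sqrt(sum_m beta_jm^2); it illustrates the group-penalty mechanics, not the authors' full MI-LASSO fitting algorithm, and `group_soft_threshold` and `selected_variables` are hypothetical names.

```python
import math


def group_soft_threshold(betas, lam):
    """Proximal step for lam * sum_j ||(beta_j1, ..., beta_jM)||_2.

    betas[m][j] is the coefficient of variable j fitted on imputed dataset m.
    Each variable's M coefficients are shrunk together and zeroed together.
    """
    n_imp, p = len(betas), len(betas[0])
    out = [[0.0] * p for _ in range(n_imp)]
    for j in range(p):
        norm = math.sqrt(sum(betas[m][j] ** 2 for m in range(n_imp)))
        if norm <= lam:
            continue  # whole group set to zero: variable dropped in every imputation
        scale = 1.0 - lam / norm
        for m in range(n_imp):
            out[m][j] = betas[m][j] * scale
    return out


def selected_variables(betas):
    """Indices of variables with a non-zero coefficient in any imputed dataset."""
    return [j for j in range(len(betas[0])) if any(row[j] != 0.0 for row in betas)]
```

With betas [[2.0, 0.1, -0.05], [1.8, -0.2, 0.0], [2.2, 0.15, 0.3]] and lam = 0.5, the group norms of variables 1 and 2 fall below lam, so only variable 0 is retained, and it is retained in all three imputations.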

  19. Identification of the minimum effective dose for normally distributed data using a Bayesian variable selection approach.

    Science.gov (United States)

    Otava, Martin; Shkedy, Ziv; Hothorn, Ludwig A; Talloen, Willem; Gerhard, Daniel; Kasim, Adetayo

    2017-02-16

    The identification of the minimum effective dose is of high importance in the drug development process. In early-stage screening experiments, establishing the minimum effective dose can be translated into model selection based on information criteria. The alternative presented here, a Bayesian variable selection approach, allows selection of the minimum effective dose while taking model uncertainty into account. The performance of Bayesian variable selection is compared with the generalized order restricted information criterion on two dose-response experiments and through a simulation study. Which method performs better depends on the complexity of the underlying model and on the effect size relative to the noise.

  20. Chemometric evaluation of the anti-cancer pro-drug podophyllotoxin and potential therapeutic analogues in Juniperus and Podophyllum species.

    Science.gov (United States)

    Kusari, Souvik; Zühlke, Sebastian; Spiteller, Michael

    2011-01-01

    Podophyllotoxin, deoxypodophyllotoxin, demethylpodophyllotoxin and podophyllotoxone are four therapeutically potent secondary metabolites. There is a dearth of information on the holistic analysis of their distribution pattern in both phylogenetic and ecological contexts. The aim was to analyse the continuum of these metabolites in Juniperus and Podophyllum species collected from natural populations in Himalayan environments and from the botanical gardens of Rombergpark and Haltern (Germany), using multi-component LC-ESI-MS/MS coupled with statistically relevant chemometric assessment. We evaluated the individual and holistic metabolite profiles and chemometrically correlated the phytochemical loads between various species (infraspecific), organic and aqueous extracts, populations of the same species from different locations, different species from the same location, different species from different locations, and infrageneric populations from the same and different locations. Multivariate analysis revealed Juniperus x-media Pfitzeriana as a suitable alternative to Podophyllum hexandrum for commercial exploitation. A significant positive correlation of podophyllotoxone with both podophyllotoxin and demethylpodophyllotoxin, and a negative correlation of podophyllotoxin with both deoxypodophyllotoxin and demethylpodophyllotoxin (infraspecific among Podophyllum), were observed by Kruskal's multidimensional scaling and corroborated by principal component analysis, indicating probable similarities and/or differences between the biosynthetic pathways, and synergistic and/or antagonistic principles, respectively. Finally, linear discriminant analysis and hierarchical agglomerative cluster analysis revealed considerable infrageneric and infraspecific variability in the secondary compound spectra and loads of the different populations under study. Such holistic studies of plants and their therapeutic metabolites ought to assist in selecting plants, geographical areas and environmental conditions for ...

  1. Enhancing prediction power of chemometric models through manipulation of the fed spectrophotometric data: A comparative study

    Science.gov (United States)

    Saad, Ahmed S.; Hamdy, Abdallah M.; Salama, Fathy M.; Abdelkawy, Mohamed

    2016-10-01

    The effect of data manipulation in the preprocessing step preceding the construction of chemometric models was assessed. The same set of UV spectral data was used to construct PLS and PCR models both directly and after mathematical manipulation according to well-known spectrophotometric methods: first and second derivatives of the absorption spectra, ratio spectra, and first and second derivatives of the ratio spectra. Meanwhile, the optimal working wavelength ranges were carefully selected for each model before the models were constructed. Unexpectedly, the number of latent variables used for model construction varied among the different methods. The prediction power of the different models was compared using a validation set of 8 mixtures prepared according to a multilevel multifactor design, and the results were statistically compared using a two-way ANOVA test. Root mean squared error of prediction (RMSEP) was used for further comparison of the predictive ability of the constructed models. Although no significant difference was found between the results obtained using Partial Least Squares (PLS) and Principal Component Regression (PCR) models, discrepancies among the results were attributed to variation in the discrimination power of the adopted spectrophotometric methods on the spectral data.
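
RMSEP is the square root of the mean squared difference between the known and predicted concentrations over the validation mixtures. The sketch below computes it together with two of the spectral manipulations mentioned (ratio spectra and a first derivative). The forward-difference derivative and the function names are illustrative assumptions; derivative spectrophotometry in practice often uses smoothing derivatives (e.g. Savitzky-Golay), which this sketch does not implement.

```python
import math


def ratio_spectrum(mixture, divisor):
    """Point-by-point ratio of a mixture spectrum to a divisor (standard) spectrum."""
    if len(mixture) != len(divisor):
        raise ValueError("spectra must share the same wavelength grid")
    return [m / d for m, d in zip(mixture, divisor)]  # a zero in divisor raises ZeroDivisionError


def first_derivative(spectrum, step):
    """Forward-difference first derivative; the output has one point fewer than the input."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]


def rmsep(actual, predicted):
    """Root mean squared error of prediction over a validation set."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equally sized, non-empty sequences")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Lower RMSEP on the same validation mixtures indicates better predictive ability, which is how models built on the differently manipulated spectra can be ranked against one another.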

  2. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology

    Science.gov (United States)

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, e...

  3. Discriminative variable selection for clustering with the sparse Fisher-EM algorithm

    CERN Document Server

    Bouveyron, Charles

    2012-01-01

    The interest in variable selection for clustering has increased recently due to the growing need to cluster high-dimensional data. In particular, variable selection eases both the clustering and the interpretation of the results. Existing approaches have demonstrated the efficiency of variable selection for clustering but turn out to be either very time-consuming or not sparse enough in high-dimensional spaces. This work proposes to perform a selection of the discriminative variables by introducing sparsity in the loading matrix of the Fisher-EM algorithm. This clustering method has recently been proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model which fits the data into a low-dimensional discriminative subspace. Three different approaches are proposed in this work to introduce sparsity in the orientation matrix of the discriminative subspace through $\ell_{1}$-type penalizations. Experimental comparisons with existing approach...

  4. Variable selection in multiple linear regression: The influence of individual cases

    Directory of Open Access Journals (Sweden)

    SJ Steel

    2007-12-01

    The influence of individual cases in a data set is studied when variable selection is applied in multiple linear regression. Two different influence measures, based on the C_p criterion and Akaike's information criterion, are introduced. The relative change in the selection criterion when an individual case is omitted is proposed as the selection influence of the specific omitted case. Four standard examples from the literature are considered and the selection influence of the cases is calculated. It is argued that the selection procedure may be improved by taking the selection influence of individual data cases into account.
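
The selection-influence idea can be sketched by recomputing the minimised selection criterion with each case left out in turn. The sketch below uses only the AIC form n*ln(RSS/n) + 2k over all predictor subsets of a linear model with intercept, and takes the relative change as (AIC_min without case i - AIC_min) / |AIC_min|. That normalisation, the exhaustive subset search, and all function names are assumptions for illustration rather than the exact definitions used in the paper; the C_p-based measure is not shown.

```python
import itertools
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    k = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for cc in range(c, k + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][cc] * x[cc] for cc in range(r + 1, k))) / M[r][r]
    return x


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit with intercept on the chosen columns."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    k = len(rows[0])
    A = [[sum(r[a] * r[d] for r in rows) for d in range(k)] for a in range(k)]
    rhs = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(A, rhs)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def min_aic(X, y):
    """Smallest Gaussian AIC, n*ln(RSS/n) + 2k, over all predictor subsets."""
    n, p = len(X), len(X[0])
    best = math.inf
    for size in range(p + 1):
        for cols in itertools.combinations(range(p), size):
            best = min(best, n * math.log(rss(X, y, cols) / n) + 2 * (size + 1))
    return best


def selection_influence(X, y):
    """Relative change in the minimised AIC when each case is omitted in turn."""
    full = min_aic(X, y)
    return [
        (min_aic(X[:i] + X[i + 1:], y[:i] + y[i + 1:]) - full) / abs(full)
        for i in range(len(X))
    ]
```

A single gross outlier in the response would be expected to give by far the largest absolute influence, because omitting it collapses the residual sum of squares of the selected model.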

  5. Chemometrics and modernization of traditional Chinese medicine

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    The development of chromatographic fingerprinting and related chemometric methods in research on the quality control of traditional Chinese medicines (TCMs) is discussed. Quality control methods for guaranteeing the authenticity and stability of TCM products and semi-products are first assessed. The technique based on chromatographic fingerprinting is essentially a high-throughput, integral tool for exploring the complexity of herbal medicines. To further control the comprehensive quality of TCMs, confirmation and identification of their important chemical components are necessary. Some new strategies are proposed to trace the chemical changes of chromatographic fingerprints, both during product processing and/or after administration, by modern chromatographic techniques and chemometrics. Combined with systems biology and bioinformatics, it seems possible to reveal the working mechanism of TCMs and to further control their intrinsic quality comprehensively.

  6. Functional Data Analysis Applied in Chemometrics

    DEFF Research Database (Denmark)

    Muller, Martha

    In this thesis we explore the use of functional data analysis as a method to analyse chemometric data, more specifically spectral data in metabolomics. Functional data analysis is a vibrant field in statistics. It has been rapidly expanding in both methodology and applications since it was made well known by Ramsay & Silverman's monograph in 1997. In functional data analysis, the data are curves instead of data points. Each curve is measured at discrete points along a continuum, for example, time or frequency. It is assumed that the underlying process generating the curves is smooth... ...the worlds of statistics and chemometrics. We want to provide a glimpse of the essential and complex data pre-processing that is well known to chemometricians, but is generally unknown to statisticians. Pre-processing can potentially have a strong influence on the results of consequent data analysis. Our...

  7. Variability-based AGN selection using image subtraction in the SDSS and LSST era

    CERN Document Server

    Choi, Yumi; Becker, Andrew C; Ivezić, Željko; Connolly, Andrew J; MacLeod, Chelsea L; Ruan, John J; Anderson, Scott F

    2013-01-01

    With upcoming all-sky surveys such as LSST poised to generate a deep digital movie of the optical sky, variability-based AGN selection will enable the construction of highly complete catalogs with minimum contamination. In this study, we generate $g$-band difference images and construct light curves for QSO/AGN candidates listed in SDSS Stripe 82 public catalogs compiled from different methods, including spectroscopy, optical colors, variability, and X-ray detection. Image differencing excels at identifying variable sources embedded in complex or blended emission regions such as Type II AGNs and other low-luminosity AGNs that may be omitted from traditional photometric or spectroscopic catalogs. To separate QSOs/AGNs from other sources using our difference image light curves, we explore several light curve statistics and parameterize optical variability by the characteristic damping timescale ($\tau$) and variability amplitude. By virtue of distinguishable variability parameters of AGNs, we are able to select...

  8. Variable selection in identification of a high dimensional nonlinear non-parametric system

    Institute of Scientific and Technical Information of China (English)

    Er-Wei BAI; Wenxiao ZHAO; Weixing ZHENG

    2015-01-01

    The problem of variable selection in system identification of a high dimensional nonlinear non-parametric system is described. The inherent difficulty, the curse of dimensionality, is introduced. Then its connections to various topics and research areas are briefly discussed, including order determination, pattern recognition, data mining, machine learning, statistical regression and manifold embedding. Finally, some results of variable selection in system identification in the recent literature are presented.

  9. Best conditions for biodegradation of diesel oil by chemometric tools

    Directory of Open Access Journals (Sweden)

    Ewa Kaczorek

    2014-01-01

    Diesel oil biodegradation by different bacteria-yeast-rhamnolipid consortia was tested. Chromatographic analysis of the post-biodegradation residue was complemented with chemometric tools (ANOVA and a novel ranking procedure based on the sum of ranking differences). These tools were used to select the most effective systems. The best results for biodegradation of the aliphatic fractions of diesel oil were observed for yeast consortia with Aeromonas hydrophila KR4. For these systems, a positive effect of rhamnolipids on hydrocarbon biodegradation was observed. However, rhamnolipid addition did not always have a positive influence on the biodegradation process (e.g. in the case of yeast consortia with Stenotrophomonas maltophilia KR7). Moreover, particular differences in the degradation pattern were observed between alkanes lower and higher than C22. In general, the best conditions for the "lower" alkanes were Aeromonas hydrophila KR4 + emulsifier, independently of the yeasts, whereas for the C24 alkane they were, e.g., Pseudomonas stutzeri KR7.
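
The sum of ranking differences (SRD) compares each column of a data table (e.g. a method or a tested system) with a reference ranking: the objects are ranked by the column and by the reference, and the absolute rank differences are summed, so a smaller SRD means closer agreement with the reference. The sketch below assumes the row-wise mean as the reference (a common choice when no gold standard exists, but an assumption here), averages tied ranks, and omits the randomisation test and cross-validation that usually accompany SRD; `rank_average` and `sum_of_ranking_differences` are hypothetical names.

```python
def rank_average(values):
    """1-based ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def sum_of_ranking_differences(table, reference=None):
    """SRD per column of table[object][column] against a reference ranking."""
    n_cols = len(table[0])
    if reference is None:
        reference = [sum(row) / n_cols for row in table]  # assumed default: row mean
    ref_ranks = rank_average(reference)
    result = []
    for c in range(n_cols):
        col_ranks = rank_average([row[c] for row in table])
        result.append(sum(abs(a - b) for a, b in zip(col_ranks, ref_ranks)))
    return result
```

For the table [[1, 1, 3], [2, 2, 2], [3, 3, 1]], the first two columns order the objects exactly as the row means do (SRD 0), while the reversed third column gets SRD 4.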

  10. Eliminate indeterminacies of independent component analysis for chemometrics

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    An improved method has been proposed to eliminate the indeterminacies of independent component analysis (ICA) for chemometrics. Following the arrangement of principal component analysis (PCA), the ICA mixing matrix is used as a signal-content index, and the ICA outputs are sorted and directed. After many repetitions, the independent components (ICs) are paired according to the maximum correlation coefficient, and the mean value of each paired IC then substitutes for the original ICs. As a result, the ICA indeterminacies are eliminated. A simulation example is tested to validate this improvement. Finally, a set of experimental LC-MS data is processed without any prior knowledge or specific limitation, and the results show that the improved ICA can directly separate the mixed signals in chemometrics and that it is simpler and more reasonable than simple-to-use interactive self-modeling mixture analysis (SIMPLISMA).
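
The pairing-and-averaging step can be sketched as follows. Components from repeated ICA runs are matched to a reference run by the largest absolute correlation coefficient, sign-flipped when that correlation is negative, and then averaged. This sketch assumes that all runs return the same number of components already on a comparable scale and uses a simple greedy matching. It does not reproduce the PCA-style ordering by mixing-matrix content, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    den = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    return 0.0 if den == 0 else sum(x * y for x, y in zip(da, db)) / den


def align_and_average(runs):
    """Match each run's components to runs[0] by max |r|, fix signs, and average.

    runs[r][k] is the k-th independent component (a list of samples) from run r.
    """
    reference = runs[0]
    sums = [list(comp) for comp in reference]
    for run in runs[1:]:
        used = set()
        for i, ref_comp in enumerate(reference):
            best_j, best_r = None, 0.0
            for j, comp in enumerate(run):
                if j in used:
                    continue
                r = pearson(ref_comp, comp)
                if best_j is None or abs(r) > abs(best_r):
                    best_j, best_r = j, r
            used.add(best_j)
            sign = 1.0 if best_r >= 0 else -1.0  # undo the sign indeterminacy
            sums[i] = [s + sign * v for s, v in zip(sums[i], run[best_j])]
    return [[s / len(runs) for s in comp] for comp in sums]
```

If a second run returns the same two components in swapped order and with flipped signs, the averaged result reproduces the reference components exactly. Both the permutation and the sign indeterminacy are removed.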

  11. Chemometric assessment of enhanced bioremediation of oil contaminated soils

    DEFF Research Database (Denmark)

    Soleimani, Mohsen; Farhoudi, Majid; Christensen, Jan H.

    2013-01-01

    Bioremediation is a promising technique for reclamation of oil-polluted soils. In this study, six methods for enhancing bioremediation were tested on oil-contaminated soils from three refinery areas in Iran (Isfahan, Arak, and Tehran). The methods included bacterial enrichment, planting, and addition of nitrogen and phosphorus, molasses, hydrogen peroxide, and a surfactant (Tween 80). Total petroleum hydrocarbon (TPH) concentrations and CHEMometric analysis of Selected Ion Chromatograms (SIC), termed the CHEMSIC method, of petroleum biomarkers including terpanes and regular, diaromatic and triaromatic steranes were used for determining the level and type of hydrocarbon contamination. The same methods were used to study oil weathering of 2- to 6-ring polycyclic aromatic compounds (PACs). Results demonstrated that bacterial enrichment and addition of nutrients were most efficient, with 50% to 62% removal...

  12. Aspects of recent developments in analytical chemometrics

    Institute of Scientific and Technical Information of China (English)

    Liang, Yizeng; Wu, Hailong; Shen, Guoli; Jiang, Jianhui; Liang, Sheng

    2006-01-01

    Some aspects of recent developments in analytical chemometrics are discussed, in particular the developments viewed from the angle of the research efforts undertaken in authors' laboratories. The topics concerned include resolution of high-order chemical data, morphological theory and methodology for chemical signal processing, multivariate calibration and chemical pattern recognition for solving complex chemical problems, and resolution of two-way chemical data from hyphenated chromatographic instruments.

  13. Curriculum Practices: Their Effect and/or Relationship to Selected Biographical and Professional Variables

    Science.gov (United States)

    Beecher, Clarence

    1978-01-01

    Attempts to determine significant differences between selected variables and attitudes toward participation in curriculum planning, curriculum use, adaptation of curriculum content, and curriculum role patterns and to verify significant relationships between the dependent variables and teaching experience, grade level, and education. Data were…

  14. Selection of Variables in Exploratory Factor Analysis: An Empirical Comparison of a Stepwise and Traditional Approach

    Science.gov (United States)

    Hogarty, Kristine Y.; Kromrey, Jeffrey D.; Ferron, John M.; Hines, Constance V.

    2004-01-01

    The purpose of this study was to investigate and compare the performance of a stepwise variable selection algorithm with traditional exploratory factor analysis. The Monte Carlo study included six factors in the design: the number of common factors; the number of variables explained by the common factors; the magnitude of factor loadings; the number…

  15. Noniterative Factor Analysis Estimators, with Algorithms for Subset and Instrumental Variable Selection.

    Science.gov (United States)

    Cudeck, Robert

    1991-01-01

    Two algorithms that automatically select subsets of variables (PACE algorithm) and reference variables (Fabin estimators), respectively, used for the noniterative estimators are presented. The PACE algorithm is based on a nonsymmetric matrix sweep operator. A Monte Carlo experiment compares the relative performance of these estimators and others.…

  16. Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures

    Science.gov (United States)

    Steinley, Douglas; Brusco, Michael J.

    2008-01-01

    Eight different variable selection techniques for model-based and non-model-based clustering are evaluated across a wide range of cluster structures. It is shown that several methods have difficulties when non-informative variables (i.e., random noise) are included in the model. Furthermore, the distribution of the random noise greatly impacts the…

  17. Comparison of Sparse and Jack-knife partial least squares regression methods for variable selection

    DEFF Research Database (Denmark)

    Karaman, Ibrahim; Qannari, El Mostafa; Martens, Harald

    2013-01-01

    The objective of this study was to compare two different techniques of variable selection, Sparse PLSR and Jack-knife PLSR, with respect to their predictive ability and their ability to identify relevant variables. Sparse PLSR is a method that is frequently used in genomics, whereas Jack-knife PL...

  18. Symbiosis of chemometrics and metabolomics: Past, present, and future

    NARCIS (Netherlands)

    Greef, J. van der; Smilde, A.K.

    2005-01-01

    Metabolomics is a growing area in the field of systems biology. Metabolomics has already a long history and also the connection of metabolomics with chemometrics goes back some time. This review discusses the symbiosis of metabolomics and chemometrics with emphasis on the medical domain, puts the co

  19. The Time Domain Spectroscopic Survey: Variable Object Selection and Anticipated Results

    CERN Document Server

    Morganson, Eric; Anderson, Scott F; Ruan, John J; Myers, Adam D; Eracleous, Michael; Kelly, Brandon; Badenes, Carlos; Banados, Eduardo; Blanton, Michael R; Bershady, Matthew A; Borissova, Jura; Brandt, William Nielsen; Burgett, William S; Chambers, Kenneth; Draper, Peter W; Davenport, James R A; Flewelling, Heather; Garnavich, Peter; Hawley, Suzanne L; Hodapp, Klaus W; Isler, Jedidah C; Kaiser, Nick; Kinemuchi, Karen; Kudritzki, Rolf P; Metcalfe, Nigel; Morgan, Jeffrey S; Paris, Isabelle; Parvizi, Mahmoud; Poleski, Radoslaw; Price, Paul A; Salvato, Mara; Shanks, Tom; Schlafly, Eddie F; Schneider, Donald P; Shen, Yue; Stassun, Keivan; Tonry, John T; Walter, Fabian; Waters, Chris Z

    2015-01-01

    We present the selection algorithm and anticipated results for the Time Domain Spectroscopic Survey (TDSS). TDSS is an SDSS-IV eBOSS subproject that will provide initial identification spectra of approximately 220,000 luminosity-variable objects (variable stars and AGN) across 7,500 square degrees selected from a combination of SDSS and multi-epoch Pan-STARRS1 photometry. TDSS will be the largest spectroscopic survey to explicitly target variable objects, avoiding pre-selection on the basis of colors or detailed modeling of specific variability characteristics. Kernel Density Estimate (KDE) analysis of our target population performed on SDSS Stripe 82 data suggests our target sample will be 95% pure (meaning 95% of objects we select have genuine luminosity variability of a few magnitudes or more). Our final spectroscopic sample will contain roughly 135,000 quasars and 85,000 stellar variables, approximately 4,000 of which will be RR Lyrae stars which may be used as outer Milky Way probes. The variability-sele...

  20. Positive Selection Pressure Drives Variation on the Surface-Exposed Variable Proteins of the Pathogenic Neisseria.

    Science.gov (United States)

    Wachter, Jenny; Hill, Stuart

    2016-01-01

    Pathogenic species of Neisseria utilize variable outer membrane proteins to facilitate infection and proliferation within the human host. However, the mechanisms behind the evolution of these variable alleles remain largely unknown because previous analyses were based on limited datasets. In this study, we have expanded upon the previous analyses to substantially increase the number of analyzed sequences by including multiple diverse strains from various geographic locations, to determine whether positive selective pressure is exerted on the evolution of these variable genes. Although Neisseria are naturally competent, this analysis indicates that only intrastrain horizontal gene transfer among the pathogenic Neisseria principally accounts for the linkage equilibrium exhibited by these genes, which drives the polymorphisms evidenced within these alleles. As the majority of polymorphisms occur across species, the divergence of these variable genes is dependent upon the species and is independent of geographical location, disease severity, or serogroup. Tests of neutrality were able to detect strong selection pressures acting upon both the opa and pil gene families, and were able to locate the majority of these sites within the exposed variable regions of the encoded proteins. Evidence of positive selection acting upon the hypervariable domains of Opa contradicts previous beliefs and provides evidence for selection of receptor binding. As the pathogenic Neisseria reside exclusively within the human host, the strong selection pressures acting upon both the opa and pil gene families provide support for host immune system pressure driving sequence polymorphisms within these variable genes.

  1. Novel Harmonic Regularization Approach for Variable Selection in Cox’s Proportional Hazards Model

    Directory of Open Access Journals (Sweden)

    Ge-Jin Chu

    2014-01-01

    Variable selection is an important issue in regression, and a number of variable selection methods involving nonconvex penalty functions have been proposed. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2 < q < 1) regularizations, to select key risk factors in the Cox’s proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which produces solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on artificial datasets and four real microarray gene expression datasets, including the diffuse large B-cell lymphoma (DLBCL), lung cancer, and AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso-series methods.

  2. Chemometrics applications in biotech processes: assessing process comparability.

    Science.gov (United States)

    Bhushan, Nitish; Hadpe, Sandip; Rathore, Anurag S

    2012-01-01

    A typical biotech process starts with a vial from the cell bank, ends with the final product and has anywhere from 15 to 30 unit operations in series. The total number of process variables (input and output parameters) and other variables (raw materials) can add up to several hundred. Because the manufacturing process is widely accepted to have a significant impact on the quality of the product, the regulatory agencies require an assessment of process comparability across different phases of manufacturing (Phase I vs. Phase II vs. Phase III vs. Commercial) as well as other key activities during product commercialization (process scale-up, technology transfer, and process improvement). However, assessing comparability for a process with such a large number of variables is nontrivial, and companies often resort to qualitative comparisons. In this article, we present a quantitative approach for assessing process comparability using chemometrics. To our knowledge, this is the first time that such an approach has been published for biotech processing. The approach has been applied to an industrial case study involving evaluation of two processes that are being used for commercial manufacturing of a major biosimilar product. It has been demonstrated that the proposed approach is able to successfully identify the unit operations in the two processes that are operating differently. We expect this approach, which can also be applied toward assessing product comparability, to be of great use to both regulators and industry, both of which otherwise struggle to assess comparability.

  3. A survey of variable selection methods in two Chinese epidemiology journals

    Directory of Open Access Journals (Sweden)

    Lynn Henry S

    2010-09-01

    Background: Although much has been written on developing better procedures for variable selection, there is little research on how it is practiced in actual studies. This review surveys the variable selection methods reported in two high-ranking Chinese epidemiology journals. Methods: Articles published in 2004, 2006, and 2008 in the Chinese Journal of Epidemiology and the Chinese Journal of Preventive Medicine were reviewed. Five categories of methods were identified whereby variables were selected using: A - bivariate analyses; B - multivariable analysis, e.g. stepwise or individual significance testing of model coefficients; C - first bivariate analyses, followed by multivariable analysis; D - bivariate analyses or multivariable analysis; and E - other criteria like prior knowledge or personal judgment. Results: Among the 287 articles that reported using variable selection methods, 6%, 26%, 30%, 21%, and 17% were in categories A through E, respectively. One hundred sixty-three studies selected variables using bivariate analyses, 80% (130/163) via multiple significance testing at the 5% alpha-level. Of the 219 multivariable analyses, 97 (44%) used stepwise procedures, 89 (41%) tested individual regression coefficients, but 33 (15%) did not mention how variables were selected. Sixty percent (58/97) of the stepwise routines also did not specify the algorithm and/or significance levels. Conclusions: The variable selection methods reported in the two journals were limited in variety, and details were often missing. Many studies still relied on problematic techniques like stepwise procedures and/or multiple testing of bivariate associations at the 0.05 alpha-level. These deficiencies should be rectified to safeguard the scientific validity of articles published in Chinese epidemiology journals.
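    The survey's concern about multiple testing of bivariate associations at the 0.05 level can be made concrete with one line of arithmetic. A minimal Python sketch, assuming k independent tests of covariates that truly have no association (the survey does not report k for any article), gives the probability that at least one of them is declared significant by chance:

```python
def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) among k independent tests of true nulls at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(k), 3))
```

    Under these assumptions, 20 independent null tests give roughly a 64% chance of at least one spurious "significant" covariate; correlated tests change the exact figure.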

  4. Characterization and classification of pseudo-stationary phases in micellar electrokinetic chromatography using chemometric methods.

    Science.gov (United States)

    Fu, Cexiong; Khaledi, Morteza G

    2014-03-04

    Two types of chemometric methods, principal component analysis (PCA) and cluster analysis, are employed to characterize and classify a total of 70 pseudostationary phases (54 distinct systems and 16 decoy systems) in micellar electrokinetic chromatography (MEKC). PCA excels at removing redundant information for micellar phase characterization and retaining principal determinants for phase classification. While PCA is useful in the characterization of micelle selectivities, it is ineffective in defining the grouping of micellar phases. Hierarchical clustering yields a complete dendrogram of cluster structures but provides only limited cluster characterizations. The combination of these two chemometric methods leads to a comprehensive interpretation of the micellar phase classification. Moreover, the k-means analysis can further discern subtle differences among those closely located micellar phases. All three chemometric methods result in similar classifications with respect to the similarities and differences of the 70 micelle systems investigated. These systems are categorized into 3 major clusters: fluoro-surfactants represent cluster I, identified as strong hydrogen bond donors and dipolar but weak hydrogen bond acceptors. Cluster II includes sulfonated acrylamide/acrylate copolymers and surfactants with trimethylammonium head groups, characterized by strong hydrophobicity (v) and weak hydrogen bond acidity (b). The last cluster consists of two subclusters: clusters III and IV. Cluster III includes siloxane-based polymeric micelles, exhibiting weak hydrophobicity and medium hydrogen bond acidity and basicity (a), and the cluster IV micellar systems are characterized by their strong hydrophobicity and medium hydrogen bond acidity and basicity but rather weak dipolarity. Cluster III differs from cluster IV by its slightly weaker hydrophobicity and hydrogen bond donating capability. The classification by chemometric methods is in good agreement with the...
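    The PCA, hierarchical clustering and k-means workflow described in this abstract can be sketched in Python with scikit-learn and SciPy. The data here are synthetic, not the 70 micellar systems: each row is a made-up phase described by five LSER-style system constants (e, s, a, b, v). Autoscaling, Ward linkage and a three-cluster cut are assumptions for illustration; the abstract does not state these settings.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's systems): rows are phases,
# columns are LSER-style system constants (e, s, a, b, v).
centres = np.array([
    [0.2, -0.2, 0.1, -1.5, 2.5],    # made-up group 1
    [0.6, -0.6, 0.0, -3.5, 3.5],    # made-up group 2
    [0.4, -0.4, -0.8, -2.0, 2.9],   # made-up group 3
])
X = np.vstack([c + 0.02 * rng.standard_normal((10, 5)) for c in centres])

Z = StandardScaler().fit_transform(X)       # autoscale each system constant
pca = PCA(n_components=2).fit(Z)
scores = pca.transform(Z)                   # 2-D score map for characterization
print("explained variance:", pca.explained_variance_ratio_.round(3))

# Hierarchical clustering (Ward linkage), dendrogram cut into three clusters
hca_labels = fcluster(linkage(Z, method="ward"), t=3, criterion="maxclust")

# k-means on the same autoscaled data
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)

# An adjusted Rand index of 1.0 means identical partitions up to label names
print("agreement (ARI):", adjusted_rand_score(hca_labels, km_labels))
```

    Comparing the partitions with the adjusted Rand index is one simple way to check the cross-method consistency the abstract reports.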

  5. QSO Selection Algorithm Using Time Variability and Machine Learning: Selection of 1,620 QSO Candidates from MACHO LMC Database

    CERN Document Server

    Kim, Dae-Won; Byun, Yong-Ik; Alcock, Charles; Khardon, Roni

    2011-01-01

    We present a new QSO selection algorithm using a Support Vector Machine (SVM), a supervised classification method, on a set of extracted time series features including period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars, and microlensing events, using 58 known QSOs, 1,629 variable stars, and 4,288 non-variables from the MAssive Compact Halo Object (MACHO) database as a training set. To estimate the efficiency and the accuracy of the model, we perform a cross-validation test using the training set. The test shows that the model correctly identifies ~80% of known QSOs with a 25% false positive rate. The majority of the false positives are Be stars. We applied the trained model to the MACHO Large Magellanic Cloud (LMC) dataset, which consists of 40 million lightcurves, and found 1,620 QSO candidates. During the selection none of the 33,242 known MACHO variables were misclassified as QSO candidates. In order to estimate the true false po...
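    A cross-validated SVM classifier of the kind described in this abstract might look like the following scikit-learn sketch. The four features and both classes are synthetic; the RBF kernel, C = 1 and balanced class weights are assumptions rather than the authors' settings, and the false-positive rate here is defined as the fraction of non-QSOs flagged as QSOs.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Synthetic stand-in for light-curve features (period, amplitude, colour,
# autocorrelation). Class 1 = "QSO", class 0 = all other sources.
n_qso, n_other = 60, 600
X_qso = rng.normal([2.0, 0.3, 0.4, 0.8], 0.2, size=(n_qso, 4))
X_other = rng.normal([1.0, 0.6, 0.9, 0.3], 0.3, size=(n_other, 4))
X = np.vstack([X_qso, X_other])
y = np.concatenate([np.ones(n_qso, dtype=int), np.zeros(n_other, dtype=int)])

# Scale features, then an RBF-kernel SVM; class weights offset the imbalance.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, class_weight="balanced"))

# Every source is predicted by a model that never saw it during training.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(model, X, y, cv=cv)

recall = np.mean(y_pred[y == 1] == 1)   # fraction of known QSOs recovered
fpr = np.mean(y_pred[y == 0] == 1)      # fraction of non-QSOs flagged as QSO
print(f"recall={recall:.2f}, false-positive rate={fpr:.2f}")
```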

  6. Determination of hydroxy acids in cosmetics by chemometric experimental design and cyclodextrin-modified capillary electrophoresis.

    Science.gov (United States)

    Liu, Pei-Yu; Lin, Yi-Hui; Feng, Chia Hsien; Chen, Yen-Ling

    2012-10-01

    A CD-modified CE method was established for the quantitative determination of seven hydroxy acids in cosmetic products. The method incorporated chemometric experimental design, including fractional factorial design and central composite design, which was used to enhance the method's separation capability and to explore the interactions between parameters. Compared with a traditional investigation of multiple parameters, the chemometric experimental design was less time-consuming and lower in cost. In this study, the influences of three experimental variables (phosphate concentration, surfactant concentration, and methanol percentage) on the experimental response were investigated by applying a chromatographic resolution statistic function. The optimized conditions were as follows: a running buffer of 150 mM phosphate solution (pH 7) containing 0.5 mM CTAB, 3 mM γ-CD, and 25% methanol; 20 s sample injection at 0.5 psi; a separation voltage of -15 kV; a temperature of 25°C; and UV detection at 200 nm. The seven hydroxy acids were well separated in less than 10 min. The LOD (S/N = 3) was 625 nM for both salicylic acid and mandelic acid. The correlation coefficient of the regression curve was greater than 0.998. The RSD and relative error values were all less than 9.21%. After optimization and validation, this simple and rapid method was successfully applied to several commercial cosmetic products.
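    A central composite design for the three factors named in this abstract (phosphate concentration, surfactant concentration, methanol percentage) can be generated in coded units with a few lines of Python. This is a sketch assuming a rotatable design with six centre points; the centre values and step sizes used to decode the runs are hypothetical and are not the settings used in the study.

```python
import itertools

import numpy as np


def central_composite_design(k: int, n_center: int = 6) -> np.ndarray:
    """Rotatable central composite design for k factors, in coded units."""
    alpha = (2 ** k) ** 0.25                              # axial distance for rotatability
    factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))  # 2^k corners
    axial = np.vstack([sign * alpha * np.eye(k)[i] for i in range(k) for sign in (-1.0, 1.0)])
    center = np.zeros((n_center, k))                      # replicated centre points
    return np.vstack([factorial, axial, center])


def decode(coded: np.ndarray, centre, step) -> np.ndarray:
    """Map coded levels to real settings: real = centre + coded * step."""
    return np.asarray(centre) + coded * np.asarray(step)


design = central_composite_design(3)
print(design.shape)  # (20, 3): 8 factorial + 6 axial + 6 centre runs

# Hypothetical centre/step values for phosphate (mM), CTAB (mM), methanol (%)
runs = decode(design, centre=[125.0, 0.5, 20.0], step=[25.0, 0.25, 5.0])
```

    For a fractional factorial screening stage, a 2^(3-1) half fraction with defining relation I = ABC keeps the four corner runs whose product of coded levels is +1.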

  7. Unique ion filter: a data reduction tool for GC/MS data preprocessing prior to chemometric analysis.

    Science.gov (United States)

    Adutwum, L A; Harynuk, J J

    2014-08-01

    Using raw GC/MS data as the X-block for chemometric modeling has the potential to provide better classification models for complex samples than using the total ion current (TIC), extracted ion chromatograms/profiles (EIC/EIP), or integrated peak tables. However, the sheer volume of raw GC/MS data necessitates some form of data reduction/feature selection to remove the variables containing primarily noise from the data set. Several algorithms for feature selection exist; however, because of the extreme number of variables (10^6 to 10^8 variables per chromatogram), feature selection can be slow and computationally expensive. Herein, we present a new prefilter for automated data reduction of GC/MS data prior to feature selection. This tool, termed the unique ion filter (UIF), is a module that can be added after chromatographic alignment and prior to any subsequent feature selection algorithm. The UIF objectively reduces the number of irrelevant or redundant variables in raw GC/MS data while preserving potentially relevant analytical information. In the m/z dimension, data are reduced from a full spectrum to a handful of unique ions for each chromatographic peak. In the time dimension, data are reduced to only a handful of scans around each peak apex. UIF was applied to a data set of GC/MS data for a variety of gasoline samples to be classified by octane rating using partial least-squares discriminant analysis (PLS-DA). It was also applied to a series of chromatograms from casework fire debris analysis to be classified on the basis of whether or not signatures of gasoline were detected. By shrinking the pool of candidate variables passed to subsequent variable selection, the UIF reduced the total feature selection time needed to achieve perfect classification of all validation data from 373 to 9 min (a 98% reduction in computing time). Additionally, the significant reduction in included variables resulted in a concomitant...
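    The two reductions described in this abstract (a few ions per peak in the m/z dimension, a few scans around each apex in the time dimension) can be illustrated with a toy Python sketch. The function name, the uniqueness score and the window sizes are hypothetical; this is not the published UIF algorithm, whose exact ion-selection rule the abstract does not give.

```python
import numpy as np


def uif_like_reduction(data, apexes, half_window=2, n_ions=3):
    """Hypothetical UIF-style prefilter for a (scans x m/z) matrix.

    For each peak apex, keep `n_ions` ions ranked by a made-up uniqueness
    score (the ion's intensity at this apex divided by its summed intensity
    over all apexes) and only the scans within +/- half_window of the apex.
    Returns the sorted (scan, m/z) index pairs to retain.
    """
    apex_spectra = data[apexes, :]                    # (n_peaks, n_mz)
    totals = apex_spectra.sum(axis=0) + 1e-12         # guard against 0/0
    keep = set()
    for p, apex in enumerate(apexes):
        uniqueness = apex_spectra[p] / totals
        ions = np.argsort(uniqueness)[::-1][:n_ions]  # most peak-specific ions
        lo = max(apex - half_window, 0)
        hi = min(apex + half_window + 1, data.shape[0])
        keep.update((scan, int(mz)) for scan in range(lo, hi) for mz in ions)
    return sorted(keep)


data = np.zeros((50, 100))                             # toy aligned run: 50 scans, 100 m/z
data[10, [5, 6, 7, 50]] = [100.0, 80.0, 60.0, 40.0]    # peak 1
data[30, [20, 21, 22, 50]] = [90.0, 70.0, 50.0, 40.0]  # peak 2 shares ion 50
kept = uif_like_reduction(data, apexes=[10, 30])
print(len(kept), "of", data.size, "variables kept")
```

    The ion shared by both peaks (m/z index 50) scores lower than the peak-specific ions and is dropped, which is the intuition behind keeping "unique" ions.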

  8. Simultaneous estimation and variable selection in median regression using Lasso-type penalty.

    Science.gov (United States)

    Xu, Jinfeng; Ying, Zhiliang

    2010-06-01

    We consider median regression with a LASSO-type penalty term for variable selection. With a fixed number of variables in the regression model, a two-stage method is proposed for simultaneous estimation and variable selection, in which the degree of penalty is adaptively chosen. A Bayesian information criterion type approach is proposed and used to obtain a data-driven procedure that is proved to automatically select asymptotically optimal tuning parameters. It is shown that the resulting estimator achieves the so-called oracle property. The combination of median regression and the LASSO penalty is computationally easy to implement via standard linear programming. A random perturbation scheme can be used to obtain a simple estimator of the standard error. Simulation studies are conducted to assess the finite-sample performance of the proposed method. We illustrate the methodology with a real example.
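    The statement that LASSO-penalized median regression reduces to standard linear programming can be sketched with SciPy's `linprog`: each coefficient and each residual is split into non-negative positive and negative parts. The two-stage adaptive weights (1/|pilot estimate|) and the fixed penalty value below are assumptions standing in for the authors' BIC-type tuning, which is not reproduced here; the data are simulated.

```python
import numpy as np
from scipy.optimize import linprog


def lad_lasso(X, y, lam, weights=None):
    """Median (LAD) regression with a weighted L1 penalty, solved as an LP.

    Minimises sum_i |y_i - b0 - x_i^T beta| + lam * sum_j w_j |beta_j|.
    The intercept b0 is left unpenalized.
    """
    n, p = X.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    # Variable layout: [b0, beta_plus (p), beta_minus (p), u_plus (n), u_minus (n)]
    c = np.concatenate([[0.0], lam * w, lam * w, np.ones(n), np.ones(n)])
    # Constraint: b0 + X beta_plus - X beta_minus + u_plus - u_minus = y
    A_eq = np.hstack([np.ones((n, 1)), X, -X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] + [(0, None)] * (2 * p + 2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    z = res.x
    return z[0], z[1:1 + p] - z[1 + p:1 + 2 * p]


rng = np.random.default_rng(0)
n, p = 100, 6
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0])
y = 1.0 + X @ beta_true + rng.standard_t(df=3, size=n)   # heavy-tailed noise

# Stage 1: unpenalized median regression gives pilot estimates.
_, beta_pilot = lad_lasso(X, y, lam=0.0)
# Stage 2: adaptive weights from the pilot fit; lam is fixed by hand here.
weights = 1.0 / np.maximum(np.abs(beta_pilot), 1e-8)
b0, beta_hat = lad_lasso(X, y, lam=20.0, weights=weights)
print(np.round(beta_hat, 3))
```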

  9. Studies of an x ray selected sample of cataclysmic variables. Ph.D. Thesis

    Science.gov (United States)

    Silber, Andrew D.

    1986-01-01

    Just prior to the thesis research, an all-sky survey in hard x rays with the HEAO-1 satellite and further observations in the optical resulted in a catalog of about 700 x-ray sources with known optical counterparts. This sample includes 43 cataclysmic variables, which are binaries consisting of a detached white dwarf and a Roche lobe filling companion star. This thesis consists of studies of the x-ray selected sample of cataclysmic variables.

  10. Low rank updated LS-SVM classifiers for fast variable selection.

    Science.gov (United States)

    Ojeda, Fabian; Suykens, Johan A K; De Moor, Bart

    2008-01-01

    Least squares support vector machine (LS-SVM) classifiers are a class of kernel methods whose solution follows from a set of linear equations. In this work we present low rank modifications to the LS-SVM classifiers that are useful for fast and efficient variable selection. The inclusion or removal of a candidate variable can be represented as a low rank modification to the kernel matrix (linear kernel) of the LS-SVM classifier. In this way, the LS-SVM solution can be updated rather than being recomputed, which improves the efficiency of the overall variable selection process. Relevant variables are selected according to a closed form of the leave-one-out (LOO) error estimator, which is obtained as a by-product of the low rank modifications. The proposed approach is applied to several benchmark data sets as well as two microarray data sets. When compared to other related algorithms used for variable selection, simulations applying our approach clearly show a lower computational complexity together with good stability on the generalization error.
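    The closed-form LOO error that drives the selection can be sketched in Python with NumPy. This sketch assumes the least-squares (+/-1 target) form of the LS-SVM with a linear kernel and uses the bordered-matrix identity r_i = alpha_i / [H^-1]_ii for the leave-one-out residual. Unlike the published method, it refits from scratch for each candidate variable instead of applying low-rank updates, and the function names are hypothetical.

```python
import numpy as np


def lssvm_fit(X, y, gamma=1.0):
    """Linear-kernel LS-SVM in the least-squares (+/-1 target) form.

    Solves [[0, 1^T], [1, K + I/gamma]] @ [b; alpha] = [0; y] and also
    returns the inverse of that bordered matrix for the LOO shortcut.
    """
    n = X.shape[0]
    K = X @ X.T                                   # linear kernel matrix
    H = np.zeros((n + 1, n + 1))
    H[0, 1:] = 1.0
    H[1:, 0] = 1.0
    H[1:, 1:] = K + np.eye(n) / gamma
    H_inv = np.linalg.inv(H)
    sol = H_inv @ np.concatenate([[0.0], y])
    return sol[1:], sol[0], H_inv                 # alpha, b, H^-1


def loo_residuals(X, y, gamma=1.0):
    """Exact LOO residuals y_i - f_(-i)(x_i) = alpha_i / [H^-1]_(i,i), no refitting."""
    alpha, _, H_inv = lssvm_fit(X, y, gamma)
    return alpha / np.diag(H_inv)[1:]


def loo_error(X, y, gamma=1.0):
    """Fraction of samples whose leave-one-out prediction has the wrong sign."""
    f_loo = y - loo_residuals(X, y, gamma)
    return float(np.mean(np.sign(f_loo) != y))


def forward_select(X, y, n_select, gamma=1.0):
    """Greedy forward selection scored by the closed-form LOO error.

    Each candidate is refitted from scratch; the low-rank kernel updates
    that make the published method fast are deliberately not reproduced.
    """
    selected = []
    for _ in range(n_select):
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        errors = [loo_error(X[:, selected + [j]], y, gamma) for j in remaining]
        selected.append(remaining[int(np.argmin(errors))])
    return selected


rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
y = np.where(X[:, 0] > 0, 1.0, -1.0)             # only variable 0 carries the class
print(forward_select(X, y, n_select=2))
```

    The identity follows from a Schur-complement argument on the bordered system, so the LOO residuals are exact for this formulation, not an approximation.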

  11. Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection

    KAUST Repository

    Chen, Lisha

    2012-12-01

    Reduced-rank regression is an effective method for predicting multiple response variables from the same set of predictor variables. It reduces the number of model parameters and takes advantage of interrelations between the response variables, and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group and show that this penalty satisfies certain desirable invariance properties. We develop two numerical algorithms to solve the penalized regression problem and establish the asymptotic consistency of the proposed method. In particular, the manifold structure of the reduced-rank regression coefficient matrix is considered and studied in our theoretical analysis. In our simulation study and real data analysis, the new method is compared with several existing variable selection methods for multivariate regression and exhibits competitive performance in prediction and variable selection. © 2012 American Statistical Association.
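    The row-wise group-lasso penalty described in this abstract acts through a simple proximal step: each row of the coefficient matrix is shrunk toward zero as a block, so an entire predictor drops out of all responses at once. The following is a minimal proximal-gradient sketch in Python. It deliberately omits the rank constraint of reduced-rank regression and is therefore not the authors' algorithm; the data are simulated.

```python
import numpy as np


def row_group_soft_threshold(B, thresh):
    """Proximal operator of thresh * sum_j ||B[j, :]||_2 (one group per row)."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return scale * B


def row_sparse_regression(X, Y, lam, n_iter=500):
    """Proximal gradient for 0.5*||Y - XB||_F^2 + lam * sum_j ||B_j||_2.

    Simplification: the rank constraint of reduced-rank regression is omitted.
    """
    step = 1.0 / np.linalg.norm(X, 2) ** 2       # 1 / Lipschitz constant of the gradient
    B = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y)
        B = row_group_soft_threshold(B - step * grad, step * lam)
    return B


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
B_true = np.zeros((8, 3))
B_true[:2] = np.outer([2.0, -1.5], [1.0, 0.5, -1.0])   # rank 1, rows 0 and 1 active
Y = X @ B_true + 0.1 * rng.standard_normal((100, 3))
B_hat = row_sparse_regression(X, Y, lam=10.0)
print(np.linalg.norm(B_hat, axis=1).round(3))          # row norms per predictor
```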

  12. Combining epidemiologic and biostatistical tools to enhance variable selection in HIV cohort analyses.

    Directory of Open Access Journals (Sweden)

    Christopher Rentsch

    BACKGROUND: Variable selection is an important step in building a multivariate regression model, for which several methods and statistical packages are available. A comprehensive approach for variable selection in complex multivariate regression analyses within HIV cohorts is explored by using both epidemiological and biostatistical procedures. METHODS: Three different methods for variable selection were illustrated in a study comparing survival time between subjects in the Department of Defense's National History Study and the Atlanta Veterans Affairs Medical Center's HIV Atlanta VA Cohort Study. The first two methods were stepwise selection procedures, based either on significance tests (Score test) or on information theory (Akaike Information Criterion), while the third method employed a Bayesian argument (Bayesian Model Averaging). RESULTS: All three methods resulted in a similar parsimonious survival model. Three of the covariates previously used in the multivariate model were not included in the final model suggested by the three approaches. When the parsimonious model was compared with the previously published model, there was evidence of less variance in the main survival estimates. CONCLUSIONS: The variable selection approaches considered in this study allowed building a model based on significance tests, on an information criterion, and on averaging models using their posterior probabilities. A parsimonious model that balanced these three approaches was found to provide a better fit than the previously reported model.
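    Stepwise selection by an information criterion, as in the second method of this study, can be illustrated with a forward AIC search. This Python sketch is a simplification: it uses ordinary least squares with the Gaussian AIC, n log(RSS/n) + 2k, instead of the Cox survival models in the study, and the data are simulated.

```python
import numpy as np


def gaussian_aic(X, y):
    """AIC (up to an additive constant) of an OLS fit with intercept: n*log(RSS/n) + 2k."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]


def forward_aic(X, y):
    """Repeatedly add the covariate that lowers AIC most; stop when none does."""
    selected, best = [], gaussian_aic(X[:, []], y)   # start from the intercept-only model
    while True:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if not candidates:
            return selected
        scores = {j: gaussian_aic(X[:, selected + [j]], y) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            return selected
        selected.append(j_best)
        best = scores[j_best]


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + 0.5 * rng.standard_normal(200)
print(forward_aic(X, y))
```

    AIC's penalty of 2 per parameter is lenient, so an occasional noise covariate may still be added; that sensitivity is one reason the study cross-checked it against a significance-test route and Bayesian Model Averaging.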

  13. The use of vector bootstrapping to improve variable selection precision in Lasso models.

    Science.gov (United States)

    Laurin, Charles; Boomsma, Dorret; Lubke, Gitta

    2016-08-01

    The Lasso is a shrinkage regression method that is widely used for variable selection in statistical genetics. Commonly, K-fold cross-validation is used to fit a Lasso model. This is sometimes followed by using bootstrap confidence intervals to improve precision in the resulting variable selections. Nesting cross-validation within bootstrapping could provide further improvements in precision, but this has not been investigated systematically. We performed simulation studies of Lasso variable selection precision (VSP) with and without nesting cross-validation within bootstrapping. Data were simulated to represent genomic data under a polygenic model as well as under a model with effect sizes representative of typical GWAS results. We compared these approaches to each other as well as to software defaults for the Lasso. Nested cross-validation had the most precise variable selection at small effect sizes. At larger effect sizes, there was no advantage to nesting. We illustrated the nested approach with empirical data comprising SNPs and SNP-SNP interactions from the most significant SNPs in a GWAS of borderline personality symptoms. In the empirical example, we found that the default Lasso selected low-reliability SNPs and interactions which were excluded by bootstrapping.
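    Nesting cross-validation within bootstrapping, as compared in this study, can be sketched with scikit-learn's `LassoCV`: the penalty is re-tuned inside every bootstrap resample, and each variable's selection frequency is recorded. The number of resamples, the fold count and the use of selection frequency as the summary are assumptions; the study's exact bootstrap criterion may differ, and the data are simulated.

```python
import numpy as np
from sklearn.linear_model import LassoCV


def bootstrap_lasso_frequency(X, y, n_boot=30, cv=5, seed=0):
    """Fraction of bootstrap resamples in which each Lasso coefficient is non-zero.

    Cross-validation (LassoCV) is nested inside every bootstrap resample,
    so the penalty is re-tuned on each resample.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        fit = LassoCV(cv=cv).fit(X[idx], y[idx])
        counts += fit.coef_ != 0
    return counts / n_boot


rng = np.random.default_rng(0)
X = rng.standard_normal((150, 10))
y = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.standard_normal(150)
freq = bootstrap_lasso_frequency(X, y)
print(np.round(freq, 2))
```

    Variables chosen in nearly every resample are the stable ones; a default single-fit Lasso tends to also admit weak, unstable variables, in line with the empirical finding reported in this study.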

  14. Discrimination of Corsican honey by FT-Raman spectroscopy and chemometrics

    Directory of Open Access Journals (Sweden)

    Fernández Pierna, JA.

    2011-01-01

    Honey is a complex and challenging product to analyze, mainly because its composition derives from various botanical sources. Discriminating the origin of honey is of prime importance in order to reinforce consumer trust in this typical food product. But this is not an easy task, as usually no single chemical or physical parameter is sufficient. The aim of our paper is to investigate whether FT-Raman spectroscopy, as a spectroscopic fingerprinting technique combined with chemometric tools, can be used as a rapid and reliable method for discriminating honey according to its source. In addition, different chemometric models are constructed to discriminate between Corsican honeys and honeys from other regions in France, Italy, Austria, Germany and Ireland based on their FT-Raman spectra. These regions show a large variation in their plants. The developed models include exploratory techniques such as the Fisher criterion for wavenumber selection and supervised methods such as Partial Least Squares-Discriminant Analysis (PLS-DA) or Support Vector Machines (SVM). All these models showed an average correct classification rate between 85% and 90%, showing that Raman spectroscopy combined with chemometric treatment is a promising way to discriminate honey rapidly and inexpensively according to its origin.
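    The Fisher criterion for wavenumber selection mentioned in this abstract scores each variable by between-class separation relative to within-class spread. Definitions vary; this Python sketch assumes the common two-class form (m1 - m2)^2 / (s1^2 + s2^2) and uses a tiny made-up data set in place of spectra.

```python
import numpy as np


def fisher_ratio(X, labels):
    """Two-class Fisher criterion per variable: (m1 - m2)^2 / (s1^2 + s2^2)."""
    classes = np.unique(labels)
    if classes.size != 2:
        raise ValueError("expected exactly two classes")
    a = X[labels == classes[0]]
    b = X[labels == classes[1]]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)
    return num / den


# Rows are samples, columns are (made-up) wavenumber intensities.
X = np.array([[1.0, 0.0], [2.0, 0.5], [3.0, 0.0],
              [11.0, 1.0], [12.0, 0.5], [13.0, 1.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
ratios = fisher_ratio(X, labels)               # about 50.0 and 2.667
best = np.argsort(ratios)[::-1][:1]            # keep the top-ranked variable(s)
print(ratios.round(3), best)
```

    In practice the top-ranked wavenumbers would then be passed to PLS-DA or SVM.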

  15. Analysis of Flavonoid in Medicinal Plant Extract Using Infrared Spectroscopy and Chemometrics

    Directory of Open Access Journals (Sweden)

    Lestyo Wulandari

    2016-01-01

    Infrared (IR) spectroscopy combined with chemometrics has been developed for simple analysis of flavonoid in medicinal plant extracts. Flavonoid was extracted from medicinal plant leaves by ultrasonication and maceration. IR spectra of selected medicinal plant extracts were correlated with flavonoid content using chemometrics. The chemometric method used for calibration analysis was Partial Least Squares (PLS), and the methods used for classification analysis were Linear Discriminant Analysis (LDA), Soft Independent Modelling of Class Analogies (SIMCA), and Support Vector Machines (SVM). In this study, the NIR model gave the best calibration, with R2 and RMSEC values of 0.9916499 and 2.1521897, respectively, while the accuracy of all its classification models (LDA, SIMCA, and SVM) was 100%. R2 and RMSEC for the calibration of the FTIR model were 0.8653689 and 8.8958149, respectively, while the accuracy of LDA, SIMCA, and SVM was 86.0%, 91.2%, and 77.3%, respectively. The PLS and LDA NIR models were further used to predict unknown flavonoid content in commercial samples. Flavonoid contents measured by NIR with these models and by UV-Vis spectrophotometry were compared with a paired-samples t-test, and the two methods gave no significant difference.

  16. Current Debates on Variability in Child Welfare Decision-Making: A Selected Literature Review

    Directory of Open Access Journals (Sweden)

    Emily Keddell

    2014-11-01

    This article considers selected drivers of decision variability in child welfare decision-making and explores current debates in relation to these drivers. Covering the related influences of national orientation, risk and responsibility, inequality and poverty, evidence-based practice, constructions of abuse and its causes, domestic violence, and cognitive processes, it discusses the literature with regard to how each of these influences decision variability. It situates these debates in relation to the ethical issue of variability and the equity issues that variability raises. I propose that, despite the ecological complexity that drives decision variability, improving internal (within-country) decision consistency is still a valid goal. It may be that the use of annotated case examples, kind learning systems, and continued commitments to the social justice issues of inequality and individualisation can contribute to this goal.

  17. Variable selectivity of the Hitachi chemistry analyzer chloride ion-selective electrode toward interfering ions.

    Science.gov (United States)

    Wang, T; Diamandis, E P; Lane, A; Baines, A D

    1994-02-01

    Chloride measurements by ion-selective electrodes are vulnerable to interference by anions such as iodide, thiocyanate, nitrate, and bromide. We have found that the degree of interference of these anions on the Hitachi chemistry analyzer chloride electrode varies from electrode to electrode and this variation can even occur within the same lot of membrane. This variation is not dependent upon the length of time the cartridge has been in the analyzer because no correlation existed between the usage time and the electrode response to interfering ions. Neither is this variation due to the deterioration of the electrode because all electrodes tested had calibration slopes within the manufacturer's specification. Our study, however, showed that even after repeated exposure to a plasma sample containing 2 mM thiocyanate, the chloride electrode was still able to accurately measure the chloride in plasma without thiocyanate, thus confirming that a carryover effect does not exist from a previous thiocyanate-containing sample.

  18. Data for comparison of climate envelope models developed using expert-selected variables versus statistical selection

    Science.gov (United States)

    Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romanach, Stephanie; Watling, James I.; Mazzotti, Frank J.

    2017-01-01

    The data we used for this study include species occurrence data (n=15 species), climate data and predictions, an expert opinion questionnaire, and species masks that represented the model domain for each species. For this data release, we include the results of the expert opinion questionnaire and the species model domains (or masks). We developed an expert opinion questionnaire to gather information on expert opinion regarding the importance of climate variables in determining a species' geographic range. The species masks, or model domains, were defined separately for each species using a variation of the “target-group” approach (Phillips et al. 2009), where the domain was determined using convex polygons including occurrence data for at least three phylogenetically related and similar species (Watling et al. 2012). The species occurrence data, climate data, and climate predictions are freely available online and are therefore not included in this data release. The species occurrence data were obtained from the online database Global Biodiversity Information Facility (GBIF; http://www.gbif.org/), and from scientific literature (Watling et al. 2011). Climate data were obtained from the WorldClim database (Hijmans et al. 2005) and climate predictions were obtained from the Center for Ocean-Atmosphere Prediction Studies (COAPS) at Florida State University (https://floridaclimateinstitute.org/resources/data-sets/regional-downscaling). See metadata for references.

  19. A comprehensive strategy using chromatographic profiles combined with chemometric methods: Application to quality control of Polygonum cuspidatum Sieb. et Zucc.

    Science.gov (United States)

    Gao, Fangyuan; Xu, Zihua; Wang, Weizhong; Lu, Guocai; Vander Heyden, Yvan; Zhou, Tingting; Fan, Guorong

    2016-09-30

    For the strict quality control of herbs, a comprehensive strategy based on chromatographic profiles and chemometric methods, which could reliably select quantitative indices, robustly quantify multiple markers and systematically compare different chemometric methods, was proposed and successfully applied to the quality analysis of P. cuspidatum. Chromatographic profiles were constructed using efficient accelerated solvent extraction (ASE) and reliable high-performance liquid chromatography-ultraviolet (HPLC-UV) methods, and different chemometric methods were then employed, namely similarity analysis (SA), hierarchical clustering analysis (HCA) and linear discriminant analysis (LDA). The differences in the classification of herb samples produced by these methods were studied for the first time. To reasonably determine the quality of herbs and evaluate the different chemometric methods, a comprehensive strategy with three key steps was performed: selection of quantitative indices, development of a reliable quantification method, and adoption of an easily calculated and visible parameter. The quantitative method, which showed good linearity (correlation coefficients >0.9995) and satisfactory repeatability (RSD<1.5%), precision (RSD<2.84%), reproducibility (RSD<2.88%), stability (RSD<2.85%) and recoveries (91.5%-105.6%, RSD<2.83%), was applied to the quality evaluation of fourteen batches of P. cuspidatum samples through simultaneous quantitative determination of fifteen marker compounds. The limits of quantitation of the fifteen compounds ranged from 1 to 60 μg/mL. The quality evaluation showed that the different calculation theories of the chemometric methods resulted in different classifications of the samples: SA classified samples through the mean values, while HCA and LDA classified similar objects.
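    Similarity analysis of chromatographic fingerprints is commonly based on the cosine (congruence) between each sample profile and a reference profile. A minimal Python sketch, assuming the reference is the mean profile (consistent with "SA classified samples through the mean values"), plus the relative standard deviation used for the precision figures; the profiles and replicate values are made up.

```python
import numpy as np


def similarity_to_mean(profiles):
    """Cosine similarity between each chromatographic profile and the mean profile."""
    P = np.asarray(profiles, dtype=float)
    ref = P.mean(axis=0)
    return (P @ ref) / (np.linalg.norm(P, axis=1) * np.linalg.norm(ref))


def rsd_percent(values):
    """Relative standard deviation in percent (sample standard deviation / mean)."""
    v = np.asarray(values, dtype=float)
    return float(100.0 * v.std(ddof=1) / v.mean())


profiles = [[1.0, 2.0, 3.0],      # made-up peak areas for three batches
            [2.0, 4.0, 6.0],      # same pattern, twice the content
            [3.0, 2.0, 1.0]]      # different pattern
sims = similarity_to_mean(profiles)
print(sims.round(3))
print(rsd_percent([9.0, 10.0, 11.0]))   # 10.0
```

    Scaling a profile (the second batch versus the first) leaves the cosine unchanged, so this measure compares patterns rather than absolute content, which is one reason SA can classify samples differently from HCA or LDA.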

  20. Spatial variable selection methods for investigating acute health effects of fine particulate matter components.

    Science.gov (United States)

    Boehm Vock, Laura F; Reich, Brian J; Fuentes, Montserrat; Dominici, Francesca

    2015-03-01

    Multi-site time series studies have reported evidence of an association between short-term exposure to particulate matter (PM) and adverse health effects, but the effect size varies across the United States. Variability in the effect may be partially due to differing community-level exposure and health characteristics, but also to the chemical composition of PM, which is known to vary greatly by location and time. The objective of this article is to identify particularly harmful components of this chemical mixture. Because of the large number of highly correlated components, we must incorporate some regularization into a statistical model. We assume that, at each spatial location, the regression coefficients come from a mixture model with the flavor of stochastic search variable selection, but we use a copula to share information about variable inclusion and effect magnitude across locations. The model differs from current spatial variable selection techniques by accommodating both local and global variable selection. The model is used to study the association between fine PM (PM < 2.5 μm) components, measured at 115 counties nationally over the period 2000-2008, and cardiovascular emergency room admissions among Medicare patients.

  1. A QSAR Study of Environmental Estrogens Based on a Novel Variable Selection Method

    Directory of Open Access Journals (Sweden)

    Aiqian Zhang

    2012-05-01

    A large number of descriptors were employed to characterize the molecular structure of 53 natural, synthetic, and environmental chemicals that are suspected of disrupting endocrine functions by mimicking or antagonizing natural hormones and may thus pose a serious threat to the health of humans and wildlife. In this work, a robust quantitative structure-activity relationship (QSAR) model with a novel variable selection method has been proposed for the effective estrogens. The variable selection method is based on variable interaction (VSMVI) with leave-multiple-out cross validation (LMOCV) to select the best subset. During variable selection, model construction and assessment, the Organization for Economic Co-operation and Development (OECD) principles for regulation of QSAR acceptability were fully considered, such as using an unambiguous multiple-linear regression (MLR) algorithm to build the model, using several validation methods to assess the performance of the model, defining the applicability domain, and analyzing the outliers with the results of molecular docking. The performance of the QSAR model indicates that the VSMVI is an effective, feasible and practical tool for rapid screening of the best subset from large sets of molecular descriptors.
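    Leave-multiple-out cross-validation of an MLR model, as used in this abstract to score descriptor subsets, can be sketched in Python with NumPy. The number of compounds left out, the number of random repeats and the definition Q2 = 1 - PRESS/SS (with SS taken about the training-set mean) are assumptions; the descriptors and activities are simulated.

```python
import numpy as np


def lmo_q2(X, y, n_leave_out, n_repeats=200, seed=0):
    """Leave-multiple-out cross-validated Q^2 for an MLR model with intercept.

    Each repeat removes n_leave_out random compounds, refits by least squares
    on the rest and predicts the held-out compounds.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    press, ss = 0.0, 0.0
    for _ in range(n_repeats):
        test = rng.choice(n, size=n_leave_out, replace=False)
        train = np.setdiff1d(np.arange(n), test)
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        press += np.sum((y[test] - A[test] @ coef) ** 2)        # prediction errors
        ss += np.sum((y[test] - y[train].mean()) ** 2)          # baseline: training mean
    return 1.0 - press / ss


rng = np.random.default_rng(0)
X = rng.standard_normal((53, 3))                 # made-up descriptors for 53 compounds
y = 1.0 + X @ np.array([0.8, -0.5, 0.3]) + 0.1 * rng.standard_normal(53)
print(round(lmo_q2(X, y, n_leave_out=10), 3))
```

    Wrapping this score in a search over descriptor subsets gives a basic LMO-validated selection loop; the variable-interaction search of VSMVI itself is not reproduced here.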

  2. a simple k-map based variable selection scheme in the direct ...

    African Journals Online (AJOL)

    Dr Obe

    The procedure is based on the fact that if 2^x minterms (where x = 1, ... approach [3, 4, 5]. What has hindered ... selection scheme for solving the problem of partitioning the ... proposed k-map based variable partition criterion can be summarised ...

  3. The Effect of Listening to Specific Musical Genre Selections on Measures of Heart Rate Variability

    Science.gov (United States)

    Orman, Evelyn K.

    2011-01-01

    University students (N = 30) individually listened to the Billboard 100 top-ranked musical selection for their most and least liked musical genre. Two minutes of silence preceded each musical listening condition, and heart rate variability (HRV) was recorded throughout. All HRV measures decreased during music listening as compared with silence.…

  4. Meta-Statistics for Variable Selection: The R Package BioMark

    Directory of Open Access Journals (Sweden)

    Ron Wehrens

    2012-11-01

    Full Text Available Biomarker identification is an ever more important topic in the life sciences. With the advent of measurement methodologies based on microarrays and mass spectrometry, thousands of variables are routinely being measured on complex biological samples. Often, the question is what makes two groups of samples different. Classical hypothesis testing suffers from the multiple testing problem; however, correcting for this often leads to a lack of power. In addition, choosing α cutoff levels remains somewhat arbitrary. Also in a regression context, a model depending on few but relevant variables will be more accurate and precise, and easier to interpret biologically. We propose an R package, BioMark, implementing two meta-statistics for variable selection. The first, higher criticism, presents a data-dependent selection threshold for significance, instead of a cookbook value of α = 0.05. It is applicable in all cases where two groups are compared. The second, stability selection, is more general, and can also be applied in a regression context. This approach uses repeated subsampling of the data in order to assess the variability of the model coefficients and selects those that remain consistently important. It is shown using experimental spike-in data from the field of metabolomics that both approaches work well with real data. BioMark also contains functionality for simulating data with specific characteristics for algorithm development and testing.
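
    The higher-criticism threshold can be illustrated with a short Python sketch. It assumes the Donoho-Jin form of the statistic, HC_i = sqrt(N) (i/N - p_(i)) / sqrt(p_(i)(1 - p_(i))) over the sorted p-values, and searches only the smallest half of them (alpha0 = 0.5), a common convention. The function name and defaults are illustrative and are not BioMark's R interface, whose implementation may differ in detail.

```python
import math


def higher_criticism_threshold(p_values, alpha0=0.5):
    """Return (threshold, i_star) from the higher-criticism statistic.

    Variables whose p-value is <= threshold are selected. Only the smallest
    alpha0 fraction of the sorted p-values is searched.
    """
    n = len(p_values)
    if n == 0:
        return 0.0, 0
    p_sorted = sorted(p_values)
    best_hc, best_i = -math.inf, 0
    for i in range(1, min(n, max(1, int(alpha0 * n))) + 1):
        p = p_sorted[i - 1]
        if not 0.0 < p < 1.0:
            continue  # HC_i is undefined when p is exactly 0 or 1
        hc = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        if hc > best_hc:
            best_hc, best_i = hc, i
    if best_i == 0:
        return 0.0, 0  # no usable p-value in the searched range
    return p_sorted[best_i - 1], best_i


# Five strong signals among twenty variables (synthetic p-values).
ps = [1e-6, 2e-6, 3e-6, 4e-6, 5e-6] + [i / 16 for i in range(1, 16)]
threshold, i_star = higher_criticism_threshold(ps)
selected = [j for j, p in enumerate(ps) if p <= threshold]
print(i_star, selected)  # prints: 5 [0, 1, 2, 3, 4]
```

    Unlike a fixed α = 0.05 cut, the threshold moves with the data: here it lands exactly at the fifth smallest p-value, where the gap between the observed and uniform p-value distributions is largest.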

  6. Variable selection in PLSR and extensions to a multi-block setting for metabolomics data

    DEFF Research Database (Denmark)

    Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach

    When applying LC-MS or NMR spectroscopy in metabolomics studies, high-dimensional data are generated and effective tools for variable selection are needed in order to detect the important metabolites. Methods based on sparsity combined with PLSR have recently attracted attention in the field...

  7. Chemometrics: From classical to genetic algorithms

    Directory of Open Access Journals (Sweden)

    Leardi, Riccardo

    2002-03-01

    Full Text Available In this paper the fundamentals of Chemometrics are presented by means of a quick overview of the most relevant techniques for data display, classification, modeling and calibration. Two emerging techniques, Genetic Algorithms and Artificial Neural Networks, are also presented. The goal of the paper is to make people aware of the great superiority of multivariate analysis over the commonly used univariate approach. Mathematical and algorithmic details are not presented, since the paper is mainly focused on the general problems to which Chemometrics can be successfully applied in the field of Food Chemistry.

  8. Variable selection in the explorative analysis of several data blocks in metabolomics

    DEFF Research Database (Denmark)

    Karaman, İbrahim; Nørskov, Natalja; Yde, Christian Clement

    Methods which can integrate more than two data matrices have been developed in and applied to the fields of psychometrics, consumer science, econometrics and process control. Recently these methods have been applied to multiple data sets in biosciences and proven to be very powerful in situations...... highly correlated data sets in one integrated approach. Due to the high number of variables in data sets from metabolomics (both raw data and after peak picking) the selection of important variables in an explorative analysis is difficult, especially when different data sets of metabolomics data need...... to be related. Tools for the handling of mental overflow minimising false discovery rates both by using statistical and biological validation in an integrative approach are needed. In this paper different strategies for variable selection were considered with respect to false discovery and the possibility...
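
    Controlling the false discovery rate when screening thousands of metabolomics variables is commonly done with the Benjamini-Hochberg step-up procedure. The Python sketch below shows that generic procedure; it is not one of the specific strategies compared in the paper, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level q.

    Step-up rule: find the largest rank k with p_(k) <= k * q / m and
    reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])  # indices by ascending p
    k = 0  # largest rank whose p-value falls under its BH line
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank * q / m:
            k = rank
    return sorted(order[:k])


print(benjamini_hochberg([0.01, 0.045, 0.025, 0.005, 0.5]))  # prints: [0, 2, 3]
```

    Because it is a step-up rule, a p-value that misses its own line can still be rejected when a larger p-value further down clears its (more lenient) line.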

  9. Integrating biological knowledge into variable selection: an empirical Bayes approach with an application in cancer biology.

    Science.gov (United States)

    Hill, Steven M; Neve, Richard M; Bayani, Nora; Kuo, Wen-Lin; Ziyad, Safiyyah; Spellman, Paul T; Gray, Joe W; Mukherjee, Sach

    2012-05-11

    An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge.

  10. Knowledge-based variable selection for learning rules from proteomic data

    Directory of Open Access Journals (Sweden)

    Hogan William R

    2009-09-01

    Full Text Available Abstract Background: The incorporation of biological knowledge can enhance the analysis of biomedical data. We present a novel method that uses a proteomic knowledge base to enhance the performance of a rule-learning algorithm in identifying putative biomarkers of disease from high-dimensional proteomic mass spectral data. In particular, we use the Empirical Proteomics Ontology Knowledge Base (EPO-KB), which contains previously identified and validated proteomic biomarkers, to select m/z values in a proteomic dataset prior to analysis in order to increase performance. Results: We show that using EPO-KB as a pre-processing method, specifically selecting all biomarkers found only in the biofluid of the proteomic dataset, reduces the dimensionality by 95% and provides a statistically significantly greater increase in performance than either no variable selection or random variable selection. Conclusion: Knowledge-based variable selection, even with a sparsely populated resource such as the EPO-KB, increases the overall performance of rule learning for disease classification from high-dimensional proteomic mass spectra.

  11. Penalized variable selection procedure for Cox models with semiparametric relative risk

    CERN Document Server

    Du, Pang; Liang, Hua; 10.1214/09-AOS780

    2010-01-01

    We study Cox models with semiparametric relative risk, which can be partially linear with one nonparametric component, or multiple additive or nonadditive nonparametric components. A penalized partial likelihood procedure is proposed to simultaneously estimate the parameters and select variables for both the parametric and the nonparametric parts. Two penalties are applied sequentially. The first penalty, governing the smoothness of the multivariate nonlinear covariate effect function, provides a smoothing spline ANOVA framework that is exploited to derive an empirical model selection tool for the nonparametric part. The second penalty, either the smoothly-clipped-absolute-deviation (SCAD) penalty or the adaptive LASSO penalty, achieves variable selection in the parametric part. We show that the resulting estimator of the parametric part possesses the oracle property, and that the estimator of the nonparametric part achieves the optimal rate of convergence. The proposed procedures are shown to work well i...
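
    The SCAD penalty has a simple piecewise closed form: it grows linearly like the LASSO near zero, follows a quadratic spline in between, and is constant for large coefficients, so large effects are not shrunk further. The Python sketch below evaluates it for a single coefficient, assuming the conventional a = 3.7; the smoothing-spline ANOVA penalty for the nonparametric part and the adaptive LASSO alternative are not shown.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|) for one coefficient (requires a > 2)."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # LASSO-like region
    if t <= a * lam:
        # quadratic spline joining the two outer pieces continuously
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2  # flat region: no extra shrinkage


for value in (0.5, 2.0, 10.0):
    print(value, scad_penalty(value, lam=1.0))
```

    The three pieces meet continuously at |theta| = lambda (value lambda^2) and at |theta| = a * lambda (value (a + 1) lambda^2 / 2), which is what gives SCAD its near-unbiasedness for large coefficients.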

  12. Chemometrics tools used in analytical chemistry: an overview.

    Science.gov (United States)

    Kumar, Naveen; Bansal, Ankit; Sarma, G S; Rawal, Ravindra K

    2014-06-01

    This article presents important chemometric tools used to evaluate the data generated by various hyphenated analytical techniques, together with their applications from their advent to the present day. The work is divided into several sections covering multivariate regression methods and multivariate resolution methods; the last section deals with the applicability of chemometric tools in analytical chemistry. The main objective of this article is to review the chemometric methods used in analytical chemistry (qualitative/quantitative) to determine the elution sequence, classify various data sets, assess peak purity and estimate the number of chemical components. The reviewed methods can further be used for treating n-way data obtained by hyphenating LC with multi-channel detectors. We aim to provide a detailed view of the important methods developed, together with their algorithms, so that researchers not very familiar with chemometrics can understand and use them.

  13. Bootstrap rank-ordered conditional mutual information (broCMI): A nonlinear input variable selection method for water resources modeling

    Science.gov (United States)

    Quilty, John; Adamowski, Jan; Khalil, Bahaa; Rathinasamy, Maheswaran

    2016-03-01

    The input variable selection problem has recently garnered much interest in the time series modeling community, especially within water resources applications, demonstrating that information theoretic (nonlinear)-based input variable selection algorithms such as partial mutual information (PMI) selection (PMIS) provide an improved representation of the modeled process when compared to linear alternatives such as partial correlation input selection (PCIS). PMIS is a popular algorithm for water resources modeling problems considering nonlinear input variable selection; however, this method requires the specification of two nonlinear regression models, each with parametric settings that greatly influence the selected input variables. Other attempts to develop input variable selection methods using conditional mutual information (CMI) (an analog to PMI) have been formulated under different parametric pretenses such as k nearest-neighbor (KNN) statistics or kernel density estimates (KDE). In this paper, we introduce a new input variable selection method based on CMI that uses a nonparametric multivariate continuous probability estimator based on Edgeworth approximations (EA). We improve the EA method by considering the uncertainty in the input variable selection procedure by introducing a bootstrap resampling procedure that uses rank statistics to order the selected input sets; we name our proposed method bootstrap rank-ordered CMI (broCMI). We demonstrate the superior performance of broCMI when compared to CMI-based alternatives (EA, KDE, and KNN), PMIS, and PCIS input variable selection algorithms on a set of seven synthetic test problems and a real-world urban water demand (UWD) forecasting experiment in Ottawa, Canada.
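
    The core quantity behind CMI-based input selection can be sketched in Python. This is a simplified stand-in, not broCMI: it uses a plug-in (counting) estimator of I(X; Y | Z) on discrete data instead of the Edgeworth approximation, and a greedy forward search without the bootstrap rank-ordering step. Function names and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in nats for equal-length discrete sequences."""
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    cmi = 0.0
    for (xi, yi, zi), c in c_xyz.items():
        # p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ], with counts in place of probabilities
        cmi += (c / n) * math.log((c * c_z[zi]) / (c_xz[(xi, zi)] * c_yz[(yi, zi)]))
    return cmi


def greedy_cmi_selection(candidates, y, n_select):
    """Forward selection: repeatedly add the input with the largest I(X; Y | selected)."""
    selected = []
    for _ in range(n_select):
        # Condition on the joint values of the inputs chosen so far (empty tuple at the start).
        z = [tuple(candidates[s][i] for s in selected) for i in range(len(y))]
        remaining = [name for name in candidates if name not in selected]
        best = max(remaining, key=lambda name: conditional_mutual_information(candidates[name], y, z))
        selected.append(best)
    return selected


# Hypothetical inputs: y depends on u and v; u_copy duplicates u.
inputs = {"u": [0, 0, 1, 1], "u_copy": [0, 0, 1, 1], "v": [0, 1, 0, 1]}
print(greedy_cmi_selection(inputs, [0, 1, 2, 3], 2))
```

    Because each later pick is conditioned on the earlier ones, a redundant copy of an already selected input scores zero, which is the behaviour that separates CMI-based selection from ranking inputs by plain mutual information.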

  14. Comparison of different measurement techniques and variable selection methods for FT-MIR in wine analysis.

    Science.gov (United States)

    Friedel, Matthias; Patz, Claus-Dieter; Dietrich, Helmut

    2013-12-15

    For more than a decade, Fourier-transform infrared (FTIR) spectroscopy combined with partial least squares (PLS) regression has been used as a fast and reliable method for simultaneous estimation of multiple parameters in wine. In this study, different FTIR instruments (single bounce attenuated total reflection, transmission with variable and defined pathlength) and different variable selection techniques (full spectrum PLS, genetic algorithm PLS, interval PLS, principal variable PLS) were compared on an identical sample set of international wines and ten wine parameters. Results suggest that the single bounce attenuated total reflection technique is well suited for the analysis of ethanol, relative density and sugars, but less accurate in the analysis of organic acid content. The transmission instrument with variable pathlength shows good validation results for the analysis of organic acids, but less accurate results for the analysis of ethanol and relative density as compared to the other instruments. The transmission instrument with defined pathlength was well suited for the analysis of all parameters investigated in this study. Variable selection improved model robustness and calibration results, with genetic algorithm PLS being the most effective technique.

  15. Chemometrics: A new scenario in herbal drug standardization

    Directory of Open Access Journals (Sweden)

    Ankit Bansal

    2014-08-01

    Full Text Available Chromatographic and spectroscopic techniques are the most commonly used methods for the standardization of herbal medicines, but herbal systems are not easy to analyze because of the complexity of their chemical composition. Many cutting-edge analytical technologies have been introduced to evaluate the quality of medicinal plants, and a significant amount of measurement data has been produced. Chemometric techniques provide a good opportunity for mining more useful chemical information from the original data. The application of chemometrics in the field of medicinal plants is therefore natural and necessary. Comprehensive methods and hyphenated techniques associated with chemometrics, used for extracting useful information and supplying various methods of data processing, are now increasingly widely used for medicinal plants; among these, chemometric resolution methods and principal component analysis (PCA) are the most commonly used techniques. This review focuses on recent important analytical techniques, important chemometric tools, the interpretation of results by PCA, and applications of chemometrics in evaluating the authenticity, efficacy and consistency of medicinal plants.

  16. Input variable selection for data-driven models of Coriolis flowmeters for two-phase flow measurement

    Science.gov (United States)

    Wang, Lijuan; Yan, Yong; Wang, Xue; Wang, Tao

    2017-03-01

    Input variable selection is an essential step in the development of data-driven models for environmental, biological and industrial applications. By eliminating irrelevant or redundant variables, input variable selection identifies a suitable subset of variables as the input of a model; it also simplifies the model structure and improves computational efficiency. This paper describes the input variable selection procedures for data-driven models that measure liquid mass flowrate and gas volume fraction under two-phase flow conditions using Coriolis flowmeters. Three advanced input variable selection methods, partial mutual information (PMI), genetic algorithm-artificial neural network (GA-ANN) and tree-based iterative input selection (IIS), are applied in this study. Typical data-driven models incorporating a support vector machine (SVM) are established individually based on the input candidates resulting from the selection methods. The validity of the selection outcomes is assessed through an output performance comparison of the SVM-based data-driven models and a sensitivity analysis. The validation and analysis results suggest that the input variables selected by the PMI algorithm provide more effective information for the models to measure liquid mass flowrate, while the IIS algorithm provides fewer but more effective variables for the models to predict gas volume fraction.

  17. Spectroscopic and chemometric exploration of food quality

    DEFF Research Database (Denmark)

    Pedersen, Dorthe Kjær

    2002-01-01

    The desire to develop non-invasive rapid measurements of essential quality parameters in foods is the motivation of this thesis. Due to the speed and noninvasive properties of spectroscopic techniques, they have potential as on-line or atline methods and can be employed in the food industry...... in order to control the quality of the end product and to continuously monitor the production. In this thesis, the possibilities and limitations of the application of spectroscopy and chemometrics in rapid control of food quality are discussed and demonstrated by the examples in the eight included...... publications. Different aspects of food quality are covered, but the focus is mainly on the development of multivariate calibrations for predictions of rather complex attributes such as the water-holding capacity of meat, ethical quality of the slaughtering procedure, protein content of single wheat kernels...

  18. Joint Bayesian variable and graph selection for regression models with network-structured predictors.

    Science.gov (United States)

    Peterson, Christine B; Stingo, Francesco C; Vannucci, Marina

    2016-03-30

    In this work, we develop a Bayesian approach to perform selection of predictors that are linked within a network. We achieve this by combining a sparse regression model relating the predictors to a response variable with a graphical model describing conditional dependencies among the predictors. The proposed method is well-suited for genomic applications because it allows the identification of pathways of functionally related genes or proteins that impact an outcome of interest. In contrast to previous approaches for network-guided variable selection, we infer the network among predictors using a Gaussian graphical model and do not assume that network information is available a priori. We demonstrate that our method outperforms existing methods in identifying network-structured predictors in simulation settings and illustrate our proposed model with an application to inference of proteins relevant to glioblastoma survival.

  19. Gametocytes infectiousness to mosquitoes: variable selection using random forests, and zero inflated models

    CERN Document Server

    Genuer, Robin; Toussile, Wilson

    2011-01-01

    Malaria control strategies aiming at reducing disease transmission intensity may affect both oocyst intensity and infection prevalence in the mosquito vector. Thus far, mathematical models have failed to identify a clear relationship between Plasmodium falciparum gametocytes and their infectiousness to mosquitoes. Natural isolates of gametocytes are genetically diverse and biologically complex. Infectiousness to mosquitoes depends on multiple parameters such as density, sex ratio, maturity, parasite genotypes and host immune factors. In this article, we investigated how the density and genetic diversity of gametocytes affect the success of transmission in the mosquito vector. We analyzed data for which the number of covariates plus attendant interactions is at least of the order of the sample size, precluding the use of classical models such as general linear models. We then used the variable importance from random forests to select the most influential variables. The selected covariates were ...

  20. The Effects of Basic Gymnastics Training Integrated with Physical Education Courses on Selected Motor Performance Variables

    Science.gov (United States)

    Alpkaya, Ufuk

    2013-01-01

    The purpose of this study is to determine the influence of gymnastics training integrated with physical education courses on selected motor performance variables in seven-year-old girls. Subjects were divided into two groups: (1) control group (N=15, mean age = 7.56 ± 0.46 years); (2) gymnastics group (N=16, mean age = 7.60 ± 0.50 year…

  1. Hybrid model based on Genetic Algorithms and SVM applied to variable selection within fruit juice classification.

    Science.gov (United States)

    Fernandez-Lozano, C; Canto, C; Gestal, M; Andrade-Garda, J M; Rabuñal, J R; Dorado, J; Pazos, A

    2013-01-01

    Given the background of the use of Neural Networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: Support Vector Machines (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is proposed, in such a way that, when the SVM is used as the fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected.
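
    A genetic-algorithm variable selection loop of the kind described above can be sketched in Python. Each individual is a boolean mask over the candidate variables and the fitness is a user-supplied callable; the paper uses an SVM in that role, whereas the example plugs in a hypothetical toy fitness so it runs without a machine learning library. Population size, mutation rate and the truncation-selection scheme are illustrative assumptions, not the authors' settings.

```python
import random


def ga_select(n_vars, fitness, pop_size=40, generations=60, p_mut=0.05, seed=0):
    """Search boolean inclusion masks with a simple genetic algorithm.

    fitness(mask) -> float, higher is better. Uses truncation selection,
    single-point crossover, bit-flip mutation and elitism. Needs n_vars >= 2.
    """
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_vars)] for _ in range(pop_size)]
    best = list(max(pop, key=fitness))
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        if fitness(ranked[0]) > fitness(best):
            best = list(ranked[0])
        parents = ranked[: pop_size // 2]  # keep the fitter half as parents
        children = [list(best)]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)  # single-point crossover
            child = mom[:cut] + dad[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    final = max(pop, key=fitness)
    return list(final) if fitness(final) > fitness(best) else best


# Hypothetical toy fitness: variables 1, 4 and 7 are "informative"; every other
# included variable costs 0.5. A real application would instead return, e.g.,
# a cross-validated SVM accuracy computed on the columns where mask is True.
INFORMATIVE = {1, 4, 7}


def toy_fitness(mask):
    hits = sum(1 for j, keep in enumerate(mask) if keep and j in INFORMATIVE)
    extras = sum(1 for j, keep in enumerate(mask) if keep and j not in INFORMATIVE)
    return hits - 0.5 * extras


best_mask = ga_select(10, toy_fitness)
print([j for j, keep in enumerate(best_mask) if keep])
```

    Penalizing extra variables in the fitness is one simple way to push the search toward compact subsets; with an SVM score alone, the GA has no explicit incentive to drop uninformative variables.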

  3. Selected Macroeconomic Variables and Stock Market Movements: Empirical evidence from Thailand

    Directory of Open Access Journals (Sweden)

    Joseph Ato Forson

    2014-06-01

    Full Text Available This paper investigates the long-run equilibrium relationship between the Thai Stock Exchange Index (SETI) and selected macroeconomic variables using monthly time series data covering a 20-year period from January 1990 to December 2009. The following macroeconomic variables are included in our analysis: money supply (MS), the consumer price index (CPI), the interest rate (IR) and the industrial production index (IP) (as a proxy for GDP). Our findings show that the SET Index and the selected macroeconomic variables are cointegrated at I(1) and have a significant long-run equilibrium relationship. Money supply shows a strong positive long-run relationship with the SET Index, whereas the industrial production index and the consumer price index show negative long-run relationships with it. Furthermore, in non-equilibrium situations, the error correction mechanism suggests that the consumer price index, industrial production index and money supply each contribute in some way to restoring equilibrium. In addition, using Toda and Yamamoto's augmented Granger causality test, we identify a bidirectional causal relationship between industrial production and money supply and unidirectional causal relationships between CPI and IR, IP and CPI, MS and CPI, and IP and SETI, indicating that all of these variables are sensitive to Thai stock market movements. The policy implications of these findings are also discussed.

  4. Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases

    Directory of Open Access Journals (Sweden)

    Wells Martin T

    2008-05-01

    Full Text Available Abstract Background: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying classical model selection criteria such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC). Results: We propose a two-step Bayesian variable selection method that deals with the sparse parameter space and the small sample size. The regression coefficient priors are flexible enough to incorporate the characteristics of "large p small n" data. Specifically, sparseness and possible asymmetry of the significant coefficients are handled by a Gibbs sampling algorithm that stochastically searches through low-dimensional subspaces for significant variables. The superior performance of the approach is demonstrated via a simulation study. We also applied it to real QTL mapping datasets. Conclusion: The two-step procedure coupled with Bayesian classification offers flexibility in modeling "large p small n" data, especially for sparse and asymmetric parameter spaces. This approach can be extended to other settings characterized by high dimension and low sample size.

  5. Integration of Multiple Genomic Data Sources in a Bayesian Cox Model for Variable Selection and Prediction.

    Science.gov (United States)

    Treppmann, Tabea; Ickstadt, Katja; Zucknick, Manuela

    2017-01-01

    Bayesian variable selection is becoming increasingly important in statistical analyses, in particular when performing variable selection in high dimensions. For survival time models in the presence of genomic data, however, its potential is still largely unexploited. One of the more recent approaches suggests a Bayesian semiparametric proportional hazards model for right-censored time-to-event data. We extend this model to directly include variable selection, based on a stochastic search procedure within a Markov chain Monte Carlo sampler for inference. This gives us an intuitive and flexible approach and provides a way of integrating additional data sources and further extensions. We implement parallel tempering to help improve the mixing of the Markov chains. In our examples, we use this Bayesian approach to integrate copy number variation data into a gene-expression-based survival prediction model. This is achieved by formulating an informed prior based on copy number variation. We perform a simulation study to investigate the model's behavior and prediction performance in different situations before applying it to a dataset of glioblastoma patients and evaluating the biological relevance of the findings.

  6. Improving breast cancer classification with mammography, supported on an appropriate variable selection analysis

    Science.gov (United States)

    Pérez, Noel; Guevara, Miguel A.; Silva, Augusto

    2013-02-01

    This work addresses the issue of variable selection within the context of breast cancer classification with mammography. A comprehensive repository of feature vectors was used, including a hybrid subset gathering image-based and clinical features. The aim was to gather experimental evidence on variable selection in terms of the cardinality and type of features, and to find the classification scheme that provides the best performance, measured by the Area Under the Receiver Operating Characteristic Curve (AUC), using the ranked feature subsets. We evaluated and classified a total of 300 feature subsets formed by applying the Chi-Square Discretization, Information-Gain, One-Rule and RELIEF methods in association with Feed-Forward Backpropagation Neural Network (FFBP), Support Vector Machine (SVM) and Decision Tree J48 (DTJ48) Machine Learning Algorithms (MLA), for a comparative performance evaluation based on AUC scores. A variable selection analysis was performed for Single-View Ranking and Multi-View Ranking groups of features. Feature subsets representing Microcalcifications (MCs), Masses, and both MCs and Masses lesions achieved AUC scores of 0.91, 0.954 and 0.934, respectively. Experimental evidence demonstrated that classification performance was improved by combining image-based and clinical features. The most important clinical and image-based features were StromaDistortion and Circularity, respectively. Other features that are less important but still worth using because of their consistency were Contrast, Perimeter, Microcalcification, Correlation and Elongation.
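
    Information gain, one of the ranking criteria listed above, can be computed for discretized features as in the following Python sketch. It assumes the features have already been discretized (the paper's discretization and the Chi-Square, One-Rule and RELIEF rankers are not reproduced), and the feature names in the example are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y) in bits of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names ordered by decreasing information gain."""
    return sorted(columns, key=lambda name: information_gain(columns[name], labels), reverse=True)


labels = [0, 0, 1, 1]  # e.g. benign / malignant
columns = {"noise_feature": [0, 1, 0, 1], "shape_bin": ["a", "a", "b", "b"]}
print(rank_features(columns, labels))  # prints: ['shape_bin', 'noise_feature']
```

    A ranking like this produces the ordered lists from which nested feature subsets of increasing cardinality can be formed and then scored by a classifier's AUC.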

  7. Green method based on a flow-batch analyzer system for the simultaneous determination of ciprofloxacin and dexamethasone in pharmaceuticals using a chemometric approach.

    Science.gov (United States)

    Razuc, Mariela F; Grünhut, Marcos; Saidman, Elbio; Garrido, Mariano; Fernández Band, Beatriz

    2013-10-15

    A green flow-batch analyzer (FBA) method with UV detection was developed for the simultaneous determination of ciprofloxacin (CIP) and dexamethasone (DEX) in ophthalmic and otic preparations. A lab-made mixing detection chamber (MDC) was designed and coupled to the spectrophotometer in order to perform the mixing of solutions and the detection in the same receptacle. Only water was used as solvent and no previous separation of the components was required. Both analytes absorb strongly between 190 and 370 nm in aqueous medium at pH 7. However, the spectrum of DEX is embedded in the CIP spectrum. Thus, while CIP was analyzed using univariate calibration, DEX analysis was carried out comparing partial least squares (PLS-1) and multiple linear regression (MLR). The latter required a previous variable selection step, which was performed using the genetic algorithm (GA) and the successive projections algorithm (SPA). The FBA system made it possible to prepare the calibration and validation sets automatically. The statistical parameters, in terms of relative errors of calibration and prediction, were acceptable for the determination of both CIP and DEX. A comparative study of chemometric models was also carried out. Commercial samples were analyzed and the results obtained are in close agreement with HPLC pharmacopeia methods. The joint interval test for the slope and the intercept was used to test for the presence of bias. There were no statistical differences between the proposed method and the reference method (α=0.05). The sample throughput was 10 h⁻¹. The combination of automation and chemometric tools allowed us to develop an environmentally friendly method for the quality control of CIP and DEX in pharmaceuticals.
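
    The successive projections algorithm (SPA) used above to pick wavelengths for MLR can be sketched in plain Python. Starting from a chosen column, each step projects the remaining columns onto the orthogonal complement of the last selected (already projected) column and keeps the column with the largest residual norm, which favours wavelengths with minimal collinearity. The starting column and the number of variables are inputs here; in practice both are usually chosen by validation error, which this sketch omits, and the function name is hypothetical.

```python
def spa(X, k0, n_select):
    """Successive projections algorithm (sketch).

    X is a list of rows (samples x variables). Returns the indices of the
    selected columns in the order they were chosen, starting with k0.
    """
    n_rows, n_cols = len(X), len(X[0])
    cols = [[float(X[i][j]) for i in range(n_rows)] for j in range(n_cols)]  # column vectors
    selected = [k0]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]  # last selected column, already orthogonal to earlier picks
        ref_sq = sum(v * v for v in ref)
        if ref_sq == 0.0:
            break  # nothing left to project against
        best_j, best_norm = None, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_sq
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]  # remove component along ref
            norm = sum(v * v for v in cols[j])
            if norm > best_norm:
                best_j, best_norm = j, norm
        if best_j is None:
            break  # every column already selected
        selected.append(best_j)
    return selected


# Toy matrix: column 3 is collinear with column 0, column 2 is orthogonal to it.
X = [[1, 1, 0, 2],
     [0, 1, 0, 0],
     [0, 0, 2, 0]]
print(spa(X, 0, 3))  # prints: [0, 2, 1]
```

    Column 3, being a multiple of column 0, is reduced to zero by the first projection and never chosen, which is exactly the collinearity-avoiding behaviour that makes SPA a useful pre-step for MLR.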

  8. QSAR study of 4-aryl-4H-chromenes as a new series of apoptosis inducers using different chemometric tools.

    Science.gov (United States)

    Khoshneviszadeh, Mehdi; Edraki, Najmeh; Miri, Ramin; Foroumadi, Alireza; Hemmateenejad, Bahram

    2012-04-01

    The apoptosis-inducing activity data of a series of 4-aryl-4H-chromenes in three cell lines (human breast cancer cell line T47D, human non-small cell lung cancer cell line H1299, and human colorectal cancer cell line DLD-1) were subjected to quantitative structure-activity relationship (QSAR) analysis. A collection of chemometric methods, including multiple linear regression (MLR), factor analysis-based multiple linear regression (FA-MLR), principal component regression (PCR), and partial least squares combined with a genetic algorithm for variable selection (GA-PLS), was employed to relate structural parameters to the induction of apoptosis in the three cell lines. Models of high statistical quality were obtained for each cell line using the GA-PLS method. The results revealed that 2D autocorrelation descriptors and the dipole moment, a quantum chemical parameter, are important structural parameters that significantly influence activity in all three cell lines. However, the descriptors determining activity in the H1299 cell line were partly different from those for the other two cell lines. This suggests that the studied compounds might induce apoptosis in H1299 cells through a different mechanism of action.

  9. Fingerprints for main varieties of Argentinean wines: terroir differentiation by inorganic, organic, and stable isotopic analyses coupled to chemometrics.

    Science.gov (United States)

    Di Paola-Naranjo, Romina D; Baroni, Maria V; Podio, Natalia S; Rubinstein, Hector R; Fabani, Maria P; Badini, Raul G; Inga, Marcela; Ostera, Hector A; Cagnoni, Mariana; Gallegos, Ernesto; Gautier, Eduardo; Peral-Garcia, Pilar; Hoogewerff, Jurian; Wunderlin, Daniel A

    2011-07-27

    Our main goal was to investigate whether robust chemical fingerprints could be developed for three Argentinean red wines based on organic, inorganic, and isotopic patterns, in relation to the regional soil composition. Soils and wines from three regions (Mendoza, San Juan, and Córdoba) and three varieties (Cabernet Sauvignon, Malbec, and Syrah) were collected. The phenolic profile was determined by HPLC-MS/MS and the multielemental composition by ICP-MS; ⁸⁷Sr/⁸⁶Sr and δ¹³C were determined by TIMS and IRMS, respectively. Chemometrics allowed robust differentiation between regions, between wine varieties, and between samples of the same variety from different regions. Among phenolic compounds, resveratrol concentration was the most useful marker for wine differentiation, whereas Mg, K/Rb, Ca/Sr, and ⁸⁷Sr/⁸⁶Sr were the main inorganic and isotopic parameters selected. Generalized Procrustes analysis (GPA) of the two studied matrices (wine and soil) showed consensus between them and clear differences between the studied areas. Finally, we applied canonical correlation analysis, demonstrating a significant correlation (r = 0.99; p < …) between soil and wine composition. To our knowledge, this is the first report that combines independent variables into a fingerprint including elemental composition, isotopic, and polyphenol patterns to differentiate wines, and that matches part of this fingerprint with the soil provenance.

  10. Comparison of Selected Physiological Variables of Players Belonging to Various Distance Runners

    Directory of Open Access Journals (Sweden)

    Satpal Yadav

    2009-12-01

    The purpose of the study was to compare selected physiological variables, namely maximum oxygen consumption, vital capacity, resting heart rate, and hemoglobin content, among runners of various distances. The subjects were selected from male short-, middle-, and long-distance runners of Gwalior district. Ten (10) male athletes from each group (short, middle, and long distance) were selected as subjects for the study, and the selected physiological variables were compared among the three groups. To test for significant differences in these variables among the groups, analysis of variance (F-ratio) was applied at the 0.05 level of significance. For further analysis, a post hoc test (LSD test) was applied. The short-distance runners showed a significantly different level of VO2 max (72.727) in comparison to middle-distance (75.854) and long-distance (77.094) runners. However, the middle- and long-distance runners showed more or less the same level of VO2 max. Further, long-distance runners showed better cardiac efficiency, as their mean resting heart rate (56.3) was the lowest of the three groups. On the other hand, long-, middle-, and short-distance runners showed more or less the same vital capacity and hemoglobin content, with a small range of variation.

  11. The Use of Variable Q1 Isolation Windows Improves Selectivity in LC-SWATH-MS Acquisition.

    Science.gov (United States)

    Zhang, Ying; Bilbao, Aivett; Bruderer, Tobias; Luban, Jeremy; Strambio-De-Castillia, Caterina; Lisacek, Frédérique; Hopfgartner, Gérard; Varesio, Emmanuel

    2015-10-02

    As tryptic peptides and metabolites are not equally distributed along the mass range, the probability of cross-fragment ion interference is higher in certain windows when fixed Q1 SWATH windows are applied. We evaluated the benefits of using variable Q1 SWATH windows with regard to selectivity improvement. Variable windows based on equalizing the distribution of either the precursor ion population (PIP) or the total ion current (TIC) within each window were generated by in-house software, swathTUNER. These two variable Q1 SWATH window strategies outperformed the basic approach using a fixed window width (FIX), with respect to both quantification and identification, for proteomic profiling of human monocyte-derived dendritic cells (MDDCs). Specifically, 13.8% and 8.4% additional peptide precursors, which resulted in 13.1% and 10.0% more proteins, were confidently identified by SWATH using the PIP and TIC strategies, respectively, in the MDDC proteomic sample. On the basis of the spectral library purity score, some improvement afforded by variable Q1 windows was also observed, albeit to a lesser extent, in the metabolomic profiling of human urine. We show that the novel concept of "scheduled SWATH" proposed here, which incorporates (i) variable isolation windows and (ii) precursor retention time segmentation, further improves both peptide and metabolite identifications.
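The PIP strategy amounts to placing window edges at quantiles of the precursor m/z distribution, so that each window holds roughly the same number of precursors. Below is a minimal Python sketch with NumPy, assuming a plain list of precursor m/z values. It is not swathTUNER, and it ignores window overlap, minimum window width, and the TIC-weighted variant. The function name `pip_windows` and the example data are hypothetical.

```python
import numpy as np


def pip_windows(precursor_mz, n_windows, mz_min, mz_max):
    """Q1 window edges that equalize the precursor ion population (sketch).

    Inner edges sit at evenly spaced quantiles of the precursor m/z values;
    the outer edges are the acquisition range limits.
    """
    mz = np.asarray(precursor_mz, dtype=float)
    inner_q = np.linspace(0.0, 1.0, n_windows + 1)[1:-1]
    inner_edges = np.quantile(mz, inner_q)
    return np.concatenate(([mz_min], inner_edges, [mz_max]))


# Hypothetical, uniformly spread precursors between m/z 400 and 1199.
mz = np.arange(400, 1200)
edges = pip_windows(mz, n_windows=4, mz_min=400.0, mz_max=1200.0)
counts, _ = np.histogram(mz, bins=edges)
print(counts)  # [200 200 200 200]
```

With a real, non-uniform precursor list, the same quantile rule yields narrow windows in crowded m/z regions and wide ones in sparse regions. This is the selectivity argument made in the abstract.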

  12. Bioreactor monitoring with spectroscopy and chemometrics: a review.

    Science.gov (United States)

    Lourenço, N D; Lopes, J A; Almeida, C F; Sarraguça, M C; Pinheiro, H M

    2012-09-01

    Biotechnological processes are crucial to the development of any economy striving to ensure a relevant position in future markets. The cultivation of microorganisms in bioreactors is one of the most important unit operations of biotechnological processes, and real-time monitoring of bioreactors is essential for effective bioprocess control. In this review, published material on the potential application of different spectroscopic techniques for bioreactor monitoring is critically discussed, with particular emphasis on optical fiber technology, reported for in situ bioprocess monitoring. Application examples are presented by spectroscopy type, specifically focusing on ultraviolet-visible, near-infrared, mid-infrared, Raman, and fluorescence spectroscopy. The spectra acquisition devices available and the major advantages and disadvantages of each spectroscopy are discussed. The type of information contained in the spectra and the available chemometric methods for extracting that information are also addressed, including wavelength selection, spectra pre-processing, principal component analysis, and partial least-squares. Sample handling techniques (flow and sequential injection analysis) that include transport to spectroscopic sensors for ex-situ on-line monitoring are not covered in this review.

  13. Combining chromatography and chemometrics for the characterization and authentication of fats and oils from triacylglycerol compositional data--a review.

    Science.gov (United States)

    Bosque-Sendra, Juan M; Cuadros-Rodríguez, Luis; Ruiz-Samblás, Cristina; de la Mata, A Paulina

    2012-04-29

    The characterization and authentication of fats and oils is a subject of great importance for both market and health reasons. Identification and quantification of triacylglycerols in fats and oils can be excellent tools for detecting changes in their composition caused by mixing these products. Most of the triacylglycerol species present in fats or oils can be analyzed and identified by chromatographic methods. However, the natural variability of these samples and the possible presence of adulterants require chemometric pattern recognition methods to facilitate interpretation of the data obtained. In view of the growing interest in this topic, this paper reviews the literature on the application of exploratory and unsupervised/supervised chemometric methods to chromatographic data, using triacylglycerol composition for the characterization and authentication of foodstuffs such as olive oil, vegetable oils, animal fats, fish oils, milk and dairy products, cocoa, and coffee.

  14. Tracheal size variability is associated with sex: implications for endotracheal tube selection.

    Science.gov (United States)

    Karmakar, Arunabha; Pate, Mariah B; Solowski, Nancy L; Postma, Gregory N; Weinberger, Paul M

    2015-02-01

    Whereas selection of endotracheal tube (ETT) size in pediatric patients benefits from predictive nomograms, adult ETT sizing is relatively arbitrary. We sought to determine associations between cervical tracheal cross-sectional area (CTCSA) and clinical variables. One hundred thirty-two consecutive patients undergoing noncontrast chest computed tomography (CT) at a single tertiary care institution from January 2010 to June 2011 were reviewed. Patients with improper CT technique, endotracheal intubation, or pulmonary/tracheal pathology were excluded. Tracheal luminal diameters in the anteroposterior (D1) and transverse (D2) directions were measured 2 cm inferior to the cricoid and used to calculate CTCSA = π × D1 × D2 / 4. The demographic variables age, height, weight, and body mass index (BMI) were tested for association with CTCSA by Spearman correlation. The Wilcoxon rank-sum test was used to compare CTCSA by race and sex. Multivariate linear regression was performed including all clinical variables. Ninety-one patients met the inclusion criteria. There was no correlation between age, weight, or BMI and CTCSA. There was a significant positive correlation between patient height and CTCSA (P = .001, R = 0.35); however, this was confounded by sex. Female patients had significantly smaller CTCSA (mean = 241 mm²) than male patients (mean = 349 mm², P < .001). Multivariate linear regression stratified by sex revealed that height is correlated with CTCSA only in males (P = .028). Males also had more variability in CTCSA (SD 118.6) than females (SD 65.5). Our data suggest that selection of ETT size in male patients should include height as a predictive factor. For female patients, it may be appropriate to select a uniformly smaller-diameter ETT. © The Author(s) 2014.
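The CTCSA formula in this abstract is the area of an ellipse whose axes are the two measured luminal diameters. A short Python sketch of the calculation follows. The function name and the example diameters are hypothetical, not patient data from the study.

```python
import math


def ctcsa_mm2(d1_mm: float, d2_mm: float) -> float:
    """Cervical tracheal cross-sectional area in mm^2.

    Treats the lumen as an ellipse with anteroposterior diameter D1 and
    transverse diameter D2: area = pi * D1 * D2 / 4.
    """
    return math.pi * d1_mm * d2_mm / 4.0


# Hypothetical diameters of 16 mm (AP) and 20 mm (transverse).
print(round(ctcsa_mm2(16.0, 20.0), 1))  # 251.3
```

When D1 = D2 = D, the expression reduces to the area of a circle, π(D/2)².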

  15. Social variables exert selective pressures in the evolution and form of primate mimetic musculature.

    Science.gov (United States)

    Burrows, Anne M; Li, Ly; Waller, Bridget M; Micheletta, Jerome

    2016-04-01

    Mammals use their faces in social interactions more than any other vertebrates do. Primates are extreme among mammals in their complex, direct, lifelong social interactions and in their frequent use of facial displays as a means of proximate visual communication with conspecifics. The available repertoire of facial displays is primarily controlled by mimetic musculature, the muscles that move the face. The form of these muscles is, in turn, limited and influenced by phylogenetic inertia, but here we use examples, both morphological and physiological, to illustrate the influence that social variables may exert on the evolution and form of mimetic musculature among primates. Ecomorphology is concerned with the adaptive responses of morphology to ecological variables such as diet, foliage density, predation pressure, and time of activity. We present evidence that social variables also exert selective pressures on morphology, using mimetic muscles among primates as an example. Social variables include group size, dominance 'style', and mating systems. We present two case studies to illustrate the potential influence of social behavior on the adaptive morphology of mimetic musculature in primates: (1) the gross morphology of the mimetic muscles around the external ear in closely related macaque species (Macaca mulatta and Macaca nigra) characterized by different dominance styles, and (2) the comparative physiology of the orbicularis oris muscle among selected ape species. This muscle is used both in facial displays/expressions and in vocalizations/human speech. We present qualitative observations of myosin fiber-type distribution in this muscle in the siamang (Symphalangus syndactylus), chimpanzee (Pan troglodytes), and human to demonstrate the potential influence of visual and auditory communication on muscle physiology. In sum, ecomorphologists should be aware of social selective pressures as well as ecological ones, and that observed morphology might

  16. Variable selection for propensity score models when estimating treatment effects on multiple outcomes: a simulation study.

    Science.gov (United States)

    Wyss, Richard; Girman, Cynthia J; LoCasale, Robert J; Brookhart, Alan M; Stürmer, Til

    2013-01-01

    It is often preferable to simplify the estimation of treatment effects on multiple outcomes by using a single propensity score (PS) model. Variable selection in PS models affects the efficiency and validity of treatment effect estimates. However, the impact of different variable selection strategies on the estimated treatment effects in settings involving multiple outcomes is not well understood. The authors use simulations to evaluate the impact of different variable selection strategies on the bias and precision of effect estimates, to provide insight into the performance of various PS models in settings with multiple outcomes. Simulated studies consisted of a dichotomous treatment, two Poisson outcomes, and eight standard-normal covariates. Covariates were selected for the PS models based on their effects on treatment, a specific outcome, or both outcomes. The PSs were implemented using stratification, matching, and weighting (inverse probability of treatment weighting). PS models including only covariates affecting a specific outcome (outcome-specific models) resulted in the most efficient effect estimates. The PS model that included only covariates affecting either outcome (generic-outcome model) performed best among the models that simultaneously controlled measured confounding for both outcomes. Similar patterns were observed over the range of parameter values assessed and for all PS implementation methods. A single generic-outcome model performed well compared with separate outcome-specific models in most scenarios considered. The results emphasize the benefit of using prior knowledge to identify covariates that affect the outcome when constructing PS models and support the potential to use a single, generic-outcome PS model when multiple outcomes are being examined. Copyright © 2012 John Wiley & Sons, Ltd.
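For the weighting implementation mentioned in this abstract, inverse probability of treatment weights are 1/PS for treated subjects and 1/(1 − PS) for untreated subjects. Below is a minimal NumPy sketch. It assumes the propensity scores have already been estimated and uses normalized (Hájek) weighted means. The function names and toy data are hypothetical, and the study's own estimators for its Poisson outcomes may differ.

```python
import numpy as np


def iptw_weights(treated, ps):
    """Inverse probability of treatment weights (ATE scale)."""
    treated = np.asarray(treated, dtype=bool)
    ps = np.asarray(ps, dtype=float)
    return np.where(treated, 1.0 / ps, 1.0 / (1.0 - ps))


def iptw_mean_difference(y, treated, ps):
    """Difference in weighted outcome means, treated minus untreated."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(treated, dtype=bool)
    w = iptw_weights(t, ps)
    mu1 = np.sum(w[t] * y[t]) / np.sum(w[t])
    mu0 = np.sum(w[~t] * y[~t]) / np.sum(w[~t])
    return mu1 - mu0


# Hypothetical toy data with known propensity scores.
treated = [1, 1, 0, 0]
ps = [0.8, 0.5, 0.5, 0.2]
y = [3.0, 5.0, 1.0, 2.0]
print(iptw_weights(treated, ps))
print(iptw_mean_difference(y, treated, ps))
```

Subjects whose observed treatment was unlikely given their covariates receive larger weights. This rebalances measured confounders between the two groups.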

  17. Genetic variability of rice recurrent selection populations as affected by male sterility or manual recombination

    Directory of Open Access Journals (Sweden)

    Letícia da Silveira Pinheiro

    2012-06-01

    The objective of this work was to determine the effect of male sterility or manual recombination on the genetic variability of rice recurrent selection populations. The populations CNA-IRAT 4, with a gene for male sterility, and CNA 12, which was manually recombined, were evaluated. Genetic variability among selection cycles was estimated using 14 simple sequence repeat (SSR) markers. A total of 926 plants were analyzed, including ten genitors and 180 individuals from each of the evaluated cycles (1, 2, and 5) of the population CNA-IRAT 4, and 16 genitors and 180 individuals from each of the cycles (1 and 2) of CNA 12. The analysis identified alleles not present among the genitors in both populations and in all cycles, especially in the CNA-IRAT 4 population. These alleles resulted from unwanted fertilization with genotypes that were not originally part of the populations. Wright's F-statistics (F_IS and F_IT) indicated that manual recombination expands the genetic variability of the CNA 12 population, whereas male sterility reduces that of CNA-IRAT 4.
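Wright's F-statistics can be computed from observed heterozygosity (H_O), mean expected heterozygosity within subpopulations (H_S), and total expected heterozygosity (H_T). Below is a small Python sketch of the standard heterozygosity-based definitions, using hypothetical input values. The study's own estimator (for example, a variance-component method) may differ.

```python
def wright_f(h_o: float, h_s: float, h_t: float) -> dict:
    """Wright's F-statistics from heterozygosities.

    F_IS: heterozygote deficit within subpopulations
    F_IT: heterozygote deficit relative to the total population
    F_ST: differentiation among subpopulations
    """
    return {
        "F_IS": (h_s - h_o) / h_s,
        "F_IT": (h_t - h_o) / h_t,
        "F_ST": (h_t - h_s) / h_t,
    }


f = wright_f(h_o=0.2, h_s=0.4, h_t=0.5)  # hypothetical values
# The three statistics satisfy (1 - F_IT) = (1 - F_IS) * (1 - F_ST).
print(f)
```

Positive F_IS or F_IT values indicate fewer heterozygotes than expected under random mating. Lower values therefore point to more retained variability within a population.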

  18. Effect of recurrent selection on the variability of the UENF-14 popcorn population

    Directory of Open Access Journals (Sweden)

    Rodrigo Moreira Ribeiro

    2016-07-01

    This study aimed to evaluate the effect of recurrent selection on the genetic variability of the UENF-14 population after six selection cycles. Two hundred and ten half-sib families were evaluated in two environments in the state of Rio de Janeiro, using an incomplete randomized block design with treatments arranged in replications within "sets". There was a significant effect of families within sets (F/S), showing that there is enough genetic variability to be exploited in the UENF popcorn breeding program. The significance of the environment (E) source of variation shows that the environments were distinct enough to produce differences in the evaluated traits. For both traits of greatest interest, grain yield (GY) and popping expansion (PE), the magnitude of the additive variance remained similar in advanced cycles of the UENF-14 population, indicating that variability is maintained, with no evidence of decrease in advanced cycles. This supports the longevity of the UENF breeding program.

  19. Variable selection for distribution-free models for longitudinal zero-inflated count responses.

    Science.gov (United States)

    Chen, Tian; Wu, Pan; Tang, Wan; Zhang, Hui; Feng, Changyong; Kowalski, Jeanne; Tu, Xin M

    2016-07-20

    Zero-inflated count outcomes arise quite often in research and practice. Parametric models such as the zero-inflated Poisson and zero-inflated negative binomial are widely used to model such responses. Like most parametric models, they are quite sensitive to departures from the assumed distributions. Recently, new approaches have been proposed to provide distribution-free, or semiparametric, alternatives. These methods extend generalized estimating equations to provide robust inference for population mixtures defined by zero-inflated count outcomes. In this paper, we extend smoothly clipped absolute deviation (SCAD)-based variable selection methods to these new models. Variable selection has been gaining popularity in modern clinical research, as determining differential treatment effects of interventions for different subgroups has become the norm, rather than the exception, in the era of patient-centered outcome research. Such moderation analyses generally create many explanatory variables in regression analysis, and the advantages of SCAD-based methods over their traditional counterparts make them an attractive choice for addressing this important and timely issue in clinical research. We illustrate the proposed approach with both simulated and real study data. Copyright © 2016 John Wiley & Sons, Ltd.
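For reference, the standard SCAD penalty on which these methods build is piecewise. It is linear (lasso-like) near zero, quadratic in between, and constant for large coefficients, so large effects are not shrunk. Below is a minimal NumPy sketch of the penalty function alone, with tuning constant `a` set to the commonly used default of 3.7. It does not reproduce the paper's extension to distribution-free estimating equations, and the function name is hypothetical.

```python
import numpy as np


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|), evaluated elementwise.

    |theta| <= lam            : lam * |theta|
    lam < |theta| <= a * lam  : (2*a*lam*|theta| - theta^2 - lam^2) / (2*(a - 1))
    |theta| > a * lam         : lam^2 * (a + 1) / 2
    """
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    constant = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))


coefs = np.array([0.5, 1.0, 2.0, 3.7, 10.0])
print(scad_penalty(coefs, lam=1.0))
```

The penalty is continuous at both knots (λ and aλ), which is easy to confirm numerically with the function above.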

  20. [Geographical origin discrimination of Auricularia auricula using variable selection method of modeling power].

    Science.gov (United States)

    Liu, Fei; Sun, Guang-ming; He, Yong

    2010-01-01

    Near infrared (NIR) spectroscopy combined with the modeling power variable selection method was investigated for fast and accurate geographical origin discrimination of Auricularia auricula. A total of 240 Auricularia auricula samples were collected from the market, and the spectra of all samples were scanned in the 1100-2500 nm region. The calibration set was composed of 180 samples (45 for each origin), and the remaining 60 samples were used as the validation set. The optimal partial least squares (PLS) discriminant model was selected after comparing the performance of different preprocessing methods (Savitzky-Golay smoothing, standard normal variate, first derivative, and second derivative). The effective wavelengths selected by modeling power (MP) were used as the input data matrix of a least squares support vector machine (LS-SVM) to develop the modeling power-least squares-support vector machine (MP-LS-SVM) model, with a radial basis function (RBF) kernel. Three thresholds for variable selection by modeling power were applied in the MP-LS-SVM models: modeling power higher than 0.95, higher than 0.90, and higher than 0.90 combined with peak location (0.90+Peak). The correct recognition ratio in the validation set was used as the evaluation criterion, with the absolute prediction error threshold for misrecognition set to 0.1, 0.2, or 0.5. The results indicated that the MP-LS-SVM (0.90+Peak) model achieved the best performance under all three absolute error thresholds (0.1, 0.2, and 0.5), with correct recognition ratios of 98.3%, 100%, and 100%, respectively. The 0.90+Peak threshold was the most suitable one for variable selection by modeling power. It was concluded that modeling power was an effective variable selection method, and near infrared spectroscopy combined with the MP-LS-SVM model was successfully applied for the origin
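Modeling power, as usually defined in SIMCA, compares each variable's residual standard deviation after the class PCA model with its total standard deviation: MP = 1 − s_res/s_total. Values near 1 therefore mark well-modeled variables. Below is a minimal NumPy sketch, assuming one class's spectra as rows and a fixed number of principal components. It omits the degrees-of-freedom corrections that SIMCA implementations apply and does not reproduce the 0.90+Peak rule. The function names and toy data are hypothetical.

```python
import numpy as np


def modeling_power(X, n_components):
    """Per-variable modeling power of a PCA model (no d.o.f. correction)."""
    Xc = X - X.mean(axis=0)                       # mean-center each variable
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                       # loadings
    E = Xc - Xc @ P @ P.T                         # residuals after the model
    s_res = np.sqrt((E**2).sum(axis=0))
    s_tot = np.sqrt((Xc**2).sum(axis=0))
    return 1.0 - s_res / s_tot


def select_by_modeling_power(X, n_components, threshold=0.90):
    """Indices of variables whose modeling power exceeds the threshold."""
    mp = modeling_power(X, n_components)
    return np.flatnonzero(mp > threshold)


# Toy class: variables 0 and 1 follow one latent factor, variable 2 does not.
t = np.array([-1.0, 0.0, 1.0])
u = np.array([1.0, -2.0, 1.0])
X = np.column_stack([t, 2 * t, 0.1 * u])
print(select_by_modeling_power(X, n_components=1))  # [0 1]
```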

  1. Laser-Induced Breakdown Spectroscopy Coupled with Multivariate Chemometrics for Variety Discrimination of Soil

    Science.gov (United States)

    Yu, Ke-Qiang; Zhao, Yan-Ru; Liu, Fei; He, Yong

    2016-01-01

    The aim of this work was to analyze soil varieties by laser-induced breakdown spectroscopy (LIBS) coupled with chemometric methods. Six certified reference materials (CRMs) of soil were selected, and their LIBS spectra were captured. Characteristic emission lines of the main elements were identified based on the LIBS curves and the corresponding contents. From the identified emission lines, LIBS spectra at 7 lines with a high signal-to-noise ratio (SNR) were chosen for further analysis. Principal component analysis (PCA) of the LIBS spectra at the 7 selected lines showed an obvious clustering of the 6 soils. Soft independent modeling of class analogy (SIMCA) and least-squares support vector machine (LS-SVM) were used to establish discriminant models for classifying the 6 types of soil, and they achieved correct discrimination rates of 90% and 100%, respectively. Receiver operating characteristic (ROC) curves were used to evaluate model performance, and the results showed that the LS-SVM model was promising. Lastly, 8 types of soil from different places were collected and the same experiments were conducted to verify the 7 selected emission lines and the LS-SVM model. The research revealed that LIBS coupled with chemometrics could be used for variety discrimination of soil. PMID:27279284

  2. Laser-Induced Breakdown Spectroscopy Coupled with Multivariate Chemometrics for Variety Discrimination of Soil

    Science.gov (United States)

    Yu, Ke-Qiang; Zhao, Yan-Ru; Liu, Fei; He, Yong

    2016-06-01

    The aim of this work was to analyze soil varieties by laser-induced breakdown spectroscopy (LIBS) coupled with chemometric methods. Six certified reference materials (CRMs) of soil were selected, and their LIBS spectra were captured. Characteristic emission lines of the main elements were identified based on the LIBS curves and the corresponding contents. From the identified emission lines, LIBS spectra at 7 lines with a high signal-to-noise ratio (SNR) were chosen for further analysis. Principal component analysis (PCA) of the LIBS spectra at the 7 selected lines showed an obvious clustering of the 6 soils. Soft independent modeling of class analogy (SIMCA) and least-squares support vector machine (LS-SVM) were used to establish discriminant models for classifying the 6 types of soil, and they achieved correct discrimination rates of 90% and 100%, respectively. Receiver operating characteristic (ROC) curves were used to evaluate model performance, and the results showed that the LS-SVM model was promising. Lastly, 8 types of soil from different places were collected and the same experiments were conducted to verify the 7 selected emission lines and the LS-SVM model. The research revealed that LIBS coupled with chemometrics could be used for variety discrimination of soil.

  3. Modulation depth estimation and variable selection in state-space models for neural interfaces.

    Science.gov (United States)

    Malik, Wasim Q; Hochberg, Leigh R; Donoghue, John P; Brown, Emery N

    2015-02-01

    Rapid developments in neural interface technology are making it possible to record increasingly large signal sets of neural activity. Various factors such as asymmetrical information distribution and across-channel redundancy may, however, limit the benefit of high-dimensional signal sets, and the increased computational complexity may not yield corresponding improvement in system performance. High-dimensional system models may also lead to overfitting and lack of generalizability. To address these issues, we present a generalized modulation depth measure using the state-space framework that quantifies the tuning of a neural signal channel to relevant behavioral covariates. For a dynamical system, we develop computationally efficient procedures for estimating modulation depth from multivariate data. We show that this measure can be used to rank neural signals and select an optimal channel subset for inclusion in the neural decoding algorithm. We present a scheme for choosing the optimal subset based on model order selection criteria. We apply this method to neuronal ensemble spike-rate decoding in neural interfaces, using our framework to relate motor cortical activity with intended movement kinematics. With offline analysis of intracortical motor imagery data obtained from individuals with tetraplegia using the BrainGate neural interface, we demonstrate that our variable selection scheme is useful for identifying and ranking the most information-rich neural signals. We demonstrate that our approach offers several orders of magnitude lower complexity but virtually identical decoding performance compared to greedy search and other selection schemes. Our statistical analysis shows that the modulation depth of human motor cortical single-unit signals is well characterized by the generalized Pareto distribution. Our variable selection scheme has wide applicability in problems involving multisensor signal modeling and estimation in biomedical engineering systems.

  4. Selection of Variable in Sampling Investigation

    Institute of Scientific and Technical Information of China (English)

    陶凤梅; 杨启昌; 胡锡衡

    2002-01-01

    In sampling surveys, questionnaire designers usually include as many questions (variables) as possible so as not to lose useful information. However, a questionnaire with too many questions may reduce the response rate and complicate the analysis of the survey results. In this paper, we select the questionnaire variables for the development of young children's subjectivity (agency) using variable selection in correspondence analysis, and we analyze the rationality of the selection.

  5. NEW EFFICIENT ESTIMATION AND VARIABLE SELECTION METHODS FOR SEMIPARAMETRIC VARYING-COEFFICIENT PARTIALLY LINEAR MODELS.

    Science.gov (United States)

    Kai, Bo; Li, Runze; Zou, Hui

    2011-02-01

    The complexity of semiparametric models poses new challenges to statistical inference and model selection that frequently arise in real applications. In this work, we propose new estimation and variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for the nonparametric varying-coefficient functions and the parametric regression coefficients. To achieve good efficiency properties, we further develop a semiparametric composite quantile regression procedure. We establish the asymptotic normality of the proposed estimators for both the parametric and nonparametric parts and show that the estimators achieve the best convergence rate. Moreover, we show that the proposed method is much more efficient than the least-squares-based method for many non-normal errors and that it loses only a small amount of efficiency for normal errors. In addition, it is shown that the loss in efficiency is at most 11.1% for estimating varying coefficient functions and no greater than 13.6% for estimating parametric components. To achieve sparsity with high-dimensional covariates, we propose adaptive penalization methods for variable selection in the semiparametric varying-coefficient partially linear model and prove that the methods possess the oracle property. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. Finally, we apply the new methods to analyze the plasma beta-carotene level data.

  6. Correlation structure and variable selection in generalized estimating equations via composite likelihood information criteria.

    Science.gov (United States)

    Nikoloulopoulos, Aristidis K

    2016-06-30

    The method of generalized estimating equations (GEE) is popular in the biostatistics literature for analyzing longitudinal binary and count data. It assumes a generalized linear model for the outcome variable and a working correlation among repeated measurements. In this paper, we introduce a viable competitor: the weighted scores method for generalized linear model margins. We weight the univariate score equations using a working discretized multivariate normal model that is a proper multivariate model. Because the weighted scores method is a parametric method based on likelihood, we propose composite likelihood information criteria as an intermediate step for model selection. The same criteria can be used for both correlation structure and variable selection. Simulation studies and the application example show that our method outperforms other existing model selection methods in GEE. The example shows that our methods not only improve on GEE in terms of interpretability and efficiency but can also change the inferential conclusions relative to GEE. Copyright © 2016 John Wiley & Sons, Ltd.

  7. Chemometric interpretation of heavy metal patterns in soils worldwide.

    Science.gov (United States)

    Skrbić, Biljana; Durisić-Mladenović, Natasa

    2010-09-01

    Principal component analysis (PCA) was applied to data sets containing levels of six heavy metals (Pb, Cu, Zn, Cd, Ni, Cr) in soils from different parts of the world in order to investigate the information captured in global heavy metal patterns. The data consisted of the heavy metal contents determined in 23 soil samples from in and around the Novi Sad city area in the Vojvodina Province, northern Serbia, together with those from the city of Banja Luka, the second largest city in Bosnia and Herzegovina, and those reported previously in the relevant literature. These data were used to evaluate heavy metal distribution patterns in soils of different land-use types, as well as spatial and temporal differences in the patterns. The chemometric analysis was applied to the following input data sets: the overall set with all data gathered in this study, containing 264 samples, and two subsets obtained by dividing the overall set according to the soil metal index (SMI) calculated here, i.e. the set of unpolluted soils having SMIs < 100% and the set of polluted soils having SMIs > 100%. Additionally, univariate descriptive statistics and Spearman's nonparametric rank correlation coefficients were calculated for these three sets. A Box-Cox transformation was used as data pretreatment before applying the statistical methods. The results showed that anthropogenic and background sources had different impacts on data variability for polluted and unpolluted soils. Sample discrimination by land-use type was more evident for the unpolluted soils than for the polluted ones. Using linear discriminant analysis, Cu content was identified as a variable with major discriminant capacity. Correct classification of 73.3% was achieved for the predefined land-use types. Classification of the samples according to pollution level, expressed as SMI, was necessary to avoid the "masking" effect of the polluted soil patterns on the unpolluted ones.

  8. Chemometrics optimization of six antihistamines separations by capillary electrophoresis with electrochemiluminescence detection.

    Science.gov (United States)

    Zhu, Derong; Li, Xia; Sun, Jinying; You, Tianyan

    2012-01-15

    This work expanded knowledge of the use of chemometric experimental design for optimizing the separation of six antihistamines by capillary electrophoresis with electrochemiluminescence detection. Specifically, a central composite design was employed to optimize the three critical electrophoretic variables (Tris-H3PO4 buffer concentration, buffer pH and separation voltage), using the chromatography resolution statistic function (CRS function) as the response variable. The optimum conditions established from the empirical model were 24.2 mM Tris-H3PO4 buffer (pH 2.7) with a separation voltage of 15.9 kV. Under these conditions, the six antihistamines (carbinoxamine, chlorpheniramine, cyproheptadine, doxylamine, diphenhydramine and ephedrine) could be separated simultaneously in less than 22 min. The results indicate that chemometric optimization can greatly simplify the optimization procedure for multi-component analysis. The proposed method was also validated for linearity, repeatability and sensitivity, and was successfully applied to determine these antihistamines in urine.
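    A central composite design combines a two-level factorial (coded levels -1 and +1), axial or "star" points at plus or minus alpha on each factor axis, and replicated centre points. The sketch below builds such a design in coded units; the rotatable choice alpha = (2^k)^(1/4), the number of centre points and the natural-unit ranges passed to `decode` are illustrative assumptions, not the settings reported in the study.

```python
from itertools import product


def central_composite_design(k, n_center=1, rotatable=True):
    """Coded design points for a k-factor central composite design."""
    # rotatable alpha = (2**k)**0.25; alpha = 1 gives a face-centred design
    alpha = (2 ** k) ** 0.25 if rotatable else 1.0
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + axial + center


def decode(point, lows, highs):
    """Map coded levels to natural units, assuming -1/+1 mark the range ends."""
    return [lo + (c + 1) * (hi - lo) / 2 for c, lo, hi in zip(point, lows, highs)]
```

    With three factors (for instance buffer concentration, pH and voltage) and six centre points, `central_composite_design(3, n_center=6)` yields 8 factorial + 6 axial + 6 centre = 20 runs. Note that the axial points decode to values slightly outside the chosen -1/+1 range.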

  9. Effect of Selected Organic Acids on Cadmium Sorption by Variable-and Permanent-Charge Soils

    Institute of Scientific and Technical Information of China (English)

    HU Hong-Qing; LIU Hua-Liang; HE Ji-Zheng; HUANG Qiao-Yun

    2007-01-01

    Batch equilibrium experiments were conducted to investigate cadmium (Cd) sorption by two permanent-charge soils, a yellow-cinnamon soil and a yellow-brown soil, and two variable-charge soils, a red soil and a latosol, with the addition of selected organic acids (acetate, tartrate, and citrate). Results showed that as acetate concentration increased from 0 to 3.0 mmol L⁻¹, the Cd sorption percentage of the yellow-cinnamon soil, the yellow-brown soil, and the latosol decreased. The sorption percentage of Cd by the yellow-cinnamon soil and, generally, the yellow-brown soil (permanent-charge soils) decreased with increasing tartrate concentration, but increased at low tartrate concentrations for the red soil and the latosol. Curves of the percentage of Cd sorption for citrate were similar to those for tartrate. For the variable-charge soils with tartrate and citrate, there were obvious peaks in Cd sorption percentage. These peaks, where the organic acids had maximum influence, changed with soil type and occurred at higher organic acid concentrations for the variable-charge soils than for the permanent-charge soils. Adding cadmium after tartrate adsorption resulted in a greater increase in sorption for the variable-charge soils than for the permanent-charge soils. When tartrate and Cd solution were added together, Cd sorption decreased with tartrate concentration for the yellow-brown soil, but increased at low tartrate concentrations and then decreased with tartrate concentration for the red soil and the latosol.

  10. Implementation of Phonetic Context Variable Length Unit Selection Module for Malay Text to Speech

    Directory of Open Access Journals (Sweden)

    Tian-Swee Tan

    2008-01-01

    Problem statement: The main problem with current Malay Text-To-Speech (MTTS) synthesis systems is the poor quality of the generated speech, because traditional TTS systems cannot offer multiple candidate units for generating more accurate synthesized speech. Approach: This study proposes a phonetic context variable-length unit selection MTTS system that provides more natural and accurate unit selection for synthesized speech, implementing a phonetic context algorithm for unit selection in MTTS. Unit selection without phonetic context may pick speech units from different sources, which degrades the quality of concatenation. This study therefore proposes a speech corpus design and a unit selection method based on phonetic context, so that a string of continuous phonemes can be selected from the same source instead of individual phonemes from different sources. This further reduces the number of concatenation points and increases the quality of concatenation. The speech corpus was transcribed according to phonetic context to preserve the phonetic information. The method uses word-based concatenation: it first searches the speech corpus for the target word and, if the word is found, uses it for concatenation; if the word does not exist, it constructs the word from a phoneme sequence. Results: The system was tested with 40 participants in a Mean Opinion Score (MOS) listening test; the average ratings for naturalness, pronunciation and intelligibility were 3.9, 4.1 and 3.9, respectively. Conclusion/Recommendation: This study produced a first version of a corpus-based MTTS that improves the naturalness, pronunciation and intelligibility of synthetic speech. Some components still need improvement, such as a prosody module that supports phrasing analysis and intonation of the input text to match the waveform modifier.
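    The word-first search with phoneme fallback described above can be sketched as follows. The dictionaries, unit labels and Malay pronunciations are hypothetical placeholders; the sketch ignores the phonetic-context matching and acoustic join costs a real unit selection system would use, and only shows why whole-word units reduce the number of concatenation points.

```python
def select_units(words, word_units, phoneme_units, lexicon):
    """Word-first unit selection with a phoneme-level fallback.

    words: target words in order.
    word_units: word -> recorded whole-word unit (hypothetical IDs).
    phoneme_units: phoneme -> recorded phoneme unit.
    lexicon: word -> list of phonemes, used only for missing words.
    """
    units = []
    for word in words:
        if word in word_units:
            # whole word found in the corpus: one continuous unit, no internal joins
            units.append(word_units[word])
        else:
            # otherwise build the word from its phoneme sequence
            units.extend(phoneme_units[ph] for ph in lexicon[word])
    return units


def concatenation_points(units):
    """Number of joins needed to concatenate the selected units."""
    return max(len(units) - 1, 0)
```

    If "saya" is in the corpus as a whole word but "makan" is not, the sentence needs one word unit plus five phoneme units, giving five joins instead of the nine a purely phoneme-based system would need.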

  11. A scalable and portable framework for massively parallel variable selection in genetic association studies.

    Science.gov (United States)

    Chen, Gary K

    2012-03-01

    The deluge of data emerging from high-throughput sequencing technologies poses large analytical challenges when testing for association to disease. We introduce a scalable framework for variable selection, implemented in C++ and OpenCL, that fits regularized regression across multiple Graphics Processing Units. Open source code and documentation can be found at a Google Code repository under the URL http://bioinformatics.oxfordjournals.org/content/early/2012/01/10/bioinformatics.bts015.abstract. Supplementary data are available at Bioinformatics online.

  12. Latent Variable Selection for Multidimensional Item Response Theory Models via L1 Regularization.

    Science.gov (United States)

    Sun, Jianan; Chen, Yunxiao; Liu, Jingchen; Ying, Zhiliang; Xin, Tao

    2016-12-01

    We develop a latent variable selection method for multidimensional item response theory models. The proposed method identifies the latent traits probed by the items of a multidimensional test. Its basic strategy is to impose an L1 penalty term on the log-likelihood. The computation is carried out by the expectation-maximization algorithm combined with the coordinate descent algorithm. Simulation studies show that the resulting estimator provides an effective way of correctly identifying the latent structures. The method is applied to a real dataset involving the Eysenck Personality Questionnaire.
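    The coordinate descent step behind L1-penalized estimation is a soft-thresholding update applied to one parameter at a time. The sketch below shows it for a simpler problem, L1-penalized least squares (the lasso) with objective (1/(2n))||y - Xb||^2 + lam * ||b||_1, rather than the authors' penalized IRT log-likelihood inside EM; the objective scaling and the function names are assumptions made for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100, tol=1e-10):
    """Cyclic coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - Xb, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation with the partial residual (coordinate j added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
            b[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b
```

    Coefficients whose partial-residual correlation stays below `lam` are set exactly to zero, which is what makes the L1 penalty act as a selection device: a weakly related column is dropped while a strong one is kept, shrunk toward zero.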

  13. An Approach with Support Vector Machine using Variable Features Selection on Breast Cancer Prognosis

    Directory of Open Access Journals (Sweden)

    Sandeep Chaurasia

    2013-09-01

    Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of machine learning. In this paper, we used a support vector machine (SVM) classifier to construct a model for predicting breast cancer survivability. We applied both 5-fold and 10-fold cross-validation to variable selection on the input feature vectors and measured performance through bio-learning class performance, reporting AUC, specificity and sensitivity. The SVM performed much better than the other machine learning classifier compared.
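    k-fold cross-validation, as used above with k = 5 and k = 10, partitions the samples into k disjoint folds, trains on k-1 of them and evaluates on the held-out fold. This sketch only handles the splitting and averaging; `fit_and_score` is a hypothetical callback (for example, one that trains an SVM and returns its AUC), and the folds are not stratified by class.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k disjoint, near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(n_samples, k, fit_and_score, seed=0):
    """Average fit_and_score(train_idx, test_idx) over the k folds."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for f, test_idx in enumerate(folds):
        # every index outside the held-out fold is used for training
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / k
```

    Any feature selection must happen inside `fit_and_score`, using only the training indices; selecting variables on the full data before splitting would leak information into the held-out folds.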

  14. Adaptive variable selection for extended Nijboer-Zernike aberration retrieval via lasso

    Science.gov (United States)

    Wang, Bin; Diao, Huai-An; Guo, Jianhua; Liu, Xiyang; Wu, Yuanhao

    2017-02-01

    In this paper, we propose an extended Nijboer-Zernike (ENZ) method for aberration retrieval that incorporates the lasso variable selection method, which can improve the accuracy of aberration retrieval. The proposed model is solved with the state-of-the-art Bregman iterative algorithm (Bregman, 1967 [1]; Cai et al., 2008 [2]; Yin et al., 2008 [3]) for the L1 minimization problem, with an adaptive choice of the regularization parameter based on the strategy of Ito et al., 2011 [4]. Numerical simulations on real-world and simulated phase data validate the effectiveness of the proposed ENZ aberration retrieval (AR) via lasso.
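    The Bregman methods cited above solve L1 problems by repeatedly feeding the residual back into the iteration. One widely used variant, the linearized Bregman iteration, alternates v <- v + A^T (f - A u) and u <- delta * shrink(v, mu). The pure-Python sketch below implements that variant for a generic system A u = f; it does not include the ENZ forward model or the adaptive parameter rule, and whether the paper uses this exact variant is an assumption. Strictly, the iteration converges to the minimizer of mu*||u||_1 + ||u||^2/(2*delta) subject to A u = f, which approaches the pure L1 solution as mu grows; a conservative step 0 < delta < 1/||A A^T|| is used here.

```python
def shrink(v, mu):
    """Componentwise soft-thresholding by mu."""
    return [x - mu if x > mu else x + mu if x < -mu else 0.0 for x in v]


def linearized_bregman(A, f, mu, delta, n_iter=1000, tol=1e-10):
    """Linearized Bregman iteration for a sparse u with A u = f.

    A is an m x n list of rows; f has length m.
    """
    m, n = len(A), len(A[0])
    u = [0.0] * n
    v = [0.0] * n
    for _ in range(n_iter):
        # residual of the constraint A u = f
        r = [f[i] - sum(A[i][j] * u[j] for j in range(n)) for i in range(m)]
        if max(abs(x) for x in r) < tol:
            break
        # add the back-projected residual, then shrink to get the new iterate
        v = [v[j] + sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        u = [delta * s for s in shrink(v, mu)]
    return u
```

    For A = [[1, 1]] and f = [2], every nonnegative split of 2 has the same L1 norm; the small quadratic term makes the iteration settle on the balanced solution [1, 1]. Components whose back-projected residual never exceeds mu stay exactly zero.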

  15. Soft Sensing Modelling Based on Optimal Selection of Secondary Variables and Its Application

    Institute of Scientific and Technical Information of China (English)

    Qi Li; Cheng Shao

    2009-01-01

    Distillation column composition is a very important quality variable in refineries; unfortunately, few hardware sensors are available to measure distillation compositions on-line. In this paper, a novel method using sensitivity matrix analysis and kernel ridge regression (KRR) is proposed to implement on-line soft sensing of distillation compositions. In this approach, sensitivity matrix analysis is used to select the most suitable secondary variables as the soft sensor's inputs, and KRR is used to build the composition soft sensor. Application to a simulated distillation column demonstrates the effectiveness of the method.
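    Kernel ridge regression fits dual coefficients alpha = (K + lam*I)^-1 y, where K is the kernel matrix of the training inputs, and predicts f(x) = sum_i alpha_i k(x_i, x). The sketch below assumes a Gaussian (RBF) kernel with hypothetical `gamma` and `lam` values and solves the linear system by Gaussian elimination to stay dependency-free; the abstract does not state which kernel was used, and the sensitivity-matrix step that chooses the secondary variables is not shown. For real work, scikit-learn's `KernelRidge` class provides the same estimator.

```python
import math


def rbf(a, b, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [r] for row, r in zip(M, rhs)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= factor * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def krr_fit(X, y, lam, gamma):
    """Dual coefficients alpha = (K + lam*I)^-1 y."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, y)


def krr_predict(X_train, alpha, x, gamma):
    """Prediction f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X_train))
```

    A very small `lam` makes the model nearly interpolate the training data, while a large `lam` shrinks all predictions toward zero; in a soft sensor both `lam` and `gamma` would normally be tuned by cross-validation.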

  16. Selection of single chain variable fragments specific for the human-inducible costimulator using ribosome display.

    Science.gov (United States)

    Pan, Yangbin; Mao, Weiping; Liu, Xuanxuan; Xu, Chong; He, Zhijuan; Wang, Wenqian; Yan, Hao

    2012-11-01

    We applied a ribosome display technique to a mouse single chain variable fragment (scFv) library to select scFvs specific for the inducible costimulator (ICOS). mRNA was isolated from the spleens of BALB/c mice immunized with ICOS protein. Heavy and κ chain genes (VH and κ) were amplified separately by reverse transcriptase polymerase chain reaction, and the anti-ICOS VH/κ chain ribosome display library was constructed with a special flexible linker by overlap extension PCR. The VH/κ chain library was transcribed and translated in vitro using a rabbit reticulocyte lysate system. Antibody-ribosome-mRNA complexes were then produced and panned against ICOS protein under appropriate conditions. To isolate scFvs specific for ICOS, negative selection against CD28 was carried out before three rounds of positive selection on ICOS. After three rounds of panning, the selected scFv DNAs were cloned into pET43.1a and detected by SDS-PAGE. Enzyme-linked immunosorbent assay then showed that a native ribosome display library had been successfully constructed and that, among seven clones, clone 5 had the highest affinity for ICOS and low affinity for CD28. The anti-ICOS scFvs were assessed for binding specificity and affinity and may provide a basis for developing humanized antibodies against acute and chronic allograft rejection.

  17. Improved chemometric methodologies for the assessment of soil carbon sequestration mechanisms

    Science.gov (United States)

    Jiménez-González, Marco A.; Almendros, Gonzalo; Álvarez, Ana M.; González-Vila, Francisco J.

    2016-04-01

    The factors involved in soil C sequestration, which is reflected in the highly variable organic matter content of soils, are not yet well defined. Their identification is therefore crucial for understanding Earth's biogeochemical cycles and global change. The main objective of this work is to contribute to a better qualitative and quantitative assessment of the mechanisms of organic C sequestration in soil, using omic approaches that do not require detailed knowledge of the structure of the material under study. To this end, we applied a series of chemometric approaches to a set of widely differing soils (35 representative ecosystems). In an exploratory phase, we used multivariate statistical models (e.g., multidimensional scaling, discriminant analysis with automatic backward variable selection) to analyze arrays of more than 200 independent soil variables (physicochemical, spectroscopic, pyrolytic, etc.) in order to select the factors (descriptors or proxies) that explain most of the total system variance (content and stability of the different C forms). These models showed that the factors determining the stabilization of organic material depend greatly on soil type. In some cases, the molecular structure of organic matter seemed strongly correlated with its resilience, while in other soil types organo-mineral interactions had a significant bearing on the accumulation of selectively preserved C forms. In any case, it was clear that the factors driving the resilience of organic matter are manifold and not mutually exclusive. Consequently, in a second stage, prediction models of soil C content and its biodegradability (laboratory incubation experiments) were built by massive data processing with partial least squares (PLS) regression of Py-GC-MS and Py-MS data. 
In some models, PLS was applied to a matrix of 150 independent variables corresponding to major pyrolysis compounds (peak areas) from the 35 samples of whole
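    PLS regression extracts latent components that maximize covariance between the predictors (here, for example, pyrolysis peak areas) and the response, then regresses the response on those components. The NIPALS-style PLS1 sketch below handles a single response and returns a prediction function; the data in the usage note are hypothetical, and the number of components would normally be chosen by cross-validation. scikit-learn's `PLSRegression` is a production alternative.

```python
def pls1_fit(X, y, n_components):
    """NIPALS-style PLS1 for one response; returns a predict(x) function."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]                               # centred y
    comps = []
    for _ in range(n_components):
        # weight vector w proportional to E^T f (direction of maximal covariance)
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # deflate X and y by the part explained by this component
        E = [[E[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, loading, q))

    def predict(x):
        e = [x[j] - x_mean[j] for j in range(p)]
        y_hat = y_mean
        for w, loading, q in comps:
            score = sum(e[j] * w[j] for j in range(p))
            y_hat += q * score
            e = [e[j] - score * loading[j] for j in range(p)]
        return y_hat

    return predict
```

    For example, `pls1_fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0], 1)` returns a function that predicts 8.0 at x = [4.0]. With as many components as linearly independent predictors, PLS reproduces ordinary least squares; its advantage appears when, as with 150 peak areas from 35 samples, there are far more correlated predictors than samples and only a few components are kept.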

  18. Bayesian Variable Selection and Computation for Generalized Linear Models with Conjugate Priors.

    Science.gov (United States)

    Chen, Ming-Hui; Huang, Lan; Ibrahim, Joseph G; Kim, Sungduk

    2008-07-01

    In this paper, we consider theoretical and computational connections between six popular methods for variable subset selection in generalized linear models (GLMs). Under the conjugate priors developed by Chen and Ibrahim (2003) for the generalized linear model, we obtain closed-form analytic relationships between the Bayes factor (posterior model probability), the Conditional Predictive Ordinate (CPO), the L measure, the Deviance Information Criterion (DIC), the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC) in the case of the linear model. Moreover, we examine computational relationships in the model space for these Bayesian methods for an arbitrary GLM under conjugate priors, and examine the performance of the conjugate priors of Chen and Ibrahim (2003) in Bayesian variable selection. Specifically, we show that once Markov chain Monte Carlo (MCMC) samples are obtained from the full model, the four Bayesian criteria can be computed simultaneously for all possible subset models in the model space. We illustrate our new methodology with a simulation study and a real dataset.
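    For reference, the two classical criteria named above have standard definitions: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where L is the maximized likelihood, k the number of parameters and n the sample size; lower values are preferred. The sketch below ranks hypothetical candidate subset models by BIC; it does not reproduce the paper's closed-form relationships under conjugate priors, nor the Bayesian criteria (Bayes factor, CPO, L measure, DIC).

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_lik


def rank_models(models, n):
    """Sort subset models by BIC, reporting AIC alongside.

    models: dict name -> (maximized log-likelihood, number of parameters).
    Returns a list of (name, AIC, BIC) tuples, best BIC first.
    """
    rows = [(name, aic(ll, k), bic(ll, k, n)) for name, (ll, k) in models.items()]
    return sorted(rows, key=lambda row: row[2])
```

    Because the BIC penalty per parameter (ln n) exceeds the AIC penalty (2) once n > 7, BIC tends to favour smaller subsets: a third parameter that raises the log-likelihood by only 0.5 is rejected by both criteria here, but BIC rejects it by a wider margin.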

  19. Locating disease genes using Bayesian variable selection with the Haseman-Elston method

    Directory of Open Access Journals (Sweden)

    He Qimei

    2003-12-01

    Background: We applied stochastic search variable selection (SSVS), a Bayesian model selection method, to the simulated data of Genetic Analysis Workshop 13. We used SSVS with the revisited Haseman-Elston method to find the markers linked to the loci determining change in cholesterol over time. To study gene-gene interaction (epistasis) and gene-environment interaction, we adopted prior structures that incorporate the relationships among the predictors. This allows SSVS to search the model space more efficiently and avoid less likely models. Results: In applying SSVS, instead of looking at the posterior distribution of each candidate model, which is sensitive to the setting of the prior, we ranked the candidate variables (markers) according to their marginal posterior probability, which was shown to be more robust to the prior. Compared with traditional methods that consider one marker at a time, our method considers all markers simultaneously and obtains more favorable results. Conclusions: We showed that SSVS is a powerful method for identifying linked markers using the Haseman-Elston method, even for weak effects. SSVS is very effective because it performs a smart search over the entire model space.
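    Ranking markers by marginal posterior probability, as described above, reduces to averaging each 0/1 inclusion indicator over the MCMC draws. The sketch below assumes the indicator draws have already been produced by an SSVS Gibbs sampler (not shown); the marker names are hypothetical.

```python
def marginal_inclusion(draws):
    """Fraction of MCMC draws in which each indicator gamma_j equals 1."""
    n_draws = len(draws)
    return [sum(draw[j] for draw in draws) / n_draws for j in range(len(draws[0]))]


def rank_markers(draws, names):
    """Markers sorted by marginal posterior inclusion probability, highest first."""
    probs = marginal_inclusion(draws)
    return sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)
```

    Each marker's probability pools every visited model that includes it, which is why this summary is less sensitive to the prior than the posterior probability of any single model.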

  20. A genetic algorithm for variable selection in logistic regression analysis of radiotherapy treatment outcomes.

    Science.gov (United States)

    Gayou, Olivier; Das, Shiva K; Zhou, Su-Min; Marks, Lawrence B; Parda, David S; Miften, Moyed

    2008-12-01

    A given outcome of radiotherapy treatment can be modeled by analyzing its correlation with a combination of dosimetric, physiological, biological, and clinical factors through a logistic regression fit over a large patient population. The quality of the fit is measured by combining the predictive power of the particular set of factors with the statistical significance of the individual factors in the model. We developed a genetic algorithm (GA) in which a small sample of all possible combinations of variables is fitted to the patient data. New models are derived from the best models through crossover and mutation operations and are in turn fitted. The process is repeated until the sample converges to the combination of factors that best predicts the outcome. The GA was tested on a data set that investigated the incidence of lung injury in non-small-cell lung cancer (NSCLC) patients treated with three-dimensional conformal radiotherapy (3DCRT). The GA identified a two-variable model as the best predictor of radiation pneumonitis: V30 (p = 0.048) and ongoing tobacco use at the time of referral (p = 0.074). This two-variable model was confirmed as the best model by analyzing all possible combinations of factors. In conclusion, genetic algorithms provide a reliable and fast way to select significant factors in logistic regression analysis of large clinical studies.
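    The GA loop described above (fit a population of variable subsets, keep the best, breed new subsets by crossover and mutation) can be sketched on 0/1 inclusion masks. In the study, fitness combined logistic-regression predictive power with factor significance; here `toy_fitness` is a hypothetical stand-in so the example runs without patient data, and the population size, generation count and mutation rate are arbitrary choices. `exhaustive_best` mirrors the paper's check against all possible combinations, which is only feasible for a small number of variables.

```python
import random
from itertools import product


def genetic_subset_search(n_vars, fitness, pop_size=20, n_gen=40, p_mut=0.1, seed=0):
    """Evolve 0/1 inclusion masks; fitness(mask) returns a score, higher is better.

    Requires n_vars >= 2 (one-point crossover) and pop_size >= 4.
    """
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_vars)) for _ in range(pop_size)]
    for _ in range(n_gen):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]     # keep the better half as parents
        children = [ranked[0]]                # elitism: the best mask survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_vars - 1)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            for j in range(n_vars):           # bit-flip mutation
                if rng.random() < p_mut:
                    child[j] = 1 - child[j]
            children.append(tuple(child))
        pop = children
    return max(pop, key=fitness)


def exhaustive_best(n_vars, fitness):
    """Score every one of the 2**n_vars masks (feasible only for small n_vars)."""
    return max(product((0, 1), repeat=n_vars), key=fitness)


# Hypothetical stand-in for a logistic-regression fit score:
# penalizes mismatches against an assumed "true" two-variable model.
target = (1, 0, 0, 1, 0)


def toy_fitness(mask):
    return -sum(m != t for m, t in zip(mask, target))


best = genetic_subset_search(5, toy_fitness)
```

    Elitism guarantees the best fitness never decreases between generations, but a GA gives no proof of optimality, which is why a brute-force confirmation, as done in the study, is worthwhile whenever the number of candidate factors allows it.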

  1. Spatial variability of selected physicochemical parameters within peat deposits in small valley mire: a geostatistical approach

    Directory of Open Access Journals (Sweden)

    Pawłowski Dominik

    2014-12-01

    Geostatistical methods for 2D and 3D modelling of the spatial variability of selected physicochemical properties of biogenic sediments were applied to a small valley mire in order to identify the processes that lead to the formation of various types of peat. A sequential Gaussian simulation was performed to reproduce the statistical distribution of the input data (pH and organic matter) and their semivariances, as well as to honour the data values, yielding more ‘realistic’ models that show microscale spatial variability, even though the input sample cores were sparsely distributed in the X-Y space of the study area. The stratigraphy of peat deposits in the Ldzań mire records a long-term evolution of water conditions, associated with variability in water supply over time. Ldzań is a fen (a rheotrophic mire with a through-flow of groundwater). Additionally, the vicinity of the Grabia River is marked by seasonal inundation of the southwest part of the mire and an increased share of mineral matter in the peat. In turn, the upper peat layers in parts of the central area of the Ldzań mire are rather spongy, and these peat-forming phytocoenoses probably formed under permanent waterlogging.

  2. Correlated response in litter size components in rabbits selected for litter size variability.

    Science.gov (United States)

    Argente, M J; Calle, E W; García, M L; Blasco, A

    2017-07-11

    A divergent selection experiment for the environmental variability of litter size (Ve) over seven generations was carried out in rabbits at the University Miguel Hernández of Elche. Ve was estimated as the phenotypic variance within the female, after correcting for year-season and parity-lactation status. The aim of this study was to analyse the correlated responses to selection in litter size components. The ovulation rate (OR) and the number of implanted embryos (IE) in females were measured by laparoscopy on day 12 of the second gestation. At the end of the second gestation, the total number of kits born (TB) was recorded. Embryonic (ES), foetal (FS) and prenatal (PS) survival were computed as IE/OR, TB/IE and TB/OR, respectively. A total of 405 laparoscopies were performed. Data were analysed using Bayesian methodology. The correlated response to selection for litter size environmental variability in terms of the litter size components was estimated either as genetic trends, computed as the average estimated breeding values for each generation and line, or as the phenotypic differences between lines. OR was similar in both lines. However, after seven generations of selection, the homogeneous line showed more IE (1.09 embryos for genetic means and 1.23 embryos for phenotypic means) and higher ES (0.07 for genetic means and 0.08 for phenotypic means) than the heterogeneous line. The probability of the phenotypic differences between lines being higher than zero (p) was 1.00 and 0.99, respectively. The greater uterine overcrowding of embryos in the homogeneous line did not penalize FS; as a result, this line continued to show a greater TB (1.01 kits for genetic means and 1.30 kits for phenotypic means, p = 0.99, in the seventh generation). In conclusion, a decrease in litter size variability had a favourable effect on ES and led to a larger litter size at birth. © 2017 Blackwell Verlag GmbH.
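    The three survival measures defined above are simple ratios, and because (IE/OR) × (TB/IE) = TB/OR, prenatal survival always equals embryonic survival times foetal survival. This is why a gain in ES carries through to TB when FS is not penalized. The counts used below are hypothetical, not data from the experiment.

```python
def survival_rates(ovulation_rate, implanted, total_born):
    """Embryonic, foetal and prenatal survival as ratios of litter size components."""
    es = implanted / ovulation_rate   # ES = IE / OR
    fs = total_born / implanted       # FS = TB / IE
    ps = total_born / ovulation_rate  # PS = TB / OR
    return es, fs, ps


# hypothetical female: 14 ova shed, 12 embryos implanted, 10 kits born
es, fs, ps = survival_rates(14.0, 12.0, 10.0)
```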

  3. Selection and affinity maturation of IgNAR variable domains targeting Plasmodium falciparum AMA1.

    Science.gov (United States)

    Nuttall, Stewart D; Humberstone, Karen S; Krishnan, Usha V; Carmichael, Jennifer A; Doughty, Larissa; Hattarki, Meghan; Coley, Andrew M; Casey, Joanne L; Anders, Robin F; Foley, Michael; Irving, Robert A; Hudson, Peter J

    2004-04-01

    The new antigen receptor (IgNAR) is an antibody unique to sharks and consists of a disulphide-bonded dimer of two protein chains, each containing a single variable and five constant domains. The individual variable (VNAR) domains bind antigen independently, and are candidates for the smallest antibody-based immune recognition units. We have previously produced a library of VNAR domains with extensive variability in the CDR1 and CDR3 loops displayed on the surface of bacteriophage. Now, to test the efficacy of this library, and to further explore the dynamics of VNAR antigen binding, we have performed selection experiments against an infectious disease target, the malarial Apical Membrane Antigen-1 (AMA1) from Plasmodium falciparum. Two related VNAR clones were selected, characterized by long (16- and 18-residue) CDR3 loops. These recombinant VNARs could be harvested at yields approaching 5 mg/L of monomeric protein from the E. coli periplasm, and bound AMA1 with nanomolar affinities (K_D ≈ 2 × 10⁻⁷ M). One clone, designated 12Y-2, was affinity-matured by error-prone PCR, resulting in several variants with mutations mapping to the CDR1 and CDR3 loops. The best of these variants showed approximately 10-fold enhanced affinity over 12Y-2 and was Plasmodium falciparum strain-specific. Importantly, we demonstrated that this monovalent VNAR co-localized with rabbit anti-AMA1 antisera on the surface of malarial parasites and thus may have utility in diagnostic applications.

  4. Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis

    Directory of Open Access Journals (Sweden)

    Ueki Masao

    2012-05-01

    Background: Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to complex human diseases. However, individual hypothesis testing of SNP-SNP pairs, as in a common genome-wide association study (GWAS), makes it difficult to set an overall p-value because of the complicated correlation structure; this multiple testing problem causes unacceptable false negative results. A number of SNP-SNP pairs larger than the sample size, the so-called large p, small n problem, precludes simultaneous analysis using multiple regression. A method that overcomes these issues is thus needed. Results: We adopt an up-to-date method for ultrahigh-dimensional variable selection termed sure independence screening (SIS) to handle the very large number of SNP-SNP interactions by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods, followed by a variable selection procedure in the SIS method suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–control Consortium) data. Conclusions: Based on the machine-learning principle, the proposed method gives a powerful and flexible genome-wide search for various patterns of gene-gene interaction.
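    Sure independence screening, in its original linear-model form, ranks every candidate predictor by the magnitude of its marginal correlation with the response and keeps only the top d before any joint modelling. The sketch below follows that simple form; the paper instead works within logistic regression with its own dummy coding of SNP pairs, so `pairwise_products` is only a hypothetical illustration of turning pairs into interaction predictors, and d is left to the user.

```python
def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def sure_independence_screening(columns, y, d):
    """Indices of the d predictors with the largest |marginal correlation| with y."""
    scores = [abs(pearson(col, y)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:d]


def pairwise_products(columns):
    """Hypothetical interaction coding: elementwise product for every pair (i, j)."""
    return {
        (i, j): [a * b for a, b in zip(columns[i], columns[j])]
        for i in range(len(columns))
        for j in range(i + 1, len(columns))
    }
```

    Screening costs one pass per predictor, so it scales to millions of pairwise terms and parallelizes naturally (one reason GPU implementations suit it); the retained d predictors are then passed to a joint model that could not be fitted on all of them.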

  5. A variant of sparse partial least squares for variable selection and data exploration

    Directory of Open Access Journals (Sweden)

    Megan Jodene Olson Hunt

    2014-03-01

    When data are sparse and/or predictors multicollinear, the current implementation of sparse partial least squares (SPLS) neither gives estimates for non-selected predictors nor provides a measure of inference. In response, an approach termed all-possible SPLS is proposed, which fits an SPLS model for all tuning parameter values across a set grid. The percentage of fits in which a given predictor is chosen is recorded, as well as its average non-zero parameter estimate. Using a large number of multicollinear predictors, simulation confirmed that variables not associated with the outcome were least likely to be chosen as sparsity increased across the grid of tuning parameters, while the opposite was true for those strongly associated. Variables with a weak association were chosen more often than those with no association, but less often than those with a strong relationship to the outcome. Similarly, predictors most strongly related to the outcome had the largest average parameter estimate magnitude, followed by those with a weak relationship, followed by those with no relationship. Across two independent studies regarding the relationship between volumetric MRI measures and a cognitive test score, this method confirmed a priori hypotheses about which brain regions would be selected most often and have the largest average parameter estimates. In conclusion, the percentage of time a predictor is chosen is a useful measure for ordering the strength of the relationship between the independent and dependent variables, serving as a form of inference. The average parameter estimates give further insight regarding the direction and strength of association. As a result, all-possible SPLS gives more information than the dichotomous output of traditional SPLS, making it useful when undertaking data exploration and hypothesis generation for a large number of potential predictors.
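    The all-possible SPLS summary described above needs only the coefficient vectors from the fits across the tuning grid: for each predictor, the percentage of fits in which it is non-zero and the mean of its non-zero estimates. The SPLS fitting itself is not shown, and the coefficient paths in the usage note are hypothetical.

```python
def selection_summary(coef_paths):
    """Summarize sparse fits across a tuning grid.

    coef_paths: one coefficient vector per tuning value.
    Returns, per predictor, (percentage of fits selecting it,
    mean of its non-zero estimates or None if never selected).
    """
    n_fits = len(coef_paths)
    summary = []
    for j in range(len(coef_paths[0])):
        nonzero = [fit[j] for fit in coef_paths if fit[j] != 0.0]
        pct = 100.0 * len(nonzero) / n_fits
        mean_est = sum(nonzero) / len(nonzero) if nonzero else None
        summary.append((pct, mean_est))
    return summary
```

    For four fits with coefficient vectors [0.8, 0.2, 0], [0.6, 0, 0], [0.4, 0, 0] and [0, 0, 0], the first predictor is selected 75% of the time with mean estimate 0.6, the second 25% of the time, and the third never, giving the graded ordering the abstract describes instead of a single selected/not-selected answer.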

  6. A variant of sparse partial least squares for variable selection and data exploration.

    Science.gov (United States)

    Olson Hunt, Megan J; Weissfeld, Lisa; Boudreau, Robert M; Aizenstein, Howard; Newman, Anne B; Simonsick, Eleanor M; Van Domelen, Dane R; Thomas, Fridtjof; Yaffe, Kristine; Rosano, Caterina

    2014-01-01

    When data are sparse and/or predictors multicollinear, the current implementation of sparse partial least squares (SPLS) neither gives estimates for non-selected predictors nor provides a measure of inference. In response, an approach termed "all-possible" SPLS is proposed, which fits an SPLS model for all tuning parameter values across a set grid. The percentage of fits in which a given predictor is chosen is recorded, as well as its average non-zero parameter estimate. Using a "large" number of multicollinear predictors, simulation confirmed that variables not associated with the outcome were least likely to be chosen as sparsity increased across the grid of tuning parameters, while the opposite was true for those strongly associated. Variables with a weak association were chosen more often than those with no association, but less often than those with a strong relationship to the outcome. Similarly, predictors most strongly related to the outcome had the largest average parameter estimate magnitude, followed by those with a weak relationship, followed by those with no relationship. Across two independent studies regarding the relationship between volumetric MRI measures and a cognitive test score, this method confirmed a priori hypotheses about which brain regions would be selected most often and have the largest average parameter estimates. In conclusion, the percentage of time a predictor is chosen is a useful measure for ordering the strength of the relationship between the independent and dependent variables, serving as a form of inference. The average parameter estimates give further insight regarding the direction and strength of association. As a result, all-possible SPLS gives more information than the dichotomous output of traditional SPLS, making it useful when undertaking data exploration and hypothesis generation for a large number of potential predictors.

  7. Selection of key ambient particulate variables for epidemiological studies - applying cluster and heatmap analyses as tools for data reduction.

    Science.gov (United States)

    Gu, Jianwei; Pitz, Mike; Breitner, Susanne; Birmili, Wolfram; von Klot, Stephanie; Schneider, Alexandra; Soentgen, Jens; Reller, Armin; Peters, Annette; Cyrys, Josef

    2012-10-01

    The success of epidemiological studies depends on the use of appropriate exposure variables. The purpose of this study is to extract a relatively small selection of variables characterizing ambient particulate matter from a large measurement data set. The original data set comprised a total of 96 particulate matter variables that have been measured continuously since 2004 at an urban background aerosol monitoring site in the city of Augsburg, Germany. Many of the original variables were derived from the measured particle size distribution (PSD) across the particle diameter range 3 nm to 10 μm, including size-segregated particle number concentration, particle length concentration, particle surface concentration and particle mass concentration. The data set was complemented by integral aerosol variables measured by independent instruments, including black carbon, sulfate, particle active surface concentration and particle length concentration. Such a large number of measured variables clearly cannot be used simultaneously in health effect analyses. The aim of this study is to pre-screen and select the key variables that will be used as input in forthcoming epidemiological studies. We present two methods of parameter selection and apply them to data from a two-year period, 2007 to 2008. We used the agglomerative hierarchical cluster method to find groups of similar variables; in total, we selected 15 key variables from 9 clusters, which are recommended for epidemiological analyses. We also applied a two-dimensional visualization technique called "heatmap" analysis to the Spearman correlation matrix, and 12 key variables were selected using this method. Moreover, the positive matrix factorization (PMF) method was applied to the PSD data to characterize possible particle sources. Correlations between the variables and PMF factors were used to interpret the meaning of the cluster and heatmap analyses.
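    The variable-clustering step can be sketched as agglomerative clustering with distance 1 - |r| between variables, so strongly correlated variables (of either sign) merge first. The average linkage, the distance-threshold stopping rule and the function names are assumptions for illustration, since the abstract does not specify them; `corr` may be any correlation matrix, such as the Spearman matrix used for the heatmap. Choosing one representative variable per cluster is left to domain judgement.

```python
def cluster_variables(corr, names, threshold):
    """Average-linkage agglomerative clustering of variables.

    Distance between variables i and j is 1 - |corr[i][j]|; the closest pair
    of clusters is merged until the smallest distance reaches `threshold`.
    """
    clusters = [[i] for i in range(len(names))]

    def dist(a, b):
        # mean pairwise distance between members of clusters a and b
        return sum(1 - abs(corr[i][j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = [(dist(clusters[x], clusters[y]), x, y)
                 for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        d, x, y = min(pairs)
        if d >= threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return [[names[i] for i in c] for c in clusters]
```

    With two tightly correlated pairs of variables and weak correlation between the pairs, a threshold of 0.5 returns two clusters, one per pair; lowering the threshold keeps more variables separate, raising it merges more aggressively.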

  8. Variable Selection and Updating In Model-Based Discriminant Analysis for High Dimensional Data with Food Authenticity Applications.

    Science.gov (United States)

    Murphy, Thomas Brendan; Dean, Nema; Raftery, Adrian E

    2010-03-01

    Food authenticity studies are concerned with determining if food samples have been correctly labelled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give excellent classification performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be efficient in terms of computation and achieves excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins.
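    The headlong search mentioned above gives up exhaustiveness in exchange for speed. It does not score every remaining variable and add the best one; instead it adds the first variable that improves the model enough. The sketch below shows only this forward "first improvement" idea. It uses a generic score supplied by the caller; the abstract does not specify the actual criterion (for example, a BIC-type difference). `headlong_forward` and `toy_score` are hypothetical names, and the removal steps of a full stepwise search are omitted.

```python
def headlong_forward(candidates, score, min_gain=0.0):
    """Greedy 'first improvement' forward selection.

    Each pass scans the remaining candidates in order and accepts the first
    one whose inclusion raises `score` by more than `min_gain`.
    Returns the selected variables and the final score.
    """
    selected = []
    current = score(selected)
    improved = True
    while improved:
        improved = False
        for var in candidates:
            if var in selected:
                continue
            trial = score(selected + [var])
            if trial - current > min_gain:
                selected.append(var)
                current = trial
                improved = True
                break  # accept immediately: the "headlong" step
    return selected, current


# Hypothetical score: useful variables add information, every variable costs 0.5.
useful = {"x1": 3.0, "x3": 2.0}


def toy_score(variables):
    return sum(useful.get(v, 0.0) for v in variables) - 0.5 * len(variables)


print(headlong_forward(["x1", "x2", "x3", "x4"], toy_score))
```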

  9. A Classification Study of Respiratory Syncytial Virus (RSV) Inhibitors by Variable Selection with Random Forest

    Directory of Open Access Journals (Sweden)

    Shuwei Zhang

    2011-02-01

    Full Text Available Experimental pEC50s for 216 selective respiratory syncytial virus (RSV) inhibitors are used to develop classification models as a potential screening tool for a large library of target compounds. A variable selection algorithm coupled with random forests (VS-RF) is used to extract the physicochemical features most relevant to RSV inhibition. Based on the selected small set of descriptors, four other widely used approaches, i.e., support vector machine (SVM), Gaussian process (GP), linear discriminant analysis (LDA) and k nearest neighbors (kNN) routines, are also employed and compared with the VS-RF method in terms of several rigorous evaluation criteria. The obtained results indicate that the VS-RF model is a powerful tool for classification of RSV inhibitors, producing the highest overall accuracy of 94.34% for the external prediction set, which significantly outperforms the other four methods, whose average accuracy was 80.66%. Given its strong predictive performance on both the internal and external sets, the proposed model should be useful for screening and optimizing potential RSV inhibitors prior to chemical synthesis in drug development.

  10. Comparison of objective Bayes factors for variable selection in parametric regression models for survival analysis.

    Science.gov (United States)

    Cabras, Stefano; Castellanos, Maria Eugenia; Perra, Silvia

    2014-11-20

    This paper considers the problem of selecting a set of regressors when the response variable is distributed according to a specified parametric model and observations are censored. Under a Bayesian perspective, the most widely used tools are Bayes factors (BFs), which are undefined when improper priors are used. To overcome this issue, fractional (FBF) and intrinsic (IBF) BFs have become common tools for model selection. Both depend on the size, N_t, of a minimal training sample (MTS), while the IBF also depends on the specific MTS used. In the case of regression with censored data, the definition of an MTS is problematic because only uncensored data can turn the improper prior into a proper posterior, and also because full exploration of the space of the MTSs, which also includes censored observations, is needed to avoid bias in model selection. To address this concern, a sequential MTS was proposed, but it has the drawback that N_t becomes random, which increases the number of possible MTSs. For this reason, we explore the behaviour of the FBF, contextualizing its definition to censored data. We show that these are consistent, and we also provide the corresponding fractional prior. Finally, a large simulation study and an application to real data are used to compare IBF, FBF and the well-known Bayesian information criterion.
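    Of the three criteria compared, BIC is the simplest to compute. The sketch below is an illustration only. It assumes an exponential survival model with right censoring (not necessarily one of the parametric models studied) and a single binary covariate, and it compares a common-rate model with a two-rate model by BIC. The fractional and intrinsic Bayes factors are not implemented. Here n is taken as the number of observations; with censored data, the effective sample size used in the penalty is itself a modelling choice.

```python
import math


def exp_loglik(times, events):
    """Maximized log-likelihood of an exponential model with right censoring.

    times: follow-up times; events: 1 if the event was observed, 0 if censored.
    The rate MLE is d / T (observed events over total time at risk), giving
    log L = d * log(d / T) - d.
    """
    d = sum(events)
    total = sum(times)
    if d == 0:
        return 0.0  # limit of d * log(d / T) - d as d -> 0
    return d * math.log(d / total) - d


def bic(loglik, n_params, n):
    return n_params * math.log(n) - 2.0 * loglik


def compare_binary_covariate(times, events, group):
    """BIC of a common-rate model vs. a model with one rate per group (0/1)."""
    n = len(times)
    bic_common = bic(exp_loglik(times, events), 1, n)
    ll_groups = 0.0
    for g in (0, 1):
        idx = [i for i in range(n) if group[i] == g]
        ll_groups += exp_loglik([times[i] for i in idx], [events[i] for i in idx])
    bic_groups = bic(ll_groups, 2, n)
    return bic_common, bic_groups


times = [1, 1, 2, 1, 2, 10, 12, 9, 11, 10]
events = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1]
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(compare_binary_covariate(times, events, group))
```

    Lower BIC is preferred. In this made-up example the group-specific model wins because the two groups have very different times to event.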

  11. Chemometrics applications in biotechnology processes: predicting column integrity and impurity clearance during reuse of chromatography resin.

    Science.gov (United States)

    Rathore, Anurag S; Mittal, Shachi; Lute, Scott; Brorson, Kurt

    2012-01-01

    Separation media, in particular chromatography media, are typically among the major contributors to the cost of goods for production of a biotechnology therapeutic. To be cost-effective, it is industry practice to reuse media over several cycles before discarding them. The traditional approach for estimating the number of cycles for which a particular medium can be reused involves performing laboratory scale experiments that monitor column performance and carryover. This dataset is then used to predict the number of cycles the media can be used at manufacturing scale (concurrent validation). Although well accepted and widely practiced, this approach faces challenges in extrapolating laboratory scale data to manufacturing scale because of differences that may exist across scales. Factors that may differ include: level of impurities in the feed material, lot-to-lot variability in feedstock impurities, design of the column housing unit with respect to cleanability, and homogeneity of the column packing. In view of these challenges, there is a need for approaches that can predict column underperformance at manufacturing scale over the product lifecycle. If such underperformance is predicted, the operators can unpack and repack the chromatography column beforehand and thus avoid batch loss. Chemometrics offers one such solution. In this article, we present an application of chemometrics to the analysis of a set of chromatography profiles with the intention of predicting the various events of column underperformance, including backpressure buildup and inefficient deoxyribonucleic acid clearance.

  12. Suitable classification of mortars from ancient Roman and Renaissance frescoes using thermal analysis and chemometrics.

    Science.gov (United States)

    Tomassetti, Mauro; Marini, Federico; Campanella, Luigi; Positano, Matteo; Marinucci, Francesco

    2015-01-01

    Literature on mortars has mainly focused on the identification and characterization of their components in order to assign them to a specific historical period, after accurate classification. For this purpose, different analytical techniques have been proposed. The aim of the present study was to verify whether the combination of thermal analysis and chemometric methods could be used to obtain a fast but correct classification of ancient mortar samples of different ages (Roman era and Renaissance). Ancient Roman frescoes from Museo Nazionale Romano (Terme di Diocleziano, Rome, Italy) and Renaissance frescoes from the Sistine Chapel and Old Vatican Rooms (Vatican City) were analyzed by thermogravimetry (TG) and differential thermal analysis (DTA). Principal component analysis (PCA) on the main thermal data revealed two clusters, ascribable to the two different ages. Inspection of the loadings allowed the observed differences to be interpreted in terms of the experimental variables. PCA differentiated the two kinds of mortars (Roman and Renaissance frescoes) and showed that the ancient Roman samples are richer in binder (calcium carbonate) and contain less filler (aggregate) than the Renaissance ones. It was also demonstrated that coupling thermoanalytical techniques with chemometric processing is particularly advantageous when a rapid and correct differentiation and classification of cultural heritage samples of various kinds or ages has to be carried out. Graphical abstract: PCA of TG data allows differentiating mortar samples from different ages (Roman era and Renaissance).
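    The PCA step can be sketched as an SVD of the column-centred data matrix. The scores (used to see the two clusters) and the loadings (used to interpret them) both come from this decomposition. This is a minimal sketch. The abstract does not state whether the thermal variables were autoscaled, so `autoscale` is left as an option. The three columns in the example are made-up numbers standing in for, e.g., mass losses in different temperature ranges.

```python
import numpy as np


def pca(X, n_components=2, autoscale=False):
    """PCA via SVD of the column-centred (optionally autoscaled) data matrix.

    Returns scores (samples x components), loadings (variables x components)
    and the fraction of total variance explained by each returned component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    if autoscale:
        Xc = Xc / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


# Hypothetical thermal variables: the first column differs strongly between groups.
X = [
    [10.0, 1.0, 0.50],
    [11.0, 1.2, 0.40],
    [10.5, 0.9, 0.60],
    [2.0, 1.1, 0.50],
    [3.0, 1.0, 0.45],
    [2.5, 0.95, 0.55],
]
scores, loadings, explained = pca(X)
print(np.round(scores[:, 0], 2), np.round(loadings[:, 0], 3), np.round(explained, 4))
```

    The sign of a principal component is arbitrary, so the interpretation should rest on which samples share a sign and which variables have large loadings, not on the sign itself.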

  13. CHEMOMETRICS IN BIOANALYTICAL SAMPLE PREPARATION - A FRACTIONATED COMBINED MIXTURE AND FACTORIAL DESIGN FOR THE MODELING OF THE RECOVERY OF 5 TRICYCLIC AMINES FROM PLASMA AFTER LIQUID-LIQUID-EXTRACTION PRIOR TO HIGH-PERFORMANCE LIQUID-CHROMATOGRAPHY

    NARCIS (Netherlands)

    WIELING, J; MENSINK, CK; JONKMAN, JHG; COENEGRACHT, PMJ; DUINEVELD, CAA; DOORNBOS, DA

    1993-01-01

    A general systematic approach is described for the chemometric modelling of liquid-liquid extraction data of drugs from biological fluids. Extraction solvents were selected from Snyder's solvent selectivity triangle: methyl tert.-butyl ether, methylene chloride and chloroform. The composition of a m

  15. [Application of characteristic NIR variables selection in portable detection of soluble solids content of apple by near infrared spectroscopy].

    Science.gov (United States)

    Fan, Shu-Xiang; Huang, Wen-Qian; Li, Jiang-Bo; Guo, Zhi-Ming; Zhao, Chun-Jiang

    2014-10-01

    In order to detect the soluble solids content (SSC) of apple conveniently and rapidly, a ring fiber probe and a portable spectrometer were applied to obtain the spectra of apples. Different wavelength variable selection methods, including uninformative variable elimination (UVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm (GA), were proposed to select effective wavelength variables of the NIR spectra for SSC in apple based on PLS. The backward interval LS-SVM (BiLS-SVM) and GA were used to select effective wavelength variables based on LS-SVM. The selected wavelength variables and the full wavelength range were each used as input variables of the PLS and LS-SVM models. The results indicated that the PLS model built on 50 characteristic variables selected by GA-CARS from the full spectrum of 1512 wavelengths achieved the optimal performance. The correlation coefficient (Rp) and root mean square error of prediction (RMSEP) for the prediction set were 0.962 and 0.403 °Brix, respectively, for SSC. The proposed GA-CARS method could effectively simplify the portable detection model of SSC in apple based on near infrared spectroscopy and enhance the predictive precision. The study can provide a reference for the development of a portable spectrometer for apple soluble solids content.
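    The two figures of merit quoted above can be computed as follows. This sketch assumes the common definitions. RMSEP is the square root of the mean squared prediction error over the independent prediction set, and Rp is the Pearson correlation between reference and predicted values. The abstract does not give the formulas, and some authors use slightly different conventions.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over an independent prediction set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def rp(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    vt = sum((t - mt) ** 2 for t in y_true)
    vp = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(vt * vp)


# Hypothetical SSC values in °Brix: reference vs. model prediction.
reference = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 13.0]
print(rmsep(reference, predicted), rp(reference, predicted))
```

    The example also shows why both numbers are reported. The predictions are perfectly correlated with the reference (Rp = 1) yet still carry a non-zero RMSEP, because they are compressed towards the mean.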

  16. Application of headspace sorptive extraction and gas chromatographic/mass spectrometric and chemometric methods to the quantification of pine nuts and pecorino in Pesto Genovese.

    Science.gov (United States)

    Zunin, Paola; Leardi, Riccardo; Boggia, Raffaella

    2009-01-01

    Headspace sorptive extraction and GC/MS, coupled with chemometric tools, were used to predict the amounts of pine nuts and Pecorino in Pesto Genovese, a typical Italian basil-based pasta sauce. Two groups of samples were prepared at different times and with ingredients from different batches for building the predicting models and testing their performances. Principal component analysis and partial least-squares regression (PLS) were applied to the chromatographic data. The 24 most-predictive variables were selected, and the application of PLS to the training set samples led to two models that explained approximately 70% of the variance in cross-validation, with prediction errors of 0.1 g for Pecorino and 0.6 g for pine nuts, thus confirming the reliability of the analytical method and the predicting ability of the models. The results obtained for the test set samples were not completely satisfactory, with a prediction error and a bias of 5.0 and -4.1 g, respectively, for Pecorino and corresponding values of 4.1 and 2.0 g for pine nuts. This preliminary study shows that the analytical methods used can allow construction of models with high predictive ability only if the great variability of the headspace composition of the ingredients and the effect of Twister are considered.
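    PLS regression of the kind applied to the chromatographic data can be sketched for a single response (PLS1) with the NIPALS algorithm. This is a generic textbook version, not the authors' implementation. Variable selection and the cross-validation used to choose the number of components are left out, and the example data are synthetic.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 regression (single response) with the NIPALS algorithm.

    Returns a function that predicts y for new rows of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)   # weight vector, unit length
        t = E @ w                # scores
        tt = t @ t
        p = E.T @ t / tt         # X loadings
        c = f @ t / tt           # y loading
        E = E - np.outer(t, p)   # deflate X
        f = f - c * t            # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # coefficients on centred X
    return lambda X_new: (np.asarray(X_new, dtype=float) - x_mean) @ B + y_mean


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0
predict = pls1_fit(X, y, n_components=3)
print(np.round(predict(X[:3]) - y[:3], 8))
```

    With as many components as (linearly independent) variables, PLS1 reproduces ordinary least squares, which makes a convenient sanity check. In practice far fewer components are used, chosen by cross-validation.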

  17. The Effects of Internal-External Locus of Control and Selected Demographic Variables on Rational-Irrational Beliefs.

    Science.gov (United States)

    Martin, Janice E.; And Others

    This study evaluated whether or not locus of control mediates rational-irrational beliefs. Data were generated investigating the impact of an internal-external orientation and selected demographic variables (age, race, gender, education, and occupation) on rational-irrational beliefs. Independent variables were locus of control and demographic…

  18. Cluster analysis for identifying sub-groups and selecting potential discriminatory variables in human encephalitis

    Directory of Open Access Journals (Sweden)

    Crowcroft Natasha S

    2010-12-01

    Full Text Available Abstract Background: Encephalitis is an acute clinical syndrome of the central nervous system (CNS), often associated with fatal outcome or permanent damage, including cognitive and behavioural impairment, affective disorders and epileptic seizures. Infection of the central nervous system is considered to be a major cause of encephalitis, and more than 100 different pathogens have been recognized as causative agents. However, a large proportion of cases have unknown disease etiology. Methods: We performed hierarchical cluster analysis on a multicenter England encephalitis data set with the aim of identifying sub-groups in human encephalitis. We used the simple matching similarity measure, which is appropriate for binary data sets, and performed variable selection using cluster heatmaps. We also used heatmaps to visually assess underlying patterns in the data, identify the main clinical and laboratory features and identify potential risk factors associated with encephalitis. Results: Our results identified fever, personality and behavioural change, headache and lethargy as the main characteristics of encephalitis. Diagnostic variables such as brain scans and measurements from cerebrospinal fluid were also identified as main indicators of encephalitis. Our analysis revealed six major clusters in the England encephalitis data set. However, marked within-cluster heterogeneity was observed in some of the big clusters, indicating possible sub-groups. Overall, the results show that patients are clustered according to symptom and diagnostic variables rather than causal agents. Exposure variables such as recent infection, sick person contact and animal contact were identified as potential risk factors. Conclusions: It is generally assumed, and common practice, to group encephalitis cases according to disease etiology. However, our results indicate that patients cluster mainly with respect to symptom and diagnostic variables rather than causal agents.
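    The simple matching measure mentioned above counts agreements on both presence (1-1) and absence (0-0) of a feature. The Jaccard coefficient, by contrast, ignores joint absences. The minimal sketch below assumes each patient is coded as a 0/1 vector of symptoms and test results; this coding is hypothetical. The resulting distance matrix is what a hierarchical clustering routine would take as input.

```python
def simple_matching(a, b):
    """Simple matching coefficient for two equal-length binary (0/1) vectors.

    Counts 1-1 and 0-0 agreements alike: (n11 + n00) / n.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def distance_matrix(patients):
    """Pairwise 1 - SMC distances, e.g. as input to hierarchical clustering."""
    n = len(patients)
    return [[1.0 - simple_matching(patients[i], patients[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical coding: fever, headache, lethargy, abnormal brain scan.
patients = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
]
print(distance_matrix(patients))
```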

  19. The role of the c-statistic in variable selection for propensity score models.

    Science.gov (United States)

    Westreich, Daniel; Cole, Stephen R; Funk, Michele Jonsson; Brookhart, M Alan; Stürmer, Til

    2011-03-01

    The applied literature on propensity scores has often cited the c-statistic as a measure of the ability of the propensity score to control confounding. However, a high c-statistic in the propensity model is neither necessary nor sufficient for control of confounding. Moreover, use of the c-statistic as a guide in constructing propensity scores may result in less overlap in propensity scores between treated and untreated subjects; this may require the analyst to restrict populations for inference. Such restrictions may reduce precision of estimates and change the population to which the estimate applies. Variable selection based on prior subject matter knowledge, empirical observation, and sensitivity analysis is preferable and avoids many of these problems.
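    For reference, the c-statistic of a propensity score model is the probability that a randomly chosen treated subject has a higher estimated score than a randomly chosen untreated one. This is the area under the ROC curve. The quadratic-time sketch below computes it directly from that definition, with ties counting one half. It illustrates what the statistic measures, which, as the abstract argues, is not the same as how well confounding is controlled.

```python
def c_statistic(scores, treated):
    """Concordance (c) statistic of a propensity score model.

    scores: estimated propensity scores; treated: 1/True for treated subjects.
    Fraction of (treated, untreated) pairs in which the treated subject has
    the higher score; tied pairs count 1/2.
    """
    pos = [s for s, t in zip(scores, treated) if t]
    neg = [s for s, t in zip(scores, treated) if not t]
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


print(c_statistic([0.9, 0.8, 0.4, 0.3, 0.8], [1, 1, 0, 0, 0]))
```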

  20. Temporal variability of spawning site selection in the frog Rana dalmatina: consequences for habitat management

    Directory of Open Access Journals (Sweden)

    Ficetola, G. F.

    2006-12-01

    Full Text Available We evaluated whether R. dalmatina females laid their eggs randomly within a pond or preferred particular microhabitats. The same measurements were performed in the same area in two consecutive years to determine whether the pattern remained constant over time. In 2003, we observed a significant selection for areas with more submerged deadwood and vegetation, presence of emergent ground and low water depth. However, these results were not confirmed in the subsequent year, when none of the microhabitat features measured had a significant effect. Although microhabitat features can strongly influence tadpoles, the temporal variability of habitat at this spatial scale suggests that habitat management could be more effective if focused on a wider spatial scale.

  1. Virtual noiseless amplification and Gaussian post-selection in continuous-variable quantum key distribution

    CERN Document Server

    Fiurasek, Jaromir

    2012-01-01

    Noiseless amplification and noiseless attenuation are two heralded filtering operations that respectively increase or decrease the mean field of any quantum state of light with no added noise, at the cost of a small success probability. We show that inserting such noiseless operations in a transmission line improves the performance of continuous-variable quantum key distribution over this line. Remarkably, these noiseless operations do not need to be physically implemented but can simply be simulated in the data post-processing stage. Hence, virtual noiseless amplification or attenuation amounts to performing a Gaussian post-selection, which enhances the secure range or tolerable excess noise while keeping the benefits of Gaussian security proofs.

  2. Variable selection in the explorative analysis of several data blocks in metabolomics

    DEFF Research Database (Denmark)

    Karaman, İbrahim; Nørskov, Natalja; Yde, Christian Clement

    to be related. Tools for the handling of mental overflow minimising false discovery rates both by using statistical and biological validation in an integrative approach are needed. In this paper different strategies for variable selection were considered with respect to false discovery and the possibility...... for biological validation. The data set used in this study is metabolomics data from an animal intervention study. The aim of the metabolomics study was to investigate the metabolic profile in pigs fed various cereal fractions with special attention to the metabolism of lignans using NMR and LC-MS based...... metabolomic approaches. Whole grain consumption has been shown to be protective against cardiovascular diseases, certain types of cancers, and type II diabetes. However, the food factors responsible for the preventive effects of whole grain and fibre-rich cereal fractions and the underlying mechanisms...
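    The snippet mentions controlling false discovery rates but does not say which procedure was used. As a sketch only, the Benjamini-Hochberg step-up procedure is one standard way to control the FDR across many tested metabolite variables. It can be written as follows; the p-values in the example are made up.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure at false discovery rate q.

    Sort p-values ascending, find the largest rank k with p_(k) <= k * q / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

    Because the procedure steps up from the largest rank that passes, a p-value can be rejected even if it misses its own threshold, as long as a larger p-value passes a later one.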

  3. THE IDENTIFICATION OF INFLATION RATE DETERMINANTS IN THE USA USING THE STOCHASTIC SEARCH VARIABLE SELECTION

    Directory of Open Access Journals (Sweden)

    Mihaela SIMIONESCU

    2016-03-01

    Full Text Available Inflation rate determinants for the USA have been analyzed in this study starting with 2008, when the American economy was already in crisis. As a novelty, this research uses Bayesian econometric methods to identify the determinants of the monthly inflation rate in the USA. Stochastic Search Variable Selection (SSVS) has been applied with a subjective acceptance probability of 0.3. The results are also validated by economic theory. During 2008-2015, the monthly inflation rate was influenced by the unemployment rate, the exchange rate, crude oil prices, the trade-weighted U.S. Dollar Index and the M2 Money Stock. The study might be continued by considering other potential determinants of the inflation rate.
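    In SSVS, each regression coefficient has a two-component normal prior. A narrow "spike" covers coefficients that are effectively zero, a wide "slab" covers coefficients that matter, and a latent indicator chooses between them. The sketch below shows the Gibbs-sampler step that updates one indicator given the current coefficient draw, following the usual George-McCulloch formulation; the parameter values are illustrative. The abstract does not say how the 0.3 enters (as a prior inclusion probability or as an acceptance threshold on posterior inclusion), so `prior_incl` is simply a free parameter here.

```python
import math


def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def inclusion_probability(beta, tau, c, prior_incl):
    """P(indicator = 1 | beta) under an SSVS spike-and-slab prior.

    Spike: N(0, tau^2), for coefficients that are effectively zero.
    Slab:  N(0, (c * tau)^2) with c > 1, for substantively non-zero ones.
    prior_incl: prior probability that the variable is included.
    """
    slab = prior_incl * normal_pdf(beta, c * tau)
    spike = (1.0 - prior_incl) * normal_pdf(beta, tau)
    return slab / (slab + spike)


# A coefficient draw near zero is attributed mostly to the spike,
# a large one almost surely to the slab.
print(inclusion_probability(0.0, tau=0.1, c=10.0, prior_incl=0.5))
print(inclusion_probability(1.0, tau=0.1, c=10.0, prior_incl=0.5))
```

    In a full sampler this step alternates with draws of the coefficients and the error variance. The posterior inclusion probability of a variable is then the fraction of iterations in which its indicator equals 1.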

  4. Bayesian variable selection in searching for additive and dominant effects in genome-wide data.

    Directory of Open Access Journals (Sweden)

    Tomi Peltola

    Full Text Available Although complex diseases and traits are thought to have a multifactorial genetic basis, the common methods in genome-wide association analyses test each variant for association independently of the others. This computational simplification may reduce power to identify variants with small effect sizes and requires correcting for multiple hypothesis tests with complex relationships. However, advances in computational methods and increases in computational resources are enabling the computation of models that adhere more closely to the theory of multifactorial inheritance. Here, a Bayesian variable selection and model averaging approach is formulated for searching for additive and dominant genetic effects. The approach simultaneously considers all available variants for inclusion as predictors in a linear genotype-phenotype mapping and averages over the uncertainty in the variable selection. This leads to naturally interpretable summary quantities on the significance of the variants and their contribution to the genetic basis of the studied trait. We first characterize the behavior of the approach in simulations. The results indicate a gain in causal variant identification performance when additive and dominant variation are simulated, with a negligible loss of power in the purely additive case. An application to the analysis of high- and low-density lipoprotein cholesterol levels in a dataset of 3895 Finns is then presented, demonstrating the feasibility of the approach at the current scale of single-nucleotide polymorphism data. We describe a Markov chain Monte Carlo algorithm for the computation and give suggestions on the specification of prior parameters using commonly available prior information. Open-source software implementing the method is available at http://www.lce.hut.fi/research/mm/bmagwa/ and https://github.com/to-mi/.
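    One common way to give a linear genotype-phenotype model both additive and dominant terms is to code each SNP twice: once as the count of minor alleles, and once as a heterozygote indicator. The sketch below uses that coding as an assumption. The abstract does not specify the parameterization used in the software (a centred coding, for example, is also common), and the function names are hypothetical.

```python
def encode_genotype(minor_allele_count):
    """Return (additive, dominance) covariates for one SNP genotype.

    additive: number of copies of the minor allele (0, 1 or 2).
    dominance: 1 for the heterozygote and 0 for either homozygote, so the
    heterozygote's effect can deviate from the midpoint of the homozygotes.
    """
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be coded as 0, 1 or 2 minor alleles")
    return minor_allele_count, int(minor_allele_count == 1)


def design_row(genotypes):
    """Concatenate additive and dominance covariates for a list of SNPs."""
    row = []
    for g in genotypes:
        a, d = encode_genotype(g)
        row.extend([a, d])
    return row


print(design_row([0, 1, 2]))
```

    Each SNP thus contributes two candidate predictors. A variable selection prior can include either, both or neither, which is what allows purely additive and dominant effects to be distinguished.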

  5. Quality Evaluation of Potentilla fruticosa L. by High Performance Liquid Chromatography Fingerprinting Associated with Chemometric Methods

    Science.gov (United States)

    Liu, Wei; Wang, Dongmei; Liu, Jianjun; Li, Dengwu; Yin, Dongxue

    2016-01-01

    The present study was performed to assess the quality of Potentilla fruticosa L. sampled from distinct regions of China using high performance liquid chromatography (HPLC) fingerprinting coupled with a suite of chemometric methods. For this quantitative analysis, the main active phytochemical compositions and the antioxidant activity in P. fruticosa were also investigated. Considering the high percentages and antioxidant activities of phytochemicals, P. fruticosa samples from Kangding, Sichuan were selected as the most valuable raw materials. Similarity analysis (SA) of HPLC fingerprints, hierarchical cluster analysis (HCA), principal component analysis (PCA), and discriminant analysis (DA) were further employed to provide accurate classification and quality estimates of P. fruticosa. Two principal components (PCs) were extracted by PCA. PC1 separated samples from Kangding, Sichuan, capturing 57.64% of the variance, whereas PC2 contributed to further separation, capturing 18.97% of the variance. Two kinds of discriminant functions with a 100% discrimination ratio were constructed. The results strongly supported the conclusion that the eight samples from different regions were clustered into three major groups, corresponding to their morphological classification; HPLC analysis confirmed considerable variation in phytochemical composition and showed that P. fruticosa samples from Kangding, Sichuan were of high quality. The results of SA, HCA, PCA, and DA were in agreement and performed well for the quality assessment of P. fruticosa. Consequently, HPLC fingerprinting coupled with chemometric techniques provides a highly flexible and reliable method for the quality evaluation of traditional Chinese medicines. PMID:26890416
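    Similarity analysis of chromatographic fingerprints typically compares each sample's fingerprint (peak areas at matched retention times) with a reference fingerprint. The comparison uses a correlation coefficient or the cosine of the angle between the two vectors. The minimal sketch below implements both measures. It assumes the peaks are already aligned and uses a mean reference fingerprint; a median is also used in practice, and the abstract does not say which variant was applied.

```python
import math


def correlation_similarity(x, y):
    """Pearson correlation between two fingerprints (aligned peak areas)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cosine_similarity(x, y):
    """Cosine of the angle between two fingerprints (no mean-centring)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def reference_fingerprint(fingerprints):
    """Mean fingerprint across samples, used as the comparison reference."""
    n = len(fingerprints)
    return [sum(col) / n for col in zip(*fingerprints)]


# Hypothetical peak areas for three samples at four common peaks.
samples = [[10.0, 20.0, 5.0, 8.0], [12.0, 22.0, 6.0, 7.0], [5.0, 30.0, 2.0, 12.0]]
ref = reference_fingerprint(samples)
for s in samples:
    print(round(cosine_similarity(s, ref), 3), round(correlation_similarity(s, ref), 3))
```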

  6. Near Infrared Spectroscopy Calibration for Wood Chemistry: Which Chemometric Technique Is Best for Prediction and Interpretation?

    Science.gov (United States)

    Via, Brian K.; Zhou, Chengfeng; Acquah, Gifty; Jiang, Wei; Eckhardt, Lori

    2014-01-01

    This paper addresses the precision in factor loadings during partial least squares (PLS) and principal components regression (PCR) of wood chemistry content from near infrared reflectance (NIR) spectra. The precision of the loadings is considered important because these estimates are often used to interpret chemometric models or to select meaningful wavenumbers. Standard laboratory chemistry methods were employed on a mixed genus/species hardwood sample set. PLS and PCR, before and after 1st derivative pretreatment, were used for model building and loadings investigation. As demonstrated by others, PLS was found to provide better predictive diagnostics. However, PCR exhibited a more precise estimate of loading peaks, which makes PCR better for interpretation. Application of the 1st derivative appeared to improve both PCR and PLS loading precision, but due to the small sample size, the two chemometric methods could not be compared statistically. This work is important because, to date, most research has committed to PLS because it yields better predictive performance, whereas this research suggests there is a tradeoff between better prediction and model interpretation. Future work is needed to compare PLS and PCR for a suite of spectral pretreatment techniques. PMID:25068863
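    First-derivative pretreatment removes constant baseline offsets from each spectrum before PLS or PCR modelling. The sketch below uses plain central differences on an evenly spaced wavenumber grid. In practice a Savitzky-Golay derivative, which also smooths the spectrum, is often preferred; the abstract does not say which was used.

```python
def first_derivative(spectrum, spacing=1.0):
    """First derivative of an evenly sampled spectrum by central differences.

    End points use one-sided differences, so the output has the same length
    as the input. `spacing` is the distance between adjacent wavenumbers.
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two points")
    deriv = [0.0] * n
    deriv[0] = (spectrum[1] - spectrum[0]) / spacing
    deriv[-1] = (spectrum[-1] - spectrum[-2]) / spacing
    for i in range(1, n - 1):
        deriv[i] = (spectrum[i + 1] - spectrum[i - 1]) / (2.0 * spacing)
    return deriv


raw = [1, 4, 2, 8]
shifted = [v + 10 for v in raw]  # same spectrum with a constant baseline offset
print(first_derivative(raw), first_derivative(shifted))
```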

  7. Near Infrared Spectroscopy Calibration for Wood Chemistry: Which Chemometric Technique Is Best for Prediction and Interpretation?

    Directory of Open Access Journals (Sweden)

    Brian K. Via

    2014-07-01

    Full Text Available This paper addresses the precision in factor loadings during partial least squares (PLS) and principal components regression (PCR) of wood chemistry content from near infrared reflectance (NIR) spectra. The precision of the loadings is considered important because these estimates are often used to interpret chemometric models or to select meaningful wavenumbers. Standard laboratory chemistry methods were employed on a mixed genus/species hardwood sample set. PLS and PCR, before and after 1st derivative pretreatment, were used for model building and loadings investigation. As demonstrated by others, PLS was found to provide better predictive diagnostics. However, PCR exhibited a more precise estimate of loading peaks, which makes PCR better for interpretation. Application of the 1st derivative appeared to improve both PCR and PLS loading precision, but due to the small sample size, the two chemometric methods could not be compared statistically. This work is important because, to date, most research has committed to PLS because it yields better predictive performance, whereas this research suggests there is a tradeoff between better prediction and model interpretation. Future work is needed to compare PLS and PCR for a suite of spectral pretreatment techniques.

  8. Quality Evaluation of Potentilla fruticosa L. by High Performance Liquid Chromatography Fingerprinting Associated with Chemometric Methods.

    Directory of Open Access Journals (Sweden)

    Wei Liu

    Full Text Available The present study was performed to assess the quality of Potentilla fruticosa L. sampled from distinct regions of China using high performance liquid chromatography (HPLC) fingerprinting coupled with a suite of chemometric methods. For this quantitative analysis, the main active phytochemical compositions and the antioxidant activity in P. fruticosa were also investigated. Considering the high percentages and antioxidant activities of phytochemicals, P. fruticosa samples from Kangding, Sichuan were selected as the most valuable raw materials. Similarity analysis (SA) of HPLC fingerprints, hierarchical cluster analysis (HCA), principal component analysis (PCA), and discriminant analysis (DA) were further employed to provide accurate classification and quality estimates of P. fruticosa. Two principal components (PCs) were extracted by PCA. PC1 separated samples from Kangding, Sichuan, capturing 57.64% of the variance, whereas PC2 contributed to further separation, capturing 18.97% of the variance. Two kinds of discriminant functions with a 100% discrimination ratio were constructed. The results strongly supported the conclusion that the eight samples from different regions were clustered into three major groups, corresponding to their morphological classification; HPLC analysis confirmed considerable variation in phytochemical composition and showed that P. fruticosa samples from Kangding, Sichuan were of high quality. The results of SA, HCA, PCA, and DA were in agreement and performed well for the quality assessment of P. fruticosa. Consequently, HPLC fingerprinting coupled with chemometric techniques provides a highly flexible and reliable method for the quality evaluation of traditional Chinese medicines.

  9. Strong Variability of Overlapping Iron Broad Absorption Lines in five Radio-selected Quasars

    CERN Document Server

    Zhang, Shaohua; Wang, Tinggui; Wang, Huiyuan; Shi, Xiheng; Liu, Bo; Liu, Wenjuan; Li, Zhenzhen; Wang, Shufen

    2015-01-01

    We present the results of a variability study of broad absorption lines (BALs) in a uniformly radio-selected sample of 28 BAL quasars using archival data from the FIRST Bright Quasar Survey (FBQS) and the Sloan Digital Sky Survey (SDSS), as well as data obtained by ourselves, covering time scales of $\sim$1–10 years in the quasar rest frame. Variable absorption troughs are detected in 12 BAL quasars. Among them, five cases showed strong spectral variations, and all belong to a special subclass of overlapping iron low-ionization BALs (OFeLoBALs). The Fe II absorbers are estimated to be formed by relatively dense ($n_{\rm e} > 10^{6}~{\rm cm^{-3}}$) gas at distances ranging from subparsec scales to tens of parsecs from the continuum source. They differ from those of the invariable non-overlapping FeLoBALs (non-OFeLoBALs), which arise in low-density gas located hundreds to thousands of parsecs away. OFeLoBALs and non-OFeLoBALs, i.e., FeLoBALs with/without strong BAL variations...

  10. The impact of selected organizational variables and managerial leadership on radiation therapists' organizational commitment

    Energy Technology Data Exchange (ETDEWEB)

    Akroyd, Duane [Department of Adult and Community College Education, College of Education, Campus Box 7801, North Carolina State University, Raleigh, NC 27695 (United States)], E-mail: duane_akroyd@ncsu.edu; Legg, Jeff [Department of Radiologic Sciences, Virginia Commonwealth University, Richmond, VA 23284 (United States); Jackowski, Melissa B. [Division of Radiologic Sciences, University of North Carolina School of Medicine 27599 (United States); Adams, Robert D. [Department of Radiation Oncology, University of North Carolina School of Medicine 27599 (United States)

    2009-05-15

    The purpose of this study was to examine the impact of selected organizational factors and the leadership behavior of supervisors on radiation therapists' commitment to their organizations. The population for this study consisted of all full-time clinical radiation therapists registered by the American Registry of Radiologic Technologists (ARRT) in the United States. A random sample of 800 radiation therapists was obtained from the ARRT for this study. Questionnaires were mailed to all participants and measured organizational variables, a managerial leadership variable and three components of organizational commitment (affective, continuance and normative). Organizational support and the leadership behavior of supervisors were each found to have a significant and positive effect on the normative and affective commitment of radiation therapists, and each of the models predicted over 40% of the variance in radiation therapists' organizational commitment. This study examined radiation therapists' commitment to their organizations and found that affective (emotional attachment to the organization) and normative (feelings of obligation to the organization) commitments were more important than continuance commitment (awareness of the costs of leaving the organization). This study can help radiation oncology administrators and physicians understand the values held by their radiation therapy employees that are predictive of their commitment to the organization. A crucial result of the study is the importance of the perceived support of the organization and the leadership skills of managers/supervisors for radiation therapists' commitment to the organization.

  11. Combating unmeasured confounding in cross-sectional studies: evaluating instrumental-variable and Heckman selection models.

    Science.gov (United States)

    DeMaris, Alfred

    2014-09-01

    Unmeasured confounding is the principal threat to unbiased estimation of treatment "effects" (i.e., regression parameters for binary regressors) in nonexperimental research. It refers to unmeasured characteristics of individuals that lead them both to be in a particular "treatment" category and to register higher or lower values than others on a response variable. In this article, I introduce readers to 2 econometric techniques designed to control the problem, with a particular emphasis on the Heckman selection model (HSM). Both techniques can be used with only cross-sectional data. Using a Monte Carlo experiment, I compare the performance of instrumental-variable regression (IVR) and HSM to that of ordinary least squares (OLS) under conditions with treatment and unmeasured confounding both present and absent. I find HSM generally to outperform IVR with respect to mean-square-error of treatment estimates, as well as power for detecting either a treatment effect or unobserved confounding. However, both HSM and IVR require a large sample to be fully effective. The use of HSM and IVR in tandem with OLS to untangle unobserved confounding bias in cross-sectional data is further demonstrated with an empirical application. Using data from the 2006-2010 General Social Survey (National Opinion Research Center, 2014), I examine the association between being married and subjective well-being.
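    The contrast between OLS and instrumental-variable regression described above can be illustrated with a toy Monte Carlo run. This is not the article's simulation design: the data-generating process (a normally distributed instrument z, an unmeasured confounder u, a binary treatment d, a true treatment effect of 1.0), the sample size, and the function names are illustrative assumptions, and the Heckman selection model is not shown. Because u raises both the chance of treatment and the response, the OLS slope is biased upward, while the just-identified IV (Wald) estimator cov(z, y) / cov(z, d) should land near the true effect in a large sample.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate(n=20000, beta=1.0, seed=1):
    """Simulate a binary treatment confounded by an unmeasured variable u.

    z is an instrument: it shifts treatment but has no direct effect on y.
    Returns (ols_estimate, iv_estimate) of the treatment effect beta.
    """
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)  # instrument
        ui = rng.gauss(0, 1)  # unmeasured confounder
        di = 1.0 if 0.8 * zi + ui + rng.gauss(0, 1) > 0 else 0.0
        yi = beta * di + ui + rng.gauss(0, 1)
        z.append(zi)
        d.append(di)
        y.append(yi)
    ols = cov(d, y) / cov(d, d)  # simple-regression OLS slope of y on d
    iv = cov(z, y) / cov(z, d)   # Wald / just-identified IV estimator
    return ols, iv


ols, iv = simulate()
print(f"OLS: {ols:.2f}  IV: {iv:.2f}  (true effect 1.0)")
```

    In this setup the OLS estimate is inflated by the confounder (to roughly twice the true effect), while the IV estimate is unbiased but noisier, which is consistent with the abstract's point that IV-type corrections need large samples.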

  12. Approaches to handle nonlinearities and nonnormalities in process chemometrics

    NARCIS (Netherlands)

    Thissen, Uwe Maria Johannes

    2004-01-01

    For every industrial process, it is of paramount interest to monitor the performance of the process online and to assess the quality of the products made. To meet these goals, the field of process control works on understanding and improving industrial processes. Process chemometrics can be

  13. Chemometric Optimization Studies in Catalysis Employing High-Throughput Experimentation

    NARCIS (Netherlands)

    Pereira, S.R.M.

    2008-01-01

    The main topic of this thesis is the investigation of the synergies between High-Throughput Experimentation (HTE) and Chemometric Optimization methodologies in Catalysis research and of the use of such methodologies to maximize the advantages of using HTE methods. Several case studies were analysed

  14. Principal Component Analysis: Most Favourite Tool in Chemometrics

    Indian Academy of Sciences (India)

    Keshav Kumar

    2017-08-01

    Principal component analysis (PCA) is the most commonly used chemometric technique. It is an unsupervised pattern recognition technique. PCA has found applications in chemistry, biology, medicine and economics. The present work attempts to explain how PCA works and how its results can be interpreted.
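    To make the mechanics concrete, the first principal component is the leading eigenvector of the sample covariance matrix of the mean-centred data, and its eigenvalue is the variance it explains. The sketch below is a minimal, dependency-free illustration using power iteration; the function name and iteration count are hypothetical choices, and in practice a linear-algebra library routine (for example, an SVD of the centred data matrix) would normally be used instead. The sign of a loading vector is arbitrary.

```python
import math


def first_principal_component(data, iters=500):
    """Return (loadings, variance) for the first principal component.

    data: list of samples, each a list of p variable values.
    The loadings are the leading eigenvector of the sample covariance
    matrix, found by power iteration; variance is its eigenvalue.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Unequal starting values make it unlikely that the start vector is
    # orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all variables constant: no variance to explain
            break
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, variance


# Points lying exactly on the line y = 2x: PC1 should point along (1, 2).
loadings, variance = first_principal_component([[t, 2 * t] for t in range(-2, 3)])
print(loadings, variance)  # approx [0.447, 0.894] and 12.5
```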

  15. Selection of controlled variables in bioprocesses. Application to a SHARON-Anammox process for autotrophic nitrogen removal

    DEFF Research Database (Denmark)

    Mauricio Iglesias, Miguel; Valverde Perez, Borja; Sin, Gürkan

    Selecting the right controlled variables in a bioprocess is challenging since the objectives of the process (yields, product or substrate concentration) are difficult to relate with a given actuator. We apply here process control tools that can be used to assist in the selection of controlled...

  16. The Relationship between Organizational Climate and Selected Variables of Productivity-Reading Achievement, Teacher Experience and Teacher Attrition.

    Science.gov (United States)

    Smith, Stanley Jeffery

    This study investigated the relationship between organizational climate and selected organizational variables--reading achievement, teacher experience, and teacher attrition. The study sample consisted of the total teaching staffs and 642 randomly selected students from five elementary schools in a metropolitan school district. Data were collected…

  17. NUMBER OF TRIALS NECESSARY TO ACHIEVE PERFORMANCE STABILITY OF SELECTED GROUND REACTION FORCE VARIABLES DURING LANDING

    Directory of Open Access Journals (Sweden)

    C. Roger James

    2007-03-01

    The objectives were to determine the number of trials necessary to achieve performance stability of selected ground reaction force (GRF) variables during landing and to compare two methods of determining stability. Ten subjects divided into two groups each completed a minimum of 20 drop or step-off landings from 0.60 or 0.61 m onto a force platform (1000 Hz). Five vertical GRF variables (first and second peaks, average loading rates to these peaks, and impulse) were quantified during the initial 100 ms post-contact period. Test-retest reliability (stability) was determined using two methods: (1) intra-class correlation coefficient (ICC) analysis, and (2) sequential averaging analysis. Results of the ICC analysis indicated that an average of four trials (mean 3.8 ± 2.7 Group 1; 3.6 ± 1.7 Group 2) were necessary to achieve maximum ICC values. Maximum ICC values ranged from 0.55 to 0.99 and all were significantly (p < 0.05) different from zero. Results of the sequential averaging analysis revealed that an average of 12 trials (mean 11.7 ± 3.1 Group 1; 11.5 ± 4.5 Group 2) were necessary to achieve performance stability using criteria previously reported in the literature. Using 10 reference trials, the sequential averaging technique required standard deviation criterion values of 0.60 and 0.49 for Groups 1 and 2, respectively, in order to approximate the ICC results. The results of the study suggest that the ICC might be a less conservative, but more objective method for determining stability, especially when compared to previous applications of the sequential averaging technique. Moreover, criteria for implementing the sequential averaging technique can be adjusted so that results closely approximate the results from ICC. In conclusion, subjects in landing experiments should perform a minimum of four and possibly as many as eight trials to achieve performance stability of selected GRF variables. Researchers should use this information to plan future
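    The sequential averaging criterion can be sketched as follows: compute the running mean after each trial and report the trial from which every subsequent running mean stays within a band of criterion × SD around a reference mean. The abstract does not define the reference statistics precisely, so this sketch assumes the band is centred on the mean of a supplied set of reference trials and scaled by their standard deviation; the function name and that assumption are illustrative, not the authors' exact procedure.

```python
import statistics


def trials_to_stability(trials, reference, criterion):
    """Return the 1-based trial number from which the sequential (running)
    mean stays within mean(reference) +/- criterion * SD(reference).

    Returns None if the final running mean is outside the band.
    """
    centre = statistics.mean(reference)
    half_width = criterion * statistics.stdev(reference)
    running, seq_means = 0.0, []
    for k, value in enumerate(trials, start=1):
        running += value
        seq_means.append(running / k)
    # Scan backwards to find the earliest trial after which every
    # running mean remains inside the band.
    first_stable = None
    for k in range(len(seq_means), 0, -1):
        if abs(seq_means[k - 1] - centre) <= half_width:
            first_stable = k
        else:
            break
    return first_stable


# Hypothetical peak-force values (arbitrary units) using the 0.60 criterion.
values = [10, 2, 6, 6, 6, 6]
print(trials_to_stability(values, reference=values, criterion=0.60))  # 2
```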

  18. VARIABILITY OF MICROBIAL AIR POLLUTION AND DUST CONCENTRATION INSIDE AND OUTSIDE A SELECTED SCHOOL IN POZNAŃ

    Directory of Open Access Journals (Sweden)

    Małgorzata Basińska

    2016-12-01

    The article presents an analysis of the variability of air quality parameters measured in two cycles (03.2013 and 11.2014). The measurements were made over 1.5 years in a selected educational building from the 1970s. In each measurement cycle, measurements were carried out in two classrooms, before lessons and directly after they finished, and outside the building. The research included an assessment of physical air quality (air temperature, relative humidity, CO2 concentration) and microbiological contamination (the general count of mesophilic bacteria, the general count of psychrophilic bacteria, the count of staphylococci (Staphylococcus mannitol-positive (type α) and mannitol-negative (type β)), the count of Pseudomonas fluorescens bacteria, actinobacteria (Actinobacteria), and the general count of microscopic fungi). Additionally, air samples were taken to determine the concentration of dust in the classroom before lessons and immediately after their end. Physical air quality correlated with the number and activity of students in the classrooms. The measurement results of microbiological contamination were compared with the Polish requirements (PN) in order to classify the degree of air pollution as a function of the number of microorganisms per 1 m³ of air. On the basis of the measurements, the physical air quality in the analysed school was found to be unsatisfactory. Periodically, the acceptable levels of selected groups of microorganisms were exceeded. The measurement of dust concentrations showed that pupils' activity inside the classrooms leads to secondary entrainment of dust particles.

  19. Predictive validity of variables used to select students for postgraduate management courses.

    Science.gov (United States)

    Lane, John; Lane, Andrew M

    2002-06-01

    The present study, set in the United Kingdom, examined the predictive validity of variables used to select graduate students into postgraduate management programs at a UK business school. A total of 303 postgraduate students completed a cognitive ability test (MD5, Mental Ability Test) and a questionnaire assessing perceptions of self-efficacy to succeed on the program, and reported their performance on their first (undergraduate) degree. Students completed these measures at the start of the programs. Each program comprised 12 modules, which all students were required to complete successfully. Students' performance was measured by the average grade obtained over the 12 modules. Multiple regression indicated that cognitive ability scores significantly predicted only 22% of the variance in students' performance (adjusted R² = .22, p < .001). Neither performance on the first degree nor self-efficacy scores showed a significant relationship to the criterion measure. These findings suggest that in the UK, cognitive ability tests may play a significant role in the selection of students into postgraduate programs. The nonsignificant relationships for self-efficacy and first-degree performance are ascribed to unclear knowledge of the demands of the program. We suggest that further research is needed to examine factors related to performance.
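    The adjusted R² reported above penalises R² for the number of predictors in the model. A minimal sketch of the standard formula, R²_adj = 1 − (1 − R²)(n − 1)/(n − k − 1) for n observations and k predictors, follows; the function name is hypothetical, and the abstract does not report the unadjusted R² or the number of predictors in the final model, so the numbers in the usage line are made up for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a linear regression with n observations and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


print(adjusted_r2(0.5, 11, 1))  # 1 - 0.5 * 10 / 9, roughly 0.444
```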

  20. FCERI and Histamine Metabolism Gene Variability in Selective Responders to NSAIDS

    Science.gov (United States)

    Amo, Gemma; Cornejo-García, José A.; García-Menaya, Jesus M.; Cordobes, Concepcion; Torres, M. J.; Esguevillas, Gara; Mayorga, Cristobalina; Martinez, Carmen; Blanca-Lopez, Natalia; Canto, Gabriela; Ramos, Alfonso; Blanca, Miguel; Agúndez, José A. G.; García-Martín, Elena

    2016-01-01

    The high-affinity IgE receptor (Fcε RI) is a heterotetramer of three subunits: Fcε RIα, Fcε RIβ, and Fcε RIγ (αβγ2) encoded by three genes designated as FCER1A, FCER1B (MS4A2), and FCER1G, respectively. Recent evidence points to FCERI gene variability as a relevant factor in the risk of developing allergic diseases. Because Fcε RI plays a key role in the events downstream of the triggering factors in immunological response, we hypothesized that FCERI gene variants might be related to the risk of, or to the clinical response to, selective (IgE-mediated) non-steroidal anti-inflammatory (NSAID) hypersensitivity. From a cohort of 314 patients suffering from selective hypersensitivity to metamizole, ibuprofen, diclofenac, paracetamol, acetylsalicylic acid (ASA), propifenazone, naproxen, ketoprofen, dexketoprofen, etofenamate, aceclofenac, etoricoxib, dexibuprofen, indomethacin, oxyphenylbutazone, or piroxicam, and 585 unrelated healthy controls that tolerated these NSAIDs, we analyzed the putative effects of the FCERI SNPs FCER1A rs2494262, rs2427837, and rs2251746; FCER1B rs1441586, rs569108, and rs512555; FCER1G rs11587213, rs2070901, and rs11421. Furthermore, in order to identify additional genetic markers which might be associated with the risk of developing selective NSAID hypersensitivity, or which may modify the putative association of FCERI gene variations with risk, we analyzed polymorphisms known to affect histamine synthesis or metabolism, such as rs17740607, rs2073440, rs1801105, rs2052129, rs10156191, rs1049742, and rs1049793 in the HDC, HNMT, and DAO genes. No major genetic associations with risk or with clinical presentation, and no gene-gene interactions, or gene-phenotype interactions (including age, gender, IgE concentration, antecedents of atopy, culprit drug, or clinical presentation) were identified in patients. However, logistic regression analyses indicated that the presence of antecedents of atopy and the DAO SNP rs2052129 (GG

  1. FCERI AND HISTAMINE METABOLISM GENE VARIABILITY IN SELECTIVE RESPONDERS TO NSAIDS

    Directory of Open Access Journals (Sweden)

    Gemma Amo

    2016-09-01

    The high-affinity IgE receptor (Fcε RI) is a heterotetramer of three subunits: Fcε RIα, Fcε RIβ and Fcε RIγ (αβγ2) encoded by three genes designated as FCER1A, FCER1B (MS4A2) and FCER1G, respectively. Recent evidence points to FCERI gene variability as a relevant factor in the risk of developing allergic diseases. Because Fcε RI plays a key role in the events downstream of the triggering factors in immunological response, we hypothesized that FCERI gene variants might be related to the risk of, or to the clinical response to, selective (IgE-mediated) non-steroidal anti-inflammatory (NSAID) hypersensitivity. From a cohort of 314 patients suffering from selective hypersensitivity to metamizole, ibuprofen, diclofenac, paracetamol, acetylsalicylic acid (ASA), propifenazone, naproxen, ketoprofen, dexketoprofen, etofenamate, aceclofenac, etoricoxib, dexibuprofen, indomethacin, oxyphenylbutazone or piroxicam, and 585 unrelated healthy controls that tolerated these NSAIDs, we analyzed the putative effects of the FCERI SNPs FCER1A rs2494262, rs2427837 and rs2251746; FCER1B rs1441586, rs569108 and rs512555; FCER1G rs11587213, rs2070901 and rs11421. Furthermore, in order to identify additional genetic markers which might be associated with the risk of developing selective NSAID hypersensitivity, or which may modify the putative association of FCERI gene variations with risk, we analyzed polymorphisms known to affect histamine synthesis or metabolism, such as rs17740607, rs2073440, rs1801105, rs2052129, rs10156191, rs1049742 and rs1049793 in the HDC, HNMT and DAO genes. No major genetic associations with risk or with clinical presentation, and no gene-gene interactions, or gene-phenotype interactions (including age, gender, IgE concentration, antecedents of atopy, culprit drug or clinical presentation) were identified in patients. However, logistic regression analyses indicated that the presence of antecedents of atopy and the DAO SNP rs2052129 (GG

  2. Effects of Parceling on Model Selection: Parcel-Allocation Variability in Model Ranking.

    Science.gov (United States)

    Sterba, Sonya K; Rights, Jason D

    2016-01-25

    Research interest often lies in comparing structural model specifications implying different relationships among latent factors. In this context parceling is commonly accepted, assuming the item-level measurement structure is well known and, conservatively, assuming items are unidimensional in the population. Under these assumptions, researchers compare competing structural models, each specified using the same parcel-level measurement model. However, little is known about consequences of parceling for model selection in this context, including whether and when model ranking could vary across alternative item-to-parcel allocations within-sample. This article first provides a theoretical framework that predicts the occurrence of parcel-allocation variability (PAV) in model selection index values and its consequences for PAV in ranking of competing structural models. These predictions are then investigated via simulation. We show that conditions known to manifest PAV in absolute fit of a single model may or may not manifest PAV in model ranking. Thus, one cannot assume that low PAV in absolute fit implies a lack of PAV in ranking, and vice versa. PAV in ranking is shown to occur under a variety of conditions, including large samples. To provide an empirically supported strategy for selecting a model when PAV in ranking exists, we draw on relationships between structural model rankings in parcel- versus item-solutions. This strategy employs the across-allocation modal ranking. We developed software tools for implementing this strategy in practice, and illustrate them with an example. Even if a researcher has substantive reason to prefer one particular allocation, investigating PAV in ranking within-sample still provides an informative sensitivity analysis.

  3. Selection of AGN candidates in the GOODS-South Field through SPITZER/MIPS 24 microns variability

    CERN Document Server

    García-González, Judit; Pérez-González, Pablo G; Hernán-Caballero, Antonio; Sarajedini, Vicki L; Villar, Víctor

    2014-01-01

    We present a study of galaxies showing mid-infrared variability in the deepest Spitzer/MIPS 24 μm surveys in the GOODS-South field. We divide the dataset into epochs and sub-epochs to study the long-term (months to years) and the short-term (days) variability. We use a χ²-statistic method to select AGN candidates with a probability ≤ 1% that the observed variability is due to statistical errors alone. We find 39 sources (1.7% of the parent sample) that show long-term variability and 55 (2.2% of the parent sample) showing short-term variability. We compare our candidates with AGN selected in the X-ray and radio bands, and with AGN candidates selected by their IR emission. Approximately 50% of the MIPS 24 μm variable sources would be identified as AGN with these other methods. Therefore, MIPS 24 μm variability is a new method to identify AGN candidates, possibly dust-obscured and low-luminosity AGN that might be missed by other methods. However, the contribution of the MIPS 24 μm variable iden...
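    The χ²-based selection step can be sketched as follows. The abstract does not give the exact statistic, so this is a minimal sketch assuming the common form: the χ² of a source's fluxes about their error-weighted mean, with N − 1 degrees of freedom, flagged as variable when the probability of reaching that χ² from measurement errors alone is at most 1%. Function names are hypothetical; the survival function uses the standard closed-form recurrence for integer degrees of freedom, so no statistics library is needed.

```python
import math


def chi2_sf(x, dof):
    """Survival function P(X >= x) of a chi-squared variable with integer dof."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        q, k = math.exp(-half), 2               # Q(x; 2)
    else:
        q, k = math.erfc(math.sqrt(half)), 1    # Q(x; 1)
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    while k < dof:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return min(q, 1.0)


def is_variable(fluxes, errors, p_max=0.01):
    """Flag a light curve as variable if the chance that its scatter about the
    error-weighted mean flux is due to measurement errors alone is <= p_max."""
    weights = [1.0 / e ** 2 for e in errors]
    mean = sum(w * f for w, f in zip(weights, fluxes)) / sum(weights)
    chi2 = sum(((f - mean) / e) ** 2 for f, e in zip(fluxes, errors))
    return chi2_sf(chi2, len(fluxes) - 1) <= p_max


print(is_variable([1.0, 2.0, 1.0, 2.0], [0.1] * 4))  # True
```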

  4. Disruption of Brewers' yeast by hydrodynamic cavitation: Process variables and their influence on selective release.

    Science.gov (United States)

    Balasundaram, B; Harrison, S T L

    2006-06-01

    Intracellular products, not secreted from the microbial cell, are released by breaking the cell envelope consisting of cytoplasmic membrane and an outer cell wall. Hydrodynamic cavitation has been reported to cause microbial cell disruption. By manipulating the operating variables involved, a wide range of intensity of cavitation can be achieved, resulting in a varying extent of disruption. The effect of the process variables, including cavitation number, initial cell concentration of the suspension and the number of passes across the cavitation zone, on the release of enzymes from various locations of the Brewers' yeast was studied. The enzymes whose release profiles were studied include alpha-glucosidase (periplasmic), invertase (cell wall bound), alcohol dehydrogenase (ADH; cytoplasmic) and glucose-6-phosphate dehydrogenase (G6PDH; cytoplasmic). An optimum cavitation number Cv of 0.13 for maximum disruption was observed across the range Cv 0.09-0.99. The optimum cell concentration was found to be 0.5% (w/v, wet wt) when varying over the range 0.1%-5%. The sustained effect of cavitation on the yeast cell wall when re-circulating the suspension across the cavitation zone was found to release the cell wall bound enzyme invertase (86%) to a greater extent than the enzymes from other locations of the cell (e.g. periplasmic alpha-glucosidase at 17%). Localised damage to the cell wall could be observed using transmission electron microscopy (TEM) of cells subjected to less intense cavitation conditions. The absence of significant release of cytoplasmic enzymes, the absence of micronisation as observed by TEM, and the presence of fewer protein bands in the culture supernatant on SDS-PAGE analysis following hydrodynamic cavitation compared to disruption by high-pressure homogenisation confirmed the selective release offered by hydrodynamic cavitation.

  5. Selective quantification of the cardiac sympathetic and parasympathetic nervous systems by multisignal analysis of cardiorespiratory variability.

    Science.gov (United States)

    Chen, Xiaoxiao; Mukkamala, Ramakrishna

    2008-01-01

    Heart rate (HR) power spectral indexes are limited as measures of the cardiac autonomic nervous systems (CANS) in that they neither offer an effective marker of the beta-sympathetic nervous system (SNS) due to its overlap with the parasympathetic nervous system (PNS) in the low-frequency (LF) band nor afford specific measures of the CANS due to input contributions to HR [e.g., arterial blood pressure (ABP) and instantaneous lung volume (ILV)]. We derived new PNS and SNS indexes by multisignal analysis of cardiorespiratory variability. The basic idea was to identify the autonomically mediated transfer functions relating fluctuations in ILV to HR (ILV-->HR) and fluctuations in ABP to HR (ABP-->HR) so as to eliminate the input contributions to HR and then separate each estimated transfer function in the time domain into PNS and SNS indexes using physiological knowledge. We evaluated these indexes with respect to selective pharmacological autonomic nervous blockade in 14 humans. Our results showed that the PNS index derived from the ABP-->HR transfer function was correctly decreased after vagal and double (vagal + beta-sympathetic) blockade (P < 0.01) and did not change after beta-sympathetic blockade, whereas the SNS index derived from the same transfer function was correctly reduced after beta-sympathetic blockade in the standing posture and double blockade (P < 0.05) and remained the same after vagal blockade. However, this SNS index did not significantly decrease after beta-sympathetic blockade in the supine posture. Overall, these predictions were better than those provided by the traditional high-frequency (HF) power, LF-to-HF ratio, and normalized LF power of HR variability.

  6. Application of mass spectrometry based electronic nose and chemometrics for fingerprinting radiation treatment

    Science.gov (United States)

    Gupta, Sumit; Variyar, Prasad S.; Sharma, Arun

    2015-01-01

    Volatile compounds were isolated from apples and grapes employing solid phase micro extraction (SPME) and subsequently analyzed by GC/MS equipped with a transfer line without stationary phase. The single peak obtained was integrated to obtain the total mass spectrum of the volatile fraction of samples. A data matrix containing the relative abundance of all mass-to-charge ratios was subjected to principal component analysis (PCA) and linear discriminant analysis (LDA) to identify radiation treatment. PCA results suggested that there is sufficient variability between control and irradiated samples to build classification models based on supervised techniques. LDA successfully segregated control from irradiated samples at all doses (0.1, 0.25, 0.5, 1.0, 1.5, 2.0 kGy). SPME-MS with chemometrics was successfully demonstrated as a simple screening method for radiation treatment.

  7. Prioritizing individual genetic variants after kernel machine testing using variable selection.

    Science.gov (United States)

    He, Qianchuan; Cai, Tianxi; Liu, Yang; Zhao, Ni; Harmon, Quaker E; Almli, Lynn M; Binder, Elisabeth B; Engel, Stephanie M; Ressler, Kerry J; Conneely, Karen N; Lin, Xihong; Wu, Michael C

    2016-12-01

    Kernel machine learning methods, such as the SNP-set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single-SNP analysis methods, these methods are designed to examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and are able to identify sets of SNPs that are associated with the trait of interest. However, as with many multi-SNP testing approaches, kernel machine testing can draw conclusions only at the SNP-set level and does not directly indicate which SNP(s) in the identified set actually drive the associations. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs, adapt the KNIFE procedure to genetic association studies, and propose an approach to identify driver SNPs after the application of SKAT to gene set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel. The proposed approach provides practically useful utilities to prioritize SNPs, and fills the gap between SNP set analysis and biological functional studies. Both simulation studies and real data application are used to demonstrate the proposed approach.
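    For readers unfamiliar with the two kernels named above, the sketch below computes them for genotypes coded as minor-allele counts (0, 1, 2). It assumes the definitions commonly used in the SNP-set kernel literature: the linear kernel is the inner product of two individuals' genotype vectors, and the IBS kernel averages the number of alleles they share identical by state across p SNPs. It does not implement SKAT or KNIFE, and the function names are hypothetical.

```python
def linear_kernel(g):
    """Linear kernel K[i][j] = sum_k g[i][k] * g[j][k] for genotypes coded 0/1/2."""
    n = len(g)
    return [[sum(a * b for a, b in zip(g[i], g[j])) for j in range(n)]
            for i in range(n)]


def ibs_kernel(g):
    """Identity-by-state kernel: average proportion of alleles shared IBS,
    K[i][j] = sum_k (2 - |g[i][k] - g[j][k]|) / (2 * p)."""
    n, p = len(g), len(g[0])
    return [[sum(2 - abs(a - b) for a, b in zip(g[i], g[j])) / (2 * p)
             for j in range(n)] for i in range(n)]


# Two individuals, three SNPs, with opposite homozygous genotypes at two SNPs.
genotypes = [[0, 1, 2], [2, 1, 0]]
print(linear_kernel(genotypes))  # [[5, 1], [1, 5]]
```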

  8. Resiliency and subjective health assessment. Moderating role of selected psychosocial variables

    Directory of Open Access Journals (Sweden)

    Michalina Sołtys

    2015-12-01

    Background Resiliency is defined as a relatively permanent personality trait, which may be assigned to the category of health resources. The aim of this study was to determine the conditions in which resiliency is a significant health resource (moderation), thereby broadening knowledge of the specifics of the relationship between resiliency and subjective health assessment. Participants and procedure The study included 142 individuals. To examine the level of resiliency, the Assessment Resiliency Scale (SPP-25) by N. Ogińska-Bulik and Z. Juczyński was used. Participants evaluated their subjective health state on a visual analogue scale. Additionally, the following moderating variables were controlled: sex, objective health status, having a partner, professional activity and age. These data were obtained by personal survey. Results The results confirmed the relationship between resiliency and subjective health assessment. Multiple regression analysis revealed that sex, having a partner and professional activity are significant moderators of the association between level of resiliency and subjective health evaluation. However, no statistically significant interaction effects were observed for health status or age as moderators. Conclusions Resiliency is associated with subjective health assessment among adults, and selected socio-demographic features (such as sex, having a partner and professional activity) moderate this relationship. This confirms the significant role of resiliency as a health resource and is a reason to emphasize the benefits of enhancing the potential of individuals for their psychophysical wellbeing. However, the research requires replication in a more homogeneous sample.

  9. Hyperspectral remote sensing of plant biochemistry using Bayesian model averaging with variable and band selection

    Energy Technology Data Exchange (ETDEWEB)

    Zhao, Kaiguang; Valle, Denis; Popescu, Sorin; Zhang, Xuesong; Malick, Bani

    2013-05-15

    Model specification remains challenging in spectroscopy of plant biochemistry, as exemplified by the availability of various spectral indices or band combinations for estimating the same biochemical. This lack of consensus in model choice across applications argues for a paradigm shift in hyperspectral methods to address model uncertainty and misspecification. We demonstrated one such method using Bayesian model averaging (BMA), which performs variable/band selection and quantifies the relative merits of many candidate models to synthesize a weighted average model with improved predictive performance. The utility of BMA was examined using a portfolio of 27 foliage spectral–chemical datasets representing over 80 species across the globe to estimate multiple biochemical properties, including nitrogen, hydrogen, carbon, cellulose, lignin, chlorophyll (a or b), carotenoid, polar and nonpolar extractives, leaf mass per area, and equivalent water thickness. We also compared BMA with partial least squares (PLS) and stepwise multiple regression (SMR). Results showed that all the biochemicals except carotenoid were accurately estimated from hyperspectral data with R² values > 0.80.
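    How a model-averaged prediction is formed can be sketched with one common approximation: each candidate model's posterior probability is taken as proportional to exp(−BIC/2), assuming equal prior probabilities, and predictions are combined with those weights. The abstract does not state how the study computed its model weights (its Bayesian implementation may differ), so the function names, the BIC approximation, and the numbers in the usage lines are illustrative assumptions.

```python
import math


def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values,
    assuming every candidate model has the same prior probability."""
    best = min(bics)  # shift by the minimum BIC to avoid underflow in exp
    raw = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def bma_predict(predictions, weights):
    """Model-averaged prediction: weighted sum of the candidate models' predictions."""
    return sum(w * p for w, p in zip(weights, predictions))


# Two hypothetical band-selection models whose BICs differ by 2*ln(3):
w = bma_weights([0.0, 2 * math.log(3)])  # roughly [0.75, 0.25]
print(bma_predict([2.0, 4.0], w))        # roughly 2.5
```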

  10. Bayesian Factor Analysis as a Variable-Selection Problem: Alternative Priors and Consequences.

    Science.gov (United States)

    Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric

    2016-01-01

    Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor-loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, Muthén & Asparouhov proposed a Bayesian structural equation modeling (BSEM) approach to explore the presence of cross loadings in CFA models. We show that the issue of determining factor-loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov's approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike-and-slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set is used to demonstrate our approach.

  11. Relationship of Powder Feedstock Variability to Microstructure and Defects in Selective Laser Melted Alloy 718

    Science.gov (United States)

    Smith, T. M.; Kloesel, M. F.; Sudbrack, C. K.

    2017-01-01

    Powder-bed additive manufacturing processes use fine powders to build parts layer by layer. For selective laser melted (SLM) Alloy 718, the powders that are available off-the-shelf are in the 10-45 or 15-45 micron size range. A comprehensive investigation of sixteen powders from these typical ranges and two off-nominal-sized powders is underway to gain insight into the impact of feedstock on processing, durability and performance of 718 SLM space-flight hardware. This talk emphasizes one aspect of this work: the impact of powder variability on the microstructure and defects observed in the as-fabricated and fully heat-treated material, where lab-scale components were built using vendor-recommended parameters. These typical powders exhibit variation in composition, percentage of fines, roughness, morphology and particle size distribution. How these differences relate to the melt-pool size, porosity, grain structure, precipitate distributions, and inclusion content will be presented and discussed in the context of build quality and powder acceptance.

  12. Variable selection in monotone single-index models via the adaptive LASSO.

    Science.gov (United States)

    Foster, Jared C; Taylor, Jeremy M G; Nan, Bin

    2013-09-30

    We consider the problem of variable selection for monotone single-index models. A single-index model assumes that the expectation of the outcome is an unknown function of a linear combination of covariates. Assuming monotonicity of the unknown function is often reasonable and allows for more straightforward inference. We present an adaptive LASSO penalized least squares approach to estimating the index parameter and the unknown function in these models for continuous outcome. Monotone function estimates are achieved using the pooled adjacent violators algorithm, followed by kernel regression. In the iterative estimation process, a linear approximation to the unknown function is used, therefore reducing the situation to that of linear regression and allowing for the use of standard LASSO algorithms, such as coordinate descent. Results of a simulation study indicate that the proposed methods perform well under a variety of circumstances and that an assumption of monotonicity, when appropriate, noticeably improves performance. The proposed methods are applied to data from a randomized clinical trial for the treatment of a critical illness in the intensive care unit.
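    The pooled adjacent violators algorithm mentioned above computes the least-squares non-decreasing fit to a sequence of responses by repeatedly merging neighbouring blocks that violate monotonicity into their weighted mean. In a single-index setting it would be applied to responses ordered by their current index values; the sketch below shows only the PAVA step itself (not the adaptive LASSO or the kernel regression that follows), and the function name is hypothetical.

```python
def pava(y, w=None):
    """Pooled adjacent violators algorithm: weighted least-squares
    non-decreasing fit to the sequence y (assumed already ordered)."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


print(pava([1, 3, 2, 4]))  # [1, 2.5, 2.5, 4]
```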

  13. Variable Selection for Functional Logistic Regression in fMRI Data Analysis

    Directory of Open Access Journals (Sweden)

    Nedret BILLOR

    2015-03-01

    Full Text Available This study was motivated by a classification problem in functional magnetic resonance imaging (fMRI), a noninvasive imaging technique that allows an experimenter to take images of a subject's brain over time. Because fMRI studies usually have a small number of subjects, and because the observations are assumed to follow a smooth underlying curve, the resulting datasets are extremely high-dimensional and functional in nature. High dimensionality is one of the biggest problems in the statistical analysis of fMRI data, and there is also a need for better classification methods. A major advantage of fMRI is that it is noninvasive; improved statistical classification methods could therefore aid the development of noninvasive diagnostic techniques for mental illness or degenerative diseases such as Alzheimer's. In this paper, we develop a variable selection technique that tackles the high-dimensionality and correlation problems in fMRI data, based on L1 (group lasso) regularization for the functional logistic regression model, in which the response is binary and represents two separate classes and the predictors are functional. We assess our method with a simulation study and an application to a real fMRI dataset.
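
Group-lasso penalties such as the one described above are typically optimised with proximal-gradient or block-coordinate methods whose key step is group-wise soft-thresholding: a whole coefficient group is set to zero unless its Euclidean norm exceeds the threshold. A minimal sketch of that operator, assuming each functional predictor has already been expanded into a group of basis coefficients; the function name and numbers are illustrative and not taken from the paper:

```python
import math


def group_soft_threshold(coefs, threshold):
    """Proximal operator of threshold * ||coefs||_2 for one coefficient group."""
    norm = math.sqrt(sum(c * c for c in coefs))
    if norm <= threshold:
        # The whole group (one functional predictor) drops out of the model.
        return [0.0] * len(coefs)
    # Otherwise shrink the group toward zero while keeping its direction.
    scale = 1.0 - threshold / norm
    return [scale * c for c in coefs]


print(group_soft_threshold([3.0, 4.0], 2.5))  # [1.5, 2.0]
print(group_soft_threshold([0.3, 0.4], 1.0))  # [0.0, 0.0]
```

Because a group is either zeroed as a block or kept and shrunk, a functional predictor enters or leaves the model as a unit, which is the selection behaviour the group lasso provides.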

  14. Variation in Age and Training on Selected Biochemical Variables of Indian Hockey Players

    Directory of Open Access Journals (Sweden)

    I. Manna

    2010-04-01

    Full Text Available The present study aimed to examine how biochemical variables of Indian elite hockey players vary with age and training. A total of 120 hockey players who volunteered for the study were equally divided (n = 30) into 4 groups: under 16 years (14-15 yrs); under 19 years (16-18 yrs); under 23 years (19-22 yrs); and senior (23-30 yrs). The training sessions were divided into 3 phases: Transition Phase (TP), Preparatory Phase (PP), and Competitive Phase (CP). The training programme consisted of aerobic, anaerobic and skill training, comprising 4 hours of morning and evening sessions, 5 days/week. Selected biochemical parameters were measured, and the data were analyzed by two-way ANOVA and post hoc tests. The mean values of haemoglobin (Hb), total cholesterol (TC), triglyceride (TG), high-density lipoprotein cholesterol (HDL-C) and low-density lipoprotein cholesterol (LDL-C) increased significantly (P<0.05) with the age of the players. A significant increase (P<0.05) in serum urea, uric acid and HDL-C and a significant decrease (P<0.05) in Hb, TC, TG and LDL-C were noted in PP and CP compared with TP. The present study may provide useful information for the biochemical monitoring of training in hockey players.

  15. Uninformative variable elimination assisted by Gram-Schmidt Orthogonalization/successive projection algorithm for descriptor selection in QSAR

    DEFF Research Database (Denmark)

    Omidikia, Nematollah; Kompany-Zareh, Mohsen

    2013-01-01

    as collinearity, the reliability of the regression coefficients' magnitudes is questionable. Successive Projection Algorithm (SPA) and Gram-Schmidt Orthogonalization (GSO) were implemented as pre-selection techniques for removing collinearity and redundancy among variables in the model. Uninformative variable elimination......-partial least squares (UVE-PLS) was performed on the pre-selected data set, and C-values were calculated for each descriptor. In this case, the C-values of UVE assisted by SPA or GSO could be used to rank the variables according to their importance. Leave-many-out cross-validation (LMO-CV) was applied...... applying SPA-UVE-PLS to the anti-HIV data, nine descriptors were selected out of 160, with q² = 0.81, R² = 0.84 and Q²(F3) = 0.8.

  16. An experiment on selecting most informative variables in socio-economic data

    Directory of Open Access Journals (Sweden)

    L. Jenkins

    2014-01-01

    Full Text Available In many studies where data are collected on several variables, there is a motivation to find whether fewer variables would provide almost as much information. Variance of a variable about its mean is the common statistical measure of information content, and that is used here. We are interested in whether the variability in one variable is sufficiently correlated with that in one or more of the other variables that the first variable is redundant. We wish to find one or more ‘principal variables’ that sufficiently reflect the information content in all the original variables. The paper explains the method of principal variables and reports experiments using the technique to see if just a few variables are sufficient to reflect the information in 11 socioeconomic variables on 130 countries from a World Bank (WB) database. While the method of principal variables is highly successful in a statistical sense, the WB data vary greatly from year to year, demonstrating that fewer variables would be inadequate for these data.
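
A greedy version of the principal-variables idea can be sketched as follows: at each step, add the variable that most increases the share of the total variance of all (centred) variables reproduced by least-squares regression on the selected set. This is one common criterion for principal variables, assumed here for illustration; the paper's exact criterion and search strategy may differ, and the function name and data are hypothetical:

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def principal_variables(data, n_select):
    """Greedy forward selection of columns that best reproduce all columns.

    data: list of observations (rows). Returns (selected column indices,
    fraction of total variance explained by regression on them)."""
    n_rows, n_cols = len(data), len(data[0])
    cols = []
    for j in range(n_cols):
        col = [row[j] for row in data]
        mean = sum(col) / n_rows
        cols.append([x - mean for x in col])  # centre each variable
    total = sum(_dot(c, c) for c in cols)
    basis, selected, explained = [], [], 0.0
    for _ in range(n_select):
        best, best_gain, best_q = None, -1.0, None
        for j in range(n_cols):
            if j in selected:
                continue
            # Part of candidate j not already spanned by the selected variables.
            resid = cols[j][:]
            for q in basis:
                coef = _dot(resid, q)
                resid = [r - coef * qi for r, qi in zip(resid, q)]
            norm = math.sqrt(_dot(resid, resid))
            if norm <= 1e-10 * math.sqrt(_dot(cols[j], cols[j])):
                continue  # (near-)redundant candidate adds no new direction
            q_new = [r / norm for r in resid]
            # Extra variance of all variables captured by the new direction.
            gain = sum(_dot(q_new, c) ** 2 for c in cols)
            if gain > best_gain:
                best, best_gain, best_q = j, gain, q_new
        if best is None:
            break
        selected.append(best)
        basis.append(best_q)
        explained += best_gain
    return selected, explained / total


# Column 1 is exactly twice column 0, so it carries no extra information.
data = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [4.0, 8.0, 1.0],
]
selected, fraction = principal_variables(data, 2)
print(selected, round(fraction, 6))  # [0, 2] 1.0
```

Two of the three variables reproduce all of the variance here because the redundant column is skipped; on real data such as the 11 socioeconomic variables, the fraction would show how much information a smaller subset retains.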

  17. Comparative performance of selected variability detection techniques in photometric time series data

    CERN Document Server

    Sokolovsky, K V; Karampelas, A; Antipin, S V; Bellas-Velidis, I; Benni, P; Bonanos, A Z; Burdanov, A Y; Derlopa, S; Hatzidimitriou, D; Khokhryakova, A D; Kolesnikova, D M; Korotkiy, S A; Lapukhin, E G; Moretti, M I; Popov, A A; Pouliasis, E; Samus, N N; Spetsieri, Z; Veselkov, S A; Volkov, K V; Yang, M; Zubareva, A M

    2016-01-01

    Photometric measurements are prone to systematic errors presenting a challenge to low-amplitude variability detection. In search for a general-purpose variability detection technique able to recover a broad range of variability types including currently unknown ones, we test 18 statistical characteristics quantifying scatter and/or correlation between brightness measurements. We compare their performance in identifying variable objects in seven time-series datasets obtained with telescopes ranging in size from a telephoto lens to 1m-class and probing variability on timescales from minutes to decades. The test datasets together include lightcurves of 127539 objects, among them 1251 variable stars of various types and represent a range of observing conditions often found in ground-based variability surveys. The real data are complemented by simulations. We propose a combination of two indices that together recover a broad range of variability types from photometric data characterized by a wide variety of sampli...

  18. Mid-infrared spectroscopy combined with chemometrics to detect Sclerotinia stem rot on oilseed rape (Brassica napus L.) leaves.

    Science.gov (United States)

    Zhang, Chu; Feng, Xuping; Wang, Jian; Liu, Fei; He, Yong; Zhou, Weijun

    2017-01-01

    Detection of plant diseases in a fast and simple way is crucial for timely disease control. Conventionally, plant diseases are accurately identified by DNA-, RNA- or serology-based methods, which are time-consuming, complex and expensive. Mid-infrared spectroscopy is a promising technique that simplifies the detection procedure. Mid-infrared spectroscopy was used to identify the spectral differences between healthy and infected oilseed rape leaves. Two different sample sets from two experiments were used to explore and validate the feasibility of using mid-infrared spectroscopy to detect Sclerotinia stem rot (SSR) on oilseed rape leaves. The average mid-infrared spectra showed differences between healthy and infected leaves, and the differences varied between sample sets. The optimal wavenumbers for the 2 sample sets, selected using the second-derivative spectra, were similar, indicating the efficacy of this wavenumber selection. Chemometric methods, including partial least squares-discriminant analysis, support vector machine and extreme learning machine, were further used to quantitatively detect oilseed rape leaves infected by SSR. The discriminant models using the full spectra and the optimal wavenumbers of the 2 sample sets were effective, with classification accuracies over 80%. The discriminant results for the 2 sample sets varied due to variations in the samples. The use of two sample sets proved and validated the feasibility of using mid-infrared spectroscopy and chemometric methods for detecting SSR on oilseed rape leaves. The similarities among the selected optimal wavenumbers in different sample sets made it feasible to simplify the models and build practical models. Mid-infrared spectroscopy is a reliable and promising technique for SSR control. This study helps in developing practical applications of mid-infrared spectroscopy combined with chemometrics to detect plant disease.
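
Selecting wavenumbers from second-derivative spectra, as described above, can be illustrated with a plain central-difference second derivative. The ranking rule used here (largest absolute difference between the healthy and infected class-mean second-derivative spectra) is an assumption for illustration, not the authors' documented criterion; real spectra are usually smoothed first (for example with a Savitzky-Golay filter), and the function names and toy spectra are hypothetical:

```python
def second_derivative(spectrum, step=1.0):
    """Central-difference second derivative; the two end points are dropped."""
    return [
        (spectrum[i - 1] - 2.0 * spectrum[i] + spectrum[i + 1]) / step**2
        for i in range(1, len(spectrum) - 1)
    ]


def mean_spectrum(spectra):
    """Point-wise mean of equally sampled spectra."""
    return [sum(col) / len(spectra) for col in zip(*spectra)]


def top_wavenumbers(healthy, infected, wavenumbers, k=3):
    """Wavenumbers where the class-mean second-derivative spectra differ most."""
    d_h = second_derivative(mean_spectrum(healthy))
    d_i = second_derivative(mean_spectrum(infected))
    inner = wavenumbers[1:-1]  # the second derivative is undefined at the ends
    ranked = sorted(range(len(inner)), key=lambda j: abs(d_i[j] - d_h[j]), reverse=True)
    return [inner[j] for j in ranked[:k]]


wavenumbers = [1000, 1010, 1020, 1030, 1040]  # cm^-1, illustrative grid
healthy = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
infected = [[1, 1, 2, 1, 1], [1, 1, 2, 1, 1]]  # extra band near 1020 cm^-1
print(top_wavenumbers(healthy, infected, wavenumbers, k=1))  # [1020]
```

Because differentiation is linear, taking the derivative of the class-mean spectrum gives the same result as averaging the per-sample derivatives.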

  19. Developing a spatial-statistical model and map of historical malaria prevalence in Botswana using a staged variable selection procedure

    Directory of Open Access Journals (Sweden)

    Mabaso Musawenkosi LH

    2007-09-01

    Full Text Available Abstract Background: Several malaria risk maps have been developed in recent years, many from the prevalence of infection data collated by the MARA (Mapping Malaria Risk in Africa) project, and using various environmental data sets as predictors. Variable selection is a major obstacle due to analytical problems caused by over-fitting, confounding and non-independence in the data. Testing and comparing every combination of explanatory variables in a Bayesian spatial framework remains infeasible for most researchers. The aim of this study was to develop a malaria risk map using a systematic and practicable variable selection process for spatial analysis and mapping of historical malaria risk in Botswana. Results: Of 50 potential explanatory variables from eight environmental data themes, 42 were significantly associated with malaria prevalence in univariate logistic regression and were ranked by the Akaike Information Criterion. Those correlated with higher-ranking relatives of the same environmental theme were temporarily excluded. The remaining 14 candidates were ranked by selection frequency after running automated step-wise selection procedures on 1000 bootstrap samples drawn from the data. A non-spatial multiple-variable model was developed through step-wise inclusion in order of selection frequency. Previously excluded variables were then re-evaluated for inclusion using further step-wise bootstrap procedures, resulting in the exclusion of another variable. Finally, a Bayesian geo-statistical model using Markov Chain Monte Carlo simulation was fitted to the data, resulting in a final model of three predictor variables, namely summer rainfall, mean annual temperature and altitude. Each was independently and significantly associated with malaria prevalence after allowing for spatial correlation. This model was used to predict malaria prevalence at unobserved locations, producing a smooth risk map for the whole country. Conclusion: We have
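
One step of the staged procedure above, ranking candidates by AIC and temporarily excluding any variable strongly correlated with a better-ranked variable from the same environmental theme, can be sketched as follows. This is an illustration only: the AIC values are assumed to come from the univariate logistic fits, the cut-off `r_max = 0.9` is hypothetical because the abstract does not state one, the function names are illustrative, and the bootstrap step-wise and Bayesian geostatistical stages are not reproduced:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def theme_filter(candidates, r_max=0.9):
    """Rank candidates by AIC (lower is better) and set aside any variable that is
    strongly correlated with a better-ranked variable of the same theme.

    candidates: list of (name, theme, aic, values) tuples.
    r_max: hypothetical correlation cut-off."""
    kept, excluded = [], []
    for name, theme, aic, values in sorted(candidates, key=lambda c: c[2]):
        clash = any(
            k_theme == theme and abs(pearson(values, k_values)) >= r_max
            for _, k_theme, _, k_values in kept
        )
        if clash:
            excluded.append(name)  # set aside for later re-evaluation
        else:
            kept.append((name, theme, aic, values))
    return [c[0] for c in kept], excluded


# Made-up covariate values at five sites, for illustration only.
candidates = [
    ("summer_rain", "rainfall", 100.0, [1, 2, 3, 4, 5]),
    ("annual_rain", "rainfall", 105.0, [2, 4, 6, 8, 11]),
    ("mean_temp", "temperature", 110.0, [5, 3, 4, 1, 2]),
    ("altitude", "topography", 120.0, [1, 2, 3, 4, 5]),
]
print(theme_filter(candidates))
# (['summer_rain', 'mean_temp', 'altitude'], ['annual_rain'])
```

`altitude` survives despite matching `summer_rain` exactly in this toy data, because the filter only compares variables within the same theme, as the abstract describes; the excluded list is returned so those variables can be re-evaluated later.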

  20. Selection of AGN candidates in the GOODS-South Field through SPITZER/MIPS 24 μm variability

    CERN Document Server

    García-González, Judit; Pérez-González, Pablo G; Hernán-Caballero, Antonio; Sarajedini, Vicki L; Villar, Víctor

    2014-01-01

    We present a study of galaxies showing mid-infrared variability in data taken in the deepest Spitzer/MIPS 24 μm surveys in the GOODS-South field. We divide the dataset into epochs and sub-epochs to study the long-term (months to years) and the short-term (days) variability. We use a χ²-statistics method to select AGN candidates with a probability ≤ 1% that the observed variability is due to statistical errors alone. We find 39 sources (1.7% of the parent sample) that show long-term variability and 55 (2.2% of the parent sample) showing short-term variability, i.e., a surface density of 0.03 sources arcmin⁻² for both the long-term and the short-term variable sources. After removing the expected number of false positives inherent to the method, the estimated percentages are 1.0% and 1.4% of the parent sample for the long-term and short-term variables, respectively. We compare our candidates with AGN selected in the X-ray and radio bands, and with AGN candidates selected by their IR emission. Approximately 50% of the MIPS 24 μm...
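
The selection rule above amounts to a χ² test of the hypothesis that a source has constant flux. A minimal standard-library sketch, assuming Gaussian 1σ photometric errors and the error-weighted mean flux as the constant model (the paper's epoch and sub-epoch bookkeeping is not reproduced); the χ² tail probability is evaluated from the series for the regularized incomplete gamma function, and the function names and flux values are hypothetical:

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for a chi-square variable with `dof` degrees of freedom."""
    if x <= 0.0:
        return 1.0
    a, half_x = dof / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, half_x).
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= half_x / (a + n)
        total += term
        if math.isinf(total):
            return 0.0  # x is so far in the tail that the probability underflows
    log_p = math.log(total) + a * math.log(half_x) - half_x - math.lgamma(a)
    return max(0.0, 1.0 - math.exp(log_p))


def constant_source_probability(fluxes, errors):
    """Probability that the scatter in `fluxes` is due to the quoted 1-sigma
    `errors` alone, i.e. that the source is not variable."""
    weights = [1.0 / e**2 for e in errors]
    mean = sum(w * f for w, f in zip(weights, fluxes)) / sum(weights)
    chi2 = sum(((f - mean) / e) ** 2 for f, e in zip(fluxes, errors))
    return chi2_sf(chi2, dof=len(fluxes) - 1)


# Hypothetical 24 um flux densities (arbitrary units) with unit errors.
p = constant_source_probability([10.0, 14.0, 9.0, 15.0, 8.0], [1.0] * 5)
print(p <= 0.01)  # True: flagged as a variability-selected AGN candidate
```

With a 1% threshold, about 1% of genuinely constant sources are still expected to be flagged, which is why the abstract corrects the candidate percentages for false positives.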

  1. Ultra-HPLC-MS(n) (Poly)phenolic profiling and chemometric analysis of juices from ancient Punica granatum L. Cultivars: a nontargeted approach.

    Science.gov (United States)

    Calani, Luca; Beghè, Deborah; Mena, Pedro; Del Rio, Daniele; Bruni, Renato; Fabbri, Andrea; Dall'asta, Chiara; Galaverna, Gianni

    2013-06-12

    This study deals with the qualitative characterization of the phenolic profile of pomegranate juices obtained from ancient accessions. Composition data, together with genetic, morphological, and agronomical parameters, may lead to a full characterization of such germplasm, with the aim of its retrieval and biodiversity valorization. Environmental adaptation, indeed, may contribute to an enrichment of the phenolic content in pomegranate, with important effects on its nutritional properties. More than 65 punicalagins, ellagic acid derivatives, flavonoids, anthocyanins, and phenylpropanoids were simultaneously detected in four centuries-old Punica granatum L. ecotypes from northern Italy and compared with those of P. granatum cv. Dente di Cavallo, a widely cultivated Italian cultivar, using a simple ultra-HPLC (uHPLC) separation and MS(n) linear ion trap mass spectrometric characterization. Fingerprint-based phytochemical discrimination of the accessions was achieved by chemometric analysis despite their limited geographical distribution, confirming the great intraspecific variability in pomegranate secondary metabolism. The combined use of uHPLC-MS(n) qualitative fingerprinting and multivariate analysis may represent a useful tool for the discrimination and selection of pomegranate germplasm with specific properties related to polyphenolic content.

  2. Near-infrared diffuse reflectance spectroscopy with sample spots and chemometrics for fast determination of bovine serum albumin in micro-volume samples

    Institute of Scientific and Technical Information of China (English)

    Cai-Jing Cui; Wen-Sheng Cai; Xue-Guang Shao

    2013-01-01

    Near-infrared diffuse reflectance spectroscopy (NIRDRS) has attracted more and more attention in analyzing the components in samples with complex matrices. However, to apply this technique to micro-analysis, there are still some obstacles to overcome, such as the low sensitivity and spectral overlapping associated with this approach. A method for fast determination of bovine serum albumin (BSA) in micro-volume samples was studied using NIRDRS with sample spots and chemometric techniques. 10 μL of sample spotted on a filter paper substrate was used for the spectral measurements. Quantitative analysis was obtained by partial least squares (PLS) regression with signal processing and variable selection. The results show that the correlation coefficient (R) between the predicted and the reference concentration is 0.9897 and the recoveries are in the range of 87.4%-114.4% for the validation samples in the concentration range of 0.61-8.10 mg/mL. These results suggest that the method has the potential to quickly measure proteins in micro-volume solutions.

  3. Comparative performance of selected variability detection techniques in photometric time series data

    Science.gov (United States)

    Sokolovsky, K. V.; Gavras, P.; Karampelas, A.; Antipin, S. V.; Bellas-Velidis, I.; Benni, P.; Bonanos, A. Z.; Burdanov, A. Y.; Derlopa, S.; Hatzidimitriou, D.; Khokhryakova, A. D.; Kolesnikova, D. M.; Korotkiy, S. A.; Lapukhin, E. G.; Moretti, M. I.; Popov, A. A.; Pouliasis, E.; Samus, N. N.; Spetsieri, Z.; Veselkov, S. A.; Volkov, K. V.; Yang, M.; Zubareva, A. M.

    2017-01-01

    Photometric measurements are prone to systematic errors presenting a challenge to low-amplitude variability detection. In search for a general-purpose variability detection technique able to recover a broad range of variability types including currently unknown ones, we test 18 statistical characteristics quantifying scatter and/or correlation between brightness measurements. We compare their performance in identifying variable objects in seven time series data sets obtained with telescopes ranging in size from a telephoto lens to 1 m-class and probing variability on time-scales from minutes to decades. The test data sets together include light curves of 127 539 objects, among them 1251 variable stars of various types and represent a range of observing conditions often found in ground-based variability surveys. The real data are complemented by simulations. We propose a combination of two indices that together recover a broad range of variability types from photometric data characterized by a wide variety of sampling patterns, photometric accuracies and percentages of outlier measurements. The first index is the interquartile range (IQR) of magnitude measurements, sensitive to variability irrespective of a time-scale and resistant to outliers. It can be complemented by the ratio of the light-curve variance to the mean square successive difference, 1/η, which is efficient in detecting variability on time-scales longer than the typical time interval between observations. Variable objects have larger 1/η and/or IQR values than non-variable objects of similar brightness. Another approach to variability detection is to combine many variability indices using principal component analysis. We present 124 previously unknown variable stars found in the test data.
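
The two recommended indices are simple to compute. A minimal sketch with the standard library, assuming magnitudes sorted by observation time; the IQR uses the default method of Python's `statistics.quantiles`, which may differ slightly from the authors' quantile convention, and the function names and light curves are made up:

```python
import statistics


def iqr(mags):
    """Interquartile range of the magnitude measurements."""
    q1, _, q3 = statistics.quantiles(mags, n=4)  # default 'exclusive' method
    return q3 - q1


def inverse_eta(mags):
    """1/eta: light-curve variance over the mean square successive difference.

    `mags` must be in time order; large values indicate variability that is
    slow compared with the typical interval between observations."""
    n = len(mags)
    variance = statistics.variance(mags)  # sample variance (divisor n - 1)
    delta2 = sum((b - a) ** 2 for a, b in zip(mags, mags[1:])) / (n - 1)
    return variance / delta2


trend = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5]   # slow brightening
flicker = [10.0, 10.5, 10.0, 10.5, 10.0, 10.5]  # point-to-point scatter
print(round(inverse_eta(trend), 3), round(inverse_eta(flicker), 3))  # 3.5 0.3
```

The slow trend gives a large 1/η while rapid alternation gives a small one, matching the description of 1/η as sensitive to variability on time-scales longer than the sampling interval; the IQR responds to both.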

  4. Comparative performance of selected variability detection techniques in photometric time series data

    Science.gov (United States)

    Sokolovsky, K. V.; Gavras, P.; Karampelas, A.; Antipin, S. V.; Bellas-Velidis, I.; Benni, P.; Bonanos, A. Z.; Burdanov, A. Y.; Derlopa, S.; Hatzidimitriou, D.; Khokhryakova, A. D.; Kolesnikova, D. M.; Korotkiy, S. A.; Lapukhin, E. G.; Moretti, M. I.; Popov, A. A.; Pouliasis, E.; Samus, N. N.; Spetsieri, Z.; Veselkov, S. A.; Volkov, K. V.; Yang, M.; Zubareva, A. M.

    2016-09-01

    Photometric measurements are prone to systematic errors presenting a challenge to low-amplitude variability detection. In search for a general-purpose variability detection technique able to recover a broad range of variability types including currently unknown ones, we test 18 statistical characteristics quantifying scatter and/or correlation between brightness measurements. We compare their performance in identifying variable objects in seven time-series datasets obtained with telescopes ranging in size from a telephoto lens to 1 m-class and probing variability on timescales from minutes to decades. The test datasets together include lightcurves of 127539 objects, among them 1251 variable stars of various types and represent a range of observing conditions often found in ground-based variability surveys. The real data are complemented by simulations. We propose a combination of two indices that together recover a broad range of variability types from photometric data characterized by a wide variety of sampling patterns, photometric accuracies, and percentages of outlier measurements. The first index is the interquartile range (IQR) of magnitude measurements, sensitive to variability irrespective of a timescale and resistant to outliers. It can be complemented by the ratio of the lightcurve variance to the mean square successive difference, 1/η, which is efficient in detecting variability on timescales longer than the typical time interval between observations. Variable objects have larger 1/η and/or IQR values than non-variable objects of similar brightness. Another approach to variability detection is to combine many variability indices using principal component analysis. We present 124 previously unknown variable stars found in the test data.

  5. Oracle Efficient Variable Selection in Random and Fixed Effects Panel Data Models

    DEFF Research Database (Denmark)

    Kock, Anders Bredahl

    , we prove that the Marginal Bridge estimator can asymptotically correctly distinguish between relevant and irrelevant explanatory variables. We do this without restricting the dependence between covariates and without assuming sub Gaussianity of the error terms thereby generalizing the results...... and irrelevant variables and the asymptotic distribution of the estimators of the coefficients of the relevant variables is the same as if only these had been included in the model, i.e. as if an oracle had revealed the true model prior to estimation. In the case of more explanatory variables than observations...... of Huang et al. (2008). Furthermore, the number of relevant variables is allowed to be larger than the sample size....

  6. Implementation of chemometrics in quality evaluation of food and beverages.

    Science.gov (United States)

    Efenberger-Szmechtyk, Magdalena; Nowak, Agnieszka; Kregiel, Dorota

    2017-01-27

    Conventional methods for food quality evaluation based on chemical or microbiological analysis followed by traditional univariate statistics such as ANOVA are considered insufficient for some purposes. More sophisticated instrumental methods, including spectroscopy and chromatography, in combination with multivariate analysis (chemometrics), can be used to determine food authenticity, identify adulteration or mislabeling, and assess food safety. The purpose of this review is to present the current state of knowledge on the use of chemometric tools for evaluating the quality of food products of animal and plant origin and of beverages. The article describes applications of several multivariate techniques in food and beverage research, showing their role in adulteration detection, authentication, quality control and differentiation of samples, and comparing their classification and prediction ability.

  7. Variability Selected Low-Luminosity Active Galactic Nuclei in the 4 Ms Chandra Deep Field-South

    Science.gov (United States)

    Young, M.; Brandt, W. N.; Xue, Y. Q.; Paolillo, D. M.; Alexander, F. E.; Bauer, F. E.; Lehmer, B. D.; Luo, B.; Shemmer, O.; Schneider, D. P.; Vignali, C.

    2012-01-01

    The 4 Ms Chandra Deep Field-South (CDF-S) and other deep X-ray surveys have been highly effective at selecting active galactic nuclei (AGN). However, cosmologically distant low-luminosity AGN (LLAGN) have remained a challenge to identify due to significant contribution from the host galaxy. We identify long-term X-ray variability (≈ months to years, observed frame) in 20 of 92 CDF-S galaxies spanning redshifts z ≈ 0.08-1.02 that do not meet other AGN selection criteria. We show that the observed variability cannot be explained by X-ray binary populations or ultraluminous X-ray sources, so the variability is most likely caused by accretion onto a supermassive black hole (SMBH). The variable galaxies are not heavily obscured in general, with a stacked effective power-law photon index of Γ_Stack ≈ 1.93 ± 0.13, and are therefore likely LLAGN. The LLAGN tend to lie a factor of ≈ 6-89 below the extrapolated linear variability-luminosity relation measured for luminous AGN. This may be explained by their lower accretion rates. Variability-independent black-hole mass and accretion-rate estimates for variable galaxies show that they sample a significantly different black hole mass-accretion-rate space, with masses a factor of 2.4 lower and accretion rates a factor of 22.5 lower than variable luminous AGNs at the same redshift. We find that an empirical model based on a universal broken power-law power spectral density function, where the break frequency depends on SMBH mass and accretion rate, roughly reproduces the shape, but not the normalization, of the variability-luminosity trends measured for variable galaxies and more luminous AGNs.

  8. Impact of Roasting on Identification of Hazelnut (Corylus avellana L.) Origin: A Chemometric Approach.

    Science.gov (United States)

    Locatelli, Monica; Coïsson, Jean Daniel; Travaglia, Fabiano; Bordiga, Matteo; Arlorio, Marco

    2015-08-19

    Hazelnuts belonging to different cultivars or cultivated in different geographic areas can be differentiated by their chemical profile; however, the roasting process may affect the composition of raw hazelnuts, thus compromising the possibility of identifying their origin in processed foods. In this work, we characterized raw and roasted hazelnuts (Tonda Gentile Trilobata, TGT, from Italy and from Chile, Tonda di Giffoni from Italy, and Tombul from Turkey), as well as hazelnuts isolated from commercial products, with the aim of discriminating their cultivar and origin. The chemometric evaluation of selected chemical parameters (proximate composition, fatty acids, total polyphenols, antioxidant activity, and protein fingerprint by SDS-PAGE) allowed us to identify hazelnuts belonging to different cultivars and, for the TGT samples, their different geographic origins. Commercial samples containing Piedmontese TGT hazelnuts were also correctly assigned to the TGT cluster. In conclusion, even if the roasting process modifies the composition of roasted hazelnuts, this preliminary model study suggests that the identification of their origin is still possible.

  9. Determination and discrimination of biodiesel fuels by gas chromatographic and chemometric methods

    Science.gov (United States)

    Milina, R.; Mustafa, Z.; Bojilov, D.; Dagnon, S.; Moskovkina, M.

    2016-03-01

    Pattern recognition method (PRM) was applied to gas chromatographic (GC) data for the fatty acid methyl ester (FAME) composition of commercial and laboratory-synthesized biodiesel fuels from vegetable oils including sunflower, rapeseed, corn and palm oils. Two GC quantitative methods for calculating individual FAMEs were compared: Area % and internal standard. Both methods were applied to the analysis of two certified reference materials. Statistical processing of the results demonstrates the accuracy and precision of the two methods and allows them to be compared. For further chemometric investigations of biodiesel fuels by their FAME profiles, either method can be used. PRM results for the FAME profiles of samples from different vegetable oils show successful recognition of biodiesels according to their feedstock. The information obtained can be used for selecting feedstock to produce biodiesels with certain properties, for assessing their interchangeability, and in cases of fuel spillage and remedial actions in the environment.
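
The two quantification schemes compared above differ in what they normalise against: Area % divides each FAME peak by the total FAME area, whereas the internal-standard method scales each peak by the area and known mass of an added standard, so its results need not sum to 100%. A minimal sketch of the generic single-point calculations, assuming a response factor of 1 for every FAME unless one is supplied; the abstract does not give the authors' standard, response factors or masses, so all names and numbers here are illustrative:

```python
def area_percent(areas):
    """Area % normalisation: each FAME peak as a share of the total FAME area."""
    total = sum(areas.values())
    return {name: 100.0 * a / total for name, a in areas.items()}


def internal_standard_percent(areas, is_area, is_mass_mg, sample_mass_mg,
                              response_factors=None):
    """Single-point internal-standard quantification, as mass % of the sample.

    Assumes a response factor of 1 for any FAME not listed in response_factors."""
    rf = response_factors or {}
    return {
        name: 100.0 * rf.get(name, 1.0) * (a / is_area) * is_mass_mg / sample_mass_mg
        for name, a in areas.items()
    }


# Hypothetical peak areas for three FAMEs and an added internal standard.
areas = {"C16:0": 200.0, "C18:1": 600.0, "C18:2": 200.0}
print(area_percent(areas))  # {'C16:0': 20.0, 'C18:1': 60.0, 'C18:2': 20.0}
by_is = internal_standard_percent(areas, is_area=250.0, is_mass_mg=2.4, sample_mass_mg=10.0)
print(round(sum(by_is.values()), 2))  # 96.0
```

Area % always sums to 100% by construction, while the internal-standard total (96% here) reflects the absolute FAME content of the sample.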

  10. Determination and discrimination of biodiesel fuels by gas chromatographic and chemometric methods

    Directory of Open Access Journals (Sweden)

    Milina R.

    2016-03-01

    Full Text Available Pattern recognition method (PRM) was applied to gas chromatographic (GC) data for the fatty acid methyl ester (FAME) composition of commercial and laboratory-synthesized biodiesel fuels from vegetable oils including sunflower, rapeseed, corn and palm oils. Two GC quantitative methods for calculating individual FAMEs were compared: Area % and internal standard. Both methods were applied to the analysis of two certified reference materials. Statistical processing of the results demonstrates the accuracy and precision of the two methods and allows them to be compared. For further chemometric investigations of biodiesel fuels by their FAME profiles, either method can be used. PRM results for the FAME profiles of samples from different vegetable oils show successful recognition of biodiesels according to their feedstock. The information obtained can be used for selecting feedstock to produce biodiesels with certain properties, for assessing their interchangeability, and in cases of fuel spillage and remedial actions in the environment.

  11. Chemical fingerprinting of Gardenia jasminoides Ellis by HPLC-DAD-ESI-MS combined with chemometrics methods.

    Science.gov (United States)

    Han, Yan; Wen, Jun; Zhou, Tingting; Fan, Guorong

    2015-12-01

    A fingerprint analysis method has been developed for the characterization and discrimination of Gardenia jasminoides Ellis from different areas. Chemometric methods, including similarity evaluation, principal component analysis (PCA) and hierarchical clustering analysis (HCA), were used to identify more useful chemical markers for improving the quality control standard of the dried ripe fruits of G. jasminoides Ellis. The selected chemical markers were then analyzed qualitatively and quantitatively by high-performance liquid chromatography-diode array detection-electrospray ionization mass spectrometry (HPLC-DAD-ESI-MS). Twenty-three characteristic peaks were assigned, 19 of which were identified by comparing retention times, UV and MS spectra with authentic compounds or literature data. Moreover, 14 of them were determined quantitatively, which allows the quality of G. jasminoides Ellis to be evaluated effectively. This study is expected to provide comprehensive information for the quality evaluation of G. jasminoides Ellis and to serve as a valuable reference for further study and development of this herb and related medicinal products.
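
The similarity-evaluation step mentioned above is commonly performed by comparing each sample's chromatographic fingerprint with a reference fingerprint built from all samples. A minimal sketch, assuming the peaks have already been aligned and integrated into a common table and using the cosine (congruence) coefficient against the mean fingerprint; the paper may use a different reference or a correlation coefficient, and the function names and peak areas are made up:

```python
import math


def cosine(a, b):
    """Cosine (congruence) coefficient between two peak-area vectors."""
    return sum(x * y for x, y in zip(a, b)) / math.sqrt(
        sum(x * x for x in a) * sum(y * y for y in b)
    )


def similarity_to_reference(fingerprints):
    """Cosine similarity of each aligned fingerprint to the mean fingerprint."""
    n = len(fingerprints)
    reference = [sum(col) / n for col in zip(*fingerprints)]
    return [cosine(fp, reference) for fp in fingerprints]


# Peak areas of four common peaks in three hypothetical batches.
batches = [
    [10.0, 20.0, 30.0, 40.0],
    [11.0, 19.0, 31.0, 39.0],
    [40.0, 30.0, 20.0, 10.0],  # atypical profile
]
print([round(s, 3) for s in similarity_to_reference(batches)])
```

The atypical third batch receives the lowest similarity; PCA and HCA would then be used, as in the abstract, to see how batches from different areas group.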

  12. Variability of levels of PM, black carbon and particle number concentration in selected European cities

    Directory of Open Access Journals (Sweden)

    C. Reche

    2011-03-01

    Full Text Available In many large cities of Europe, standard air quality limit values for particulate matter (PM) are exceeded. Emissions from road traffic and biomass burning are frequently reported to be the major causes. As a consequence of these exceedances, a large number of air quality plans, most of them focusing on traffic emission reductions, have been implemented in the last decade. In spite of this implementation, a number of cities did not record a decrease in PM levels. Thus, is the efficiency of air quality plans overestimated? Or do we need a more specific metric to evaluate the impact of the above emissions on the levels of urban aerosols?

    This study shows the results of the interpretation of the 2009 variability of levels of PM, black carbon (BC), aerosol number concentration (N) and a number of gaseous pollutants in seven selected urban areas covering road traffic, urban background, urban-industrial, and urban-shipping environments from southern, central and northern Europe.

    The results showed that variations of PM and N levels do not always reflect the variation of the impact of road traffic emissions on urban aerosols. However, BC levels vary proportionally with those of traffic related gaseous pollutants, such as CO, NO2 and NO. Due to this high correlation, one may suppose that monitoring the levels of these gaseous pollutants would be enough to extrapolate exposure to traffic-derived BC levels. However, the BC/CO, BC/NO2 and BC/NO ratios vary widely among the cities studied, as a function of distance to traffic emissions, vehicle fleet composition and the influence of other emission sources such as biomass burning. Thus, levels of BC should be measured at air quality monitoring sites.

    During traffic rush hours, a narrow variation in the N/BC ratio was evidenced, but a wide variation of this ratio was determined for the noon period. Although in central and northern Europe N and BC levels tend to vary

  13. Network-based group variable selection for detecting expression quantitative trait loci (eQTL)

    Directory of Open Access Journals (Sweden)

    Zhang Xuegong

    2011-06-01

    Full Text Available Abstract Background: Analysis of expression quantitative trait loci (eQTL) aims to identify the genetic loci associated with the expression level of genes. Penalized regression with a proper penalty is suitable for high-dimensional biological data. Its performance should be enhanced when we incorporate biological knowledge of the gene expression network and the linkage disequilibrium (LD) structure between loci in a high-noise background. Results: We propose a network-based group variable selection (NGVS) method for QTL detection. Our method simultaneously maps highly correlated expression traits sharing the same biological function to marker sets formed by LD. By grouping markers, the complex joint activity of multiple SNPs can be considered and the dimensionality of the eQTL problem is reduced dramatically. To demonstrate the power and flexibility of our method, we used it to analyze two simulations and a mouse obesity and diabetes dataset. We considered the gene co-expression network, grouped markers into marker sets and treated the additive and dominant effects of each locus as a group; as a consequence, we were able to replicate results previously obtained on the mouse linkage dataset. Furthermore, we observed several possible sex-dependent loci and interactions of multiple SNPs. Conclusions: The proposed NGVS method is appropriate for problems with high-dimensional data and a high-noise background. On the eQTL problem it outperforms the classical Lasso method, which does not consider biological knowledge. Introducing proper gene expression and loci correlation information makes the detection of causal markers more accurate. With reasonable model settings, NGVS can lead to novel biological findings.
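
Treating the additive and dominant effects of a locus as one group, as described above, requires a design matrix in which each locus contributes two columns sharing a group label, so that a group-lasso type penalty keeps or drops both together. A minimal sketch, assuming genotypes coded 0/1/2 (copies of one allele), additive coding g - 1 and a heterozygote indicator for dominance; coding conventions vary between studies, the function name is hypothetical, and the LD-based marker sets and network-based trait grouping of NGVS are not shown:

```python
def encode_marker_groups(genotypes):
    """Expand 0/1/2 genotype calls into (additive, dominance) columns per locus.

    genotypes: list of individuals, each a list of genotype codes per locus.
    Returns the design-matrix rows and a group label for every column."""
    n_loci = len(genotypes[0])
    rows = []
    for individual in genotypes:
        row = []
        for g in individual:
            row.append(g - 1)               # additive effect: -1, 0, +1
            row.append(1 if g == 1 else 0)  # dominance: heterozygote indicator
        rows.append(row)
    # Both columns of a locus share that locus's group label.
    groups = [locus for locus in range(n_loci) for _ in range(2)]
    return rows, groups


rows, groups = encode_marker_groups([[0, 1], [2, 1]])
print(rows)    # [[-1, 0, 0, 1], [1, 0, 0, 1]]
print(groups)  # [0, 0, 1, 1]
```

Passing these group labels to a group-penalised regression makes the penalty act on each locus as a whole rather than on its additive and dominance terms separately.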

  14. Incorporating abundance information and guiding variable selection for climate-based ensemble forecasting of species' distributional shifts

    Science.gov (United States)

    2017-01-01

    Ecological niche models (ENMs) have increasingly been used to estimate the potential effects of climate change on species’ distributions worldwide. Recently, predictions of species abundance have also been obtained with such models, though knowledge about the climatic variables affecting species abundance is often lacking. To address this, we used a well-studied guild (temperate North American quail) and the Maxent modeling algorithm to compare the model performance of three variable selection approaches: correlation/variable contribution (CVC), biological (i.e., variables known to affect species abundance), and random. We then applied the best approach to forecast potential distributions under future climatic conditions and analyzed these future potential distributions in light of available abundance data and presence-only occurrence data. To estimate species’ distributional shifts, we generated ensemble forecasts using four global circulation models, four representative concentration pathways, and two time periods (2050 and 2070). Furthermore, we present distributional shifts where 75%, 90%, and 100% of our ensemble models agreed. The CVC variable selection approach outperformed our biological approach for four of the six species. Model projections indicated species-specific effects of climate change on future distributions of temperate North American quail. The Gambel’s quail (Callipepla gambelii) was the only species predicted to gain area in climatic suitability across all three scenarios of ensemble model agreement. Conversely, the scaled quail (Callipepla squamata) was the only species predicted to lose area in climatic suitability across all three scenarios of ensemble model agreement. Our models projected future loss of areas for the northern bobwhite (Colinus virginianus) and scaled quail in portions of their distributions which are currently areas of high abundance. Climatic variables that influence local abundance may not always scale up to influence
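    A minimal sketch can illustrate the agreement levels mentioned above (75%, 90% and 100% of ensemble members). It assumes each ensemble member has already been converted to a binary suitable/unsuitable map over the same grid cells. The function name and the toy maps are hypothetical.

```python
def agreement_mask(predictions, percent):
    """Flag grid cells where at least `percent` % of ensemble members predict suitability.

    predictions -- list of binary maps (lists of 0/1), one per ensemble member,
                   all covering the same grid cells in the same order
    percent     -- agreement level as an integer, e.g. 75, 90 or 100
    """
    n_models = len(predictions)
    mask = []
    for cell in range(len(predictions[0])):
        votes = sum(model[cell] for model in predictions)
        # integer comparison avoids floating-point issues at exact thresholds
        mask.append(votes * 100 >= percent * n_models)
    return mask
```

    Comparing such masks for current and future climate, cell by cell, is one way to express gains and losses of suitable area at each agreement level.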

  15. Quasi-stellar Object Selection Algorithm Using Time Variability and Machine Learning: Selection of 1620 Quasi-stellar Object Candidates from MACHO Large Magellanic Cloud Database

    Science.gov (United States)

    Kim, Dae-Won; Protopapas, Pavlos; Byun, Yong-Ik; Alcock, Charles; Khardon, Roni; Trichas, Markos

    2011-07-01

    We present a new quasi-stellar object (QSO) selection algorithm using a Support Vector Machine, a supervised classification method, on a set of extracted time series features including period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars, and microlensing events using 58 known QSOs, 1629 variable stars, and 4288 non-variables in the MAssive Compact Halo Object (MACHO) database as a training set. To estimate the efficiency and the accuracy of the model, we perform a cross-validation test using the training set. The test shows that the model correctly identifies ~80% of known QSOs with a 25% false-positive rate. The majority of the false positives are Be stars. We applied the trained model to the MACHO Large Magellanic Cloud (LMC) data set, which consists of 40 million light curves, and found 1620 QSO candidates. During the selection none of the 33,242 known MACHO variables were misclassified as QSO candidates. In order to estimate the true false-positive rate, we crossmatched the candidates with astronomical catalogs including the Spitzer Surveying the Agents of a Galaxy's Evolution LMC catalog and a few X-ray catalogs. The results further suggest that the majority of the candidates, more than 70%, are QSOs.
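    The following sketch computes two of the time-series features named above, amplitude and autocorrelation. It assumes evenly sampled magnitudes. Real MACHO light curves are irregularly sampled, so a production pipeline would need to handle time gaps. It also uses one common amplitude convention; the feature definitions actually used by the authors may differ.

```python
def amplitude(mags):
    """Half the peak-to-peak range of a light curve (one common convention)."""
    return (max(mags) - min(mags)) / 2.0


def autocorrelation(mags, lag=1):
    """Sample autocorrelation at `lag`, assuming evenly sampled magnitudes."""
    n = len(mags)
    mean = sum(mags) / n
    var = sum((m - mean) ** 2 for m in mags)
    if var == 0.0:
        raise ValueError("constant light curve has undefined autocorrelation")
    cov = sum((mags[i] - mean) * (mags[i + lag] - mean) for i in range(n - lag))
    return cov / var
```

    Such per-object feature vectors would then be the inputs to the supervised classifier.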

  16. SELECTION OF VARIABLES FOR THE CROATIAN MUNICIPAL SOLID WASTE GENERATION MODEL

    Directory of Open Access Journals (Sweden)

    Anamarija Grbeš

    2017-01-01

    Municipal solid waste (MSW) generation models are important elements of waste management planning. This paper presents the findings of the second part of the research on the Croatian MSW generation mechanism. The correlations among 17 variables are shown, and the relationships between the variables are discussed. In conclusion, independent variables to be hypothesised and tested in a model in the next part of the research are proposed.

  17. Examining the Moderating Effect of Disability Status on the Relationship between Trauma Symptomatology and Select Career Variables

    Science.gov (United States)

    Strauser, David R.; Lustig, Daniel C.; Uruk, Ayse Ciftci

    2006-01-01

    In the current study, the authors examined whether the influence of trauma symptomatology on select career variables differs based on disability status. A total of 131 college students and 81 individuals with disabilities completed the "Career Thoughts Inventory," "My Vocational Situation," "Developmental Work Personality…

  18. The influence of the pressure force control signal on selected parameters of the vehicle continuously variable transmission

    Science.gov (United States)

    Bieniek, A.; Graba, M.; Prażnowski, K.

    2016-09-01

    The paper presents the results of research on the effect of the control signal frequency on the course of selected operating parameters of a continuously variable transmission (CVT). The study used a Fuji Hyper M6 transmission with an electro-hydraulic control system and proprietary software for control and data acquisition developed in the LabVIEW environment.

  19. PLS-based and regularization-based methods for the selection of relevant variables in non-targeted metabolomics data

    Directory of Open Access Journals (Sweden)

    Renata Bujak

    2016-07-01

    Non-targeted metabolomics constitutes a part of systems biology and aims to determine many metabolites in complex biological samples. Datasets obtained in non-targeted metabolomics studies are multivariate and high-dimensional due to the sensitivity of mass spectrometry-based detection methods as well as the complexity of biological matrices. Proper selection of the variables that contribute to group classification is a crucial step, especially in metabolomics studies focused on searching for disease biomarker candidates. In the present study, three different statistical approaches were tested using two metabolomics datasets (the RH and PH studies): orthogonal projections to latent structures-discriminant analysis (OPLS-DA) without and with multiple testing correction, and the least absolute shrinkage and selection operator (LASSO). For the RH study, the OPLS-DA model built without multiple testing correction selected 46 and 218 variables based on VIP criteria using Pareto and UV scaling, respectively. For the PH study, 217 and 320 variables were selected based on VIP criteria using Pareto and UV scaling, respectively. For the RH study, the OPLS-DA model built with multiple testing correction selected 4 and 19 variables as statistically significant with Pareto and UV scaling, respectively. For the PH study, 14 and 18 variables were selected based on VIP criteria with Pareto and UV scaling, respectively. Additionally, the concept and fundamentals of LASSO with a bootstrap procedure evaluating the reproducibility of results were demonstrated. In the RH and PH studies, LASSO selected 14 and 4 variables, respectively, with reproducibility between 99.3% and 100%. However, despite the popularity of PLS-DA and OPLS-DA methods in metabolomics, it should be highlighted that they control neither type I nor type II error, but only arbitrarily establish a cut-off value for PLS-DA loadings
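    The VIP-based counts above differ between Pareto and UV (unit-variance) scaling because the two methods weight variables differently before modelling. Below is a minimal sketch of both, applied to a single variable (column) of a metabolomics data matrix. The function name is hypothetical.

```python
import statistics


def scale_column(values, method="uv"):
    """Mean-center one variable and scale it.

    'uv'     -- unit variance: divide by the standard deviation
    'pareto' -- Pareto: divide by the square root of the standard deviation
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if method == "uv":
        factor = sd
    elif method == "pareto":
        factor = sd ** 0.5
    else:
        raise ValueError("method must be 'uv' or 'pareto'")
    return [(v - mean) / factor for v in values]
```

    UV scaling gives every variable the same variance, so low-intensity signals count as much as dominant ones. Pareto scaling keeps part of the original intensity information.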

  20. Natural selection acts on Atlantic salmon major histocompatibility (MH) variability in the wild

    NARCIS (Netherlands)

    Eyto, de E.; McGinnity, P.; Consuegra, S.; Coughlan, J.; Tufto, J.; Farrell, K.; Megens, H.J.W.C.; Jordan, W.; Cross, T.; Stet, R.J.M.

    2007-01-01

    Pathogen-driven balancing selection is thought to maintain polymorphism in major histocompatibility (MH) genes. However, there have been few empirical demonstrations of selection acting on MH loci in natural populations. To determine whether natural selection on MH genes has fitness consequences for

  1. Multivariate Approaches for Simultaneous Determination of Avanafil and Dapoxetine by UV Chemometrics and HPLC-QbD in Binary Mixtures and Pharmaceutical Product.

    Science.gov (United States)

    2016-04-07

    Multivariate UV-spectrophotometric methods and Quality by Design (QbD) HPLC are described for the concurrent estimation of avanafil (AV) and dapoxetine (DP) in the binary mixture and in the dosage form. Chemometric methods have been developed, including classical least-squares, principal component regression, partial least-squares, and multiway partial least-squares. Analytical figures of merit, such as sensitivity, selectivity, analytical sensitivity, LOD, and LOQ, were determined. QbD consists of three steps, starting with a screening approach to determine the critical process parameters and response variables. This is followed by an understanding of factors and levels, and lastly the application of a Box-Behnken design containing four critical factors that affect the method. Using an Ishikawa diagram and a risk assessment tool, four main factors were selected for optimization. Design optimization, statistical calculation, and final-condition optimization of all the reactions were carried out. Twenty-five experiments were done, and a quadratic model was used for all response variables. Desirability plots, surface plots, the design space, and three-dimensional plots were calculated. Under the optimized conditions, HPLC separation was achieved on a Phenomenex Gemini C18 column (250 × 4.6 mm, 5 μm) using acetonitrile-buffer (ammonium acetate buffer at pH 3.7 with acetic acid) as the mobile phase at a flow rate of 0.7 mL/min. Quantification was done at 239 nm, and the temperature was set at 20°C. The developed methods were validated and successfully applied for the simultaneous determination of AV and DP in the dosage form.
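    Of the methods listed, classical least squares (CLS) is the simplest to illustrate. The sketch below assumes that absorbances add linearly (Beer's law). It also assumes the pure-component spectra of the two analytes are already known; in practice they are estimated from calibration standards. The spectra in the test are invented numbers, not data from the study.

```python
def cls_two_components(k1, k2, mixture):
    """Estimate two concentrations from a mixture spectrum by classical least squares,
    assuming additive absorbances: a = c1*k1 + c2*k2.

    k1, k2  -- pure-component absorptivity spectra (unit concentration), same wavelengths
    mixture -- measured absorbance spectrum of the mixture
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    # Normal equations (K^T K) c = K^T a for the 2 x 2 case, solved by Cramer's rule
    a11, a12, a22 = dot(k1, k1), dot(k1, k2), dot(k2, k2)
    b1, b2 = dot(k1, mixture), dot(k2, mixture)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("pure spectra are (nearly) collinear")
    c1 = (a22 * b1 - a12 * b2) / det
    c2 = (a11 * b2 - a12 * b1) / det
    return c1, c2
```

    Using all wavelengths rather than two makes the estimate less sensitive to noise at any single wavelength. That is one reason full-spectrum methods are preferred over simple two-wavelength equations.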

  2. Predicting punching acceleration from selected strength and power variables in elite karate athletes: a multiple regression analysis.

    Science.gov (United States)

    Loturco, Irineu; Artioli, Guilherme Giannini; Kobal, Ronaldo; Gil, Saulo; Franchini, Emerson

    2014-07-01

    This study investigated the relationship between punching acceleration and selected strength and power variables in 19 professional karate athletes from the Brazilian National Team (9 men and 10 women; age, 23 ± 3 years; height, 1.71 ± 0.09 m; and body mass [BM], 67.34 ± 13.44 kg). Punching acceleration was assessed under 4 different conditions in a randomized order: (a) fixed distance aiming to attain maximum speed (FS), (b) fixed distance aiming to attain maximum impact (FI), (c) self-selected distance aiming to attain maximum speed, and (d) self-selected distance aiming to attain maximum impact. The selected strength and power variables were as follows: maximal dynamic strength in bench press and squat-machine, squat and countermovement jump height, mean propulsive power in bench throw and jump squat, and mean propulsive velocity in jump squat with 40% of BM. Upper- and lower-body power and maximal dynamic strength variables were positively correlated with punching acceleration in all conditions. Multiple regression analysis also revealed predictive variables: relative mean propulsive power in squat jump (W·kg⁻¹), and maximal dynamic strength (1 repetition maximum) in both bench press and squat-machine exercises. An impact-oriented instruction and a self-selected distance to start the movement seem to be crucial to reach the highest acceleration during punching execution. This investigation, while demonstrating strong correlations between punching acceleration and strength-power variables, also provides important information for coaches, especially for designing better training strategies to improve punching speed.

  3. Overview of the Usage of Chemometric Methods for Remediation Techniques of Radionuclides

    Science.gov (United States)

    Yilmaz, C.; Aslani, M. A. A.

    The aim of this study is to investigate the application of chemometric tools to remediation techniques for the removal of Cs-137, Sr-90 and Ra-226 from environmental samples. In this study, statistical data on applications of chemometric methods are collected from the literature.

  4. Characterization and Classification of Crude Oils Using a Combination of Spectroscopy and Chemometrics

    NARCIS (Netherlands)

    Peinder, Peter de

    2009-01-01

    Research has been carried out into the utility of chemometric models to predict long residue (LR) and short residue (SR) properties of a crude oil directly from its absorption or magnetic resonance spectrum. Such a combined spectroscopic-chemometric approach might offer a fast alternative for the elab

  5. Chemometric and Statistical Analyses of ToF-SIMS Spectra of Increasingly Complex Biological Samples

    Energy Technology Data Exchange (ETDEWEB)

    Berman, E S; Wu, L; Fortson, S L; Nelson, D O; Kulp, K S; Wu, K J

    2007-10-24

    Characterizing and classifying molecular variation within biological samples is critical for determining fundamental mechanisms of biological processes that will lead to new insights, including improved disease understanding. Towards these ends, time-of-flight secondary ion mass spectrometry (ToF-SIMS) was used to examine increasingly complex samples of biological relevance, including monosaccharide isomers, pure proteins, complex protein mixtures, and mouse embryo tissues. The complex mass spectral data sets produced were analyzed using five common statistical and chemometric multivariate analysis techniques: principal component analysis (PCA), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), soft independent modeling of class analogy (SIMCA), and decision tree analysis by recursive partitioning. PCA was found to be a valuable first step in multivariate analysis, providing insight both into the relative groupings of samples and into the molecular basis for those groupings. For the monosaccharides, pure proteins and protein mixture samples, LDA, PLSDA, and SIMCA were all found to produce excellent classification given a sufficient number of calculated compound variables. For the mouse embryo tissues, however, SIMCA did not produce as accurate a classification. The decision tree analysis was found to be the least successful for all the data sets, providing neither as accurate a classification nor chemical insight for any of the tested samples. Based on these results, we conclude that as the complexity of the sample increases, so must the sophistication of the multivariate technique used to classify the samples. PCA is a preferred first step for understanding ToF-SIMS data and can be followed by either LDA or PLSDA for effective classification analysis. This study demonstrates the strength of ToF-SIMS combined with multivariate statistical and chemometric techniques to classify increasingly complex biological samples.
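    As a minimal illustration of PCA as a first exploratory step, the sketch below extracts the first principal component of mean-centered data by power iteration on the covariance matrix. It assumes a small, dense data matrix with spectra as rows and peak intensities as columns. Real ToF-SIMS work would typically use an SVD-based library routine and suitable normalization, which are not shown.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loading vector, scores) of the first principal component.

    rows -- list of samples, each a list of variable intensities (e.g. peak areas)
    Uses power iteration on the covariance matrix of the mean-centered data.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # starting vector; must not be orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(x[j] * v[j] for j in range(p)) for x in X]
    return v, scores
```

    Plotting the scores shows how samples group. The loading vector indicates which peaks drive that grouping, which corresponds to the "molecular basis" mentioned in the abstract.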

  6. Diffuse reflectance near infrared-chemometric methods development and validation of amoxicillin capsule formulations

    Directory of Open Access Journals (Sweden)

    Ahmed Nawaz Khan

    2016-01-01

    Objective: The aim of the present study was to establish near infrared-chemometric methods that could be effectively used for quality profiling through the identification and quantification of amoxicillin (AMOX) in formulated capsules similar to commercial products. These methods were modeled so that a large number of market products could be evaluated easily and quickly. Materials and Methods: A Thermo Scientific Antaris II near infrared analyzer with TQ Analyst Chemometric Software was used for the development and validation of the identification and quantification models. Several AMOX formulations were composed with four excipients: microcrystalline cellulose, magnesium stearate, croscarmellose sodium and colloidal silicon dioxide. Development included quadratic mixture formulation design, near infrared spectrum acquisition, spectral pretreatment and outlier detection. According to the guidelines prescribed by the International Conference on Harmonization (ICH) and the European Medicines Agency (EMA), the developed methods were validated in terms of specificity, accuracy, precision, linearity, and robustness. Results: In diffuse reflectance mode, an identification model based on discriminant analysis was successfully processed with 76 formulations; the same samples were also used for quantitative analysis using the partial least squares algorithm with four latent variables, giving a correlation coefficient of 0.9937, a root mean square error of calibration (RMSEC) of 2.17%, a root mean square error of prediction (RMSEP) of 2.38% and a root mean square error of cross-validation (RMSECV) of 2.43%. Conclusion: The proposed model established a good relationship between the spectral information and AMOX identity as well as content. The resulting values show that the proposed models offer an alternative choice for AMOX capsule evaluation, relative to the well-established high-performance liquid chromatography method. Ultimately, three commercial products were successfully evaluated.
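    The figures of merit reported above can be computed as follows. RMSEC, RMSEP and RMSECV share the same formula and differ only in which predictions are used: fitted values for the calibration set, predictions for an independent test set, or cross-validated predictions. This sketch divides by the plain sample count; some chemometrics texts divide RMSEC by degrees of freedom instead. The function names are illustrative.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between reference values and model predictions."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


def correlation(reference, predicted):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(reference)
    mr, mp = sum(reference) / n, sum(predicted) / n
    cov = sum((r - mr) * (p - mp) for r, p in zip(reference, predicted))
    sr = math.sqrt(sum((r - mr) ** 2 for r in reference))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (sr * sp)
```

    The RMSE is expressed in the units of the reference values, so a percentage result implies that the reference content was itself given in percent.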

  7. Culture, Organizational Learning and Selected Employee Background Variables in Small-Size Business Enterprises

    Science.gov (United States)

    Graham, Carroll M.; Nafukho, Fredrick Muyia

    2007-01-01

    Purpose: The purpose of this study is to determine the relationship between four independent variables (educational level, longevity, type of enterprise, and gender) and the dependent variable, culture, as a dimension that explains organizational learning readiness in seven small-size business enterprises. Design/methodology/approach: An exploratory…

  8. Applied Music Teaching Behavior as a Function of Selected Personality Variables.

    Science.gov (United States)

    Schmidt, Charles P.

    1989-01-01

    Investigates the relationships among applied music teaching behaviors and personality variables as measured by the Myers-Briggs Type Indicator (MBTI). Suggests that personality variables may be important factors underlying four applied music teaching behaviors: approvals, rate of reinforcement, teacher model/performance, and pace. (LS)

  9. Performance of PLS regression coefficients in selecting variables for each response of a multivariate PLS for omics-type data

    Directory of Open Access Journals (Sweden)

    Giuseppe Palermo

    2009-05-01

    Giuseppe Palermo (1), Paolo Piraino (2), Hans-Dieter Zucht (3). (1) Digilab BioVision GmbH, Hannover, Germany; (2) Dr Paolo Piraino Statistical Consulting, Rende (CS), Italy; (3) Proteome Sciences R&D GmbH & Co. KG, Frankfurt am Main, Germany.

    Abstract: Multivariate partial least squares (PLS) regression allows the modeling of complex biological events by considering different factors at the same time. It is unaffected by data collinearity, making it a valuable method for modeling high-dimensional biological data (such as data derived from genomics, proteomics and peptidomics). In the presence of multiple responses, it is of particular interest how to appropriately “dissect” the model to reveal the importance of single attributes with regard to individual responses (for example, variable selection). In this paper, the performance of multivariate PLS regression coefficients in selecting relevant predictors for different responses in omics-type data was investigated by means of receiver operating characteristic (ROC) analysis. For this purpose, simulated data mimicking the covariance structures of microarray and liquid chromatography mass spectrometric data were used to generate matrices of predictors and responses. The relevant predictors were set a priori. The influences of noise, of the data source with its different covariance structure, and of the size of the set of relevant predictors were investigated. Results demonstrate the applicability of PLS regression coefficients in selecting variables for each response of a multivariate PLS in omics-type data. Comparisons with other feature selection methods, such as variable importance in the projection scores, principal component regression, and least absolute shrinkage and selection operator regression, were also provided.

    Keywords: partial least squares regression, regression coefficients, variable selection, biomarker discovery, omics data
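    The ROC-based evaluation described above can be sketched as follows. The sketch assumes the relevant predictors are known, as with the simulated data, and that each predictor has an importance score, such as the absolute PLS regression coefficient for one response. Fitting the PLS model itself is not shown, and the function name is hypothetical.

```python
def auc_for_selection(scores, relevant):
    """Area under the ROC curve for ranking predictors by importance.

    scores   -- importance of each predictor, e.g. |PLS regression coefficient|
    relevant -- set of indices of the predictors known a priori to be relevant
    Computed in Mann-Whitney form: the probability that a relevant predictor
    outscores an irrelevant one (ties count one half), which equals the area
    under the empirical ROC curve.
    """
    pos = [s for i, s in enumerate(scores) if i in relevant]
    neg = [s for i, s in enumerate(scores) if i not in relevant]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

    An AUC of 1.0 means every relevant predictor is ranked above every irrelevant one. An AUC of 0.5 is no better than random ordering.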

  10. The role of protozoa-driven selection in shaping human genetic variability.

    Science.gov (United States)

    Pozzoli, Uberto; Fumagalli, Matteo; Cagliani, Rachele; Comi, Giacomo P; Bresolin, Nereo; Clerici, Mario; Sironi, Manuela

    2010-03-01

    Protozoa exert a strong selective pressure in humans. The selection signatures left by these pathogens can be exploited to identify genetic modulators of infection susceptibility. We show that protozoa diversity in different geographic locations is a good measure of protozoa-driven selective pressure; protozoa diversity captured selection signatures at known malaria resistance loci and identified several selected single nucleotide polymorphisms in immune and hemolytic anemia genes. A genome-wide search enabled us to identify 5180 variants mapping to 1145 genes that are subjected to protozoa-driven selective pressure. We provide a genome-wide estimate of protozoa-driven selective pressure and identify candidate susceptibility genes for protozoa-borne diseases. Copyright 2010 Elsevier Ltd. All rights reserved.

  11. Variable selection for modeling the absolute magnitude at maximum of Type Ia supernovae

    Science.gov (United States)

    Uemura, Makoto; Kawabata, Koji S.; Ikeda, Shiro; Maeda, Keiichi

    2015-06-01

    We discuss what constitutes an appropriate set of explanatory variables for predicting the absolute magnitude at the maximum of Type Ia supernovae. For a good prediction, the error for future data, called the "generalization error," should be small. We use cross-validation to control the generalization error and a LASSO-type estimator to choose the set of variables. This approach can be used even when the number of samples is smaller than the number of candidate variables. We studied the Berkeley supernova database with our approach. Candidates for the explanatory variables include normalized spectral data, variables about lines, and previously proposed flux ratios, as well as the color and light-curve widths. As a result, we confirmed the previous understanding of Type Ia supernovae: (i) The absolute magnitude at maximum depends on the color and light-curve width. (ii) The light-curve width depends on the strength of Si II. Recent studies have suggested adding more variables to explain the absolute magnitude. However, our analysis does not support adding any other variables to achieve a smaller generalization error.
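    The sketch below fits a plain LASSO by cyclic coordinate descent. It is one concrete instance of the LASSO-type estimators mentioned; the authors' exact estimator is not specified here. It assumes centered predictors and response, so no intercept is fitted. In practice the penalty `lam` would be chosen by cross-validation, as the abstract describes.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, returning exactly 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=500):
    """LASSO by cyclic coordinate descent, minimising
    (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.

    Assumes the columns of X and the response y are centered (no intercept).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X b, with b = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual (coordinate j added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b
```

    Coefficients that end at exactly zero correspond to candidate variables the estimator drops. This property makes LASSO usable when there are more candidate variables than samples.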

  12. Phenotypic variability and selection of lipid-producing microalgae in a microfluidic centrifuge

    Science.gov (United States)

    Estévez-Torres, André; Mestler, Troy; Austin, Robert H.

    2010-03-01

    Isogenic cells are known to display various expression levels that may result in different phenotypes within a population. Here we focus on the phenotypic variability of a species of unicellular algae that produce neutral lipids. Lipid-producing algae are one of the most promising sources of biofuel. We have implemented a simple microfluidic method, which relies on density differences, to assess lipid-production variability in a population of algae. We will discuss the reasons for this variability and address the promising avenues of this technique for directing the evolution of algae towards high lipid productivity.

  13. [Determination of cotton content in cotton/ramie blended fabric by NIR spectra and variable selection methods].

    Science.gov (United States)

    Sun, Tong; Geng, Xiang; Liu, Mu-hua

    2014-12-01

    Rapid detection of textile fiber components is very important for quality control of the production process, trade and market surveillance. The objective of this research was to assess the cotton content of cotton/ramie blended fabric quickly by near infrared (NIR) spectroscopy and variable selection methods. Reflectance spectra of samples were acquired with a NIRFlex N-500 Fourier transform spectrometer in the range 4000–10,000 cm⁻¹; preliminary selection of the spectral range and pretreatment analysis were conducted first. Then, three variable selection methods, UVE (uninformative variable elimination), SPA (successive projections algorithm) and CARS (competitive adaptive reweighted sampling), were used to select sensitive variables. After that, PLS (partial least squares) was used to develop a calibration model for the cotton content of cotton/ramie blended fabric, and the best calibration model was used to predict the cotton content of samples in the prediction set. The results indicate that 4052–8000 cm⁻¹ is the optimal spectral range for cotton content modeling. CARS is an efficient method for improving model performance: the correlation coefficients of CARS-PLS for the calibration and prediction sets are 0.903 and 0.749, and the root mean square errors are 8.01% and 12.93%, respectively. NIR spectroscopy combined with the CARS method is therefore feasible for assessing the cotton content of cotton/ramie blended fabric, and CARS can simplify the model and improve its performance.
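    Of the three variable selection methods compared, SPA has the most compact core loop, sketched below. This is a simplified version: it runs from a single starting variable for a fixed number of variables. Full SPA implementations usually repeat this for every starting variable and choose the subset by the validation error of a regression model. CARS, which performed best here, is not shown.

```python
def successive_projections(X, start, n_select):
    """Successive projections algorithm (SPA) core loop for selecting wavelength variables.

    X        -- calibration spectra, list of samples, each a list of p variables
    start    -- index of the initial variable (full SPA tries every start)
    n_select -- number of variables to select (at most min(samples, variables))
    Returns the selected variable indices in selection order; X itself is not modified.
    """
    n, p = len(X), len(X[0])
    if not 1 <= n_select <= min(n, p):
        raise ValueError("n_select must be between 1 and min(n_samples, n_variables)")
    cols = [[X[i][j] for i in range(n)] for j in range(p)]  # working copy, column-wise
    selected = [start]
    for _ in range(n_select - 1):
        last = cols[selected[-1]]
        last_sq = sum(v * v for v in last)
        best, best_norm = None, -1.0
        for j in range(p):
            if j in selected:
                continue
            # project column j onto the orthogonal complement of the last selected column
            coef = sum(a * b for a, b in zip(cols[j], last)) / last_sq
            cols[j] = [a - coef * b for a, b in zip(cols[j], last)]
            norm = sum(v * v for v in cols[j])
            if norm > best_norm:  # keep the variable least collinear with those chosen
                best, best_norm = j, norm
        selected.append(best)
    return selected
```

    Choosing the largest projected norm at each step favors variables carrying information not already explained by earlier selections. This is how SPA reduces collinearity among the selected wavelengths.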

  14. Natural selection acts on Atlantic salmon major histocompatibility (MH) variability in the wild

    OpenAIRE

    De Eyto, E.; McGinnity, P.; Consuegra, S.; Coughlan, J.; Tufto, J.; Farrell, K.; Megens, H. J.; Jordan, W.; Cross, T.; Stet, R. J. M.

    2007-01-01

    Pathogen-driven balancing selection is thought to maintain polymorphism in major histocompatibility (MH) genes. However, there have been few empirical demonstrations of selection acting on MH loci in natural populations. To determine whether natural selection on MH genes has fitness consequences for wild Atlantic salmon in natural conditions, we compared observed genotype frequencies of Atlantic salmon (Salmo salar) surviving in a river six months after their introduction as eggs with frequen...

  15. Variable selection under multiple imputation using the bootstrap in a prognostic study

    NARCIS (Netherlands)

    Heymans, M.W.; Buuren, S. van; Knol, D.L.; Mechelen, W. van; Vet, H.C.W. de

    2007-01-01

    Background. Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty that allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable s

  17. The effects of selective breeding against scrapie susceptibility on the genetic variability of the Latxa Black-Faced sheep breed

    Directory of Open Access Journals (Sweden)

    Legarra Andrés

    2006-09-01

    Breeding sheep populations for scrapie resistance could result in a loss of genetic variability. In this study, the effect on genetic variability of selection for increasing the ARR allele frequency was estimated in the Latxa breed. Two sources of information were used: pedigree and genetic polymorphisms (fifteen microsatellites). The results based on the genealogical information were limited by a low level of pedigree completeness, which showed the value of also using the information provided by the molecular markers. The overall results suggest that no great negative effect on genetic variability can be expected in the short term in the population analysed from selecting only ARR/ARR males. The estimated average relationship of ARR/ARR males with reproductive females was similar to that of all available males regardless of genotype: 0.010 vs. 0.012 for genealogical relationship and 0.257 vs. 0.296 for molecular coancestry, respectively. However, selecting only ARR/ARR males implied important losses of founder animals (87 percent) and of low-frequency alleles (30 percent) in the ram population. The evaluation of mild selection strategies against scrapie susceptibility based on the use of some ARR heterozygous males was difficult because the genetic relationships estimated among animals differed when pedigree or molecular information was used, and the use of more molecular markers should be evaluated.

  18. Realisations of the Word-initial Variable (th) in Selected Late Middle English Northern Legal Documents

    OpenAIRE

    Adamczyk, Michał

    2016-01-01

    Synchronic variability in the area of phonetics, phonology, vocabulary, morphology and syntax is a natural feature of any language, including English. The existence of competing variants is in itself a fascinating phenomenon, but it is also a prerequisite for diachronic changes. This volume is a collection of studies which investigate variability from a contemporary and historical perspective, in both native and non-native varieties of English. The topics include Middle English spelling varia...

  19. Stabilizing Gain Selection of Networked Variable Gain Controller to Maximize Robustness Using Particle Swarm Optimization

    CERN Document Server

    Pan, Indranil; Ghosh, Soumyajit; Gupta, Amitava; 10.1109/PACC.2011.5978958

    2012-01-01

    Networked Control Systems (NCSs) are often associated with problems like random data losses which might lead to system instability. This paper proposes a method based on the use of variable controller gains to achieve maximum parametric robustness of the plant controlled over a network. Stability using variable controller gains under data loss conditions is analyzed using a suitable Linear Matrix Inequality (LMI) formulation. Also, a Particle Swarm Optimization (PSO) based technique is used to maximize parametric robustness of the plant.
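    A basic global-best particle swarm optimizer is sketched below as a generic illustration of the PSO step. It does not include the LMI stability analysis. The objective in the test is a toy function standing in for the negative of a robustness measure. The coefficient values are common textbook choices, assumed here rather than taken from the paper.

```python
import random


def pso_minimize(f, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise f over a box with a basic global-best particle swarm.

    bounds    -- list of (low, high) pairs, one per decision variable (e.g. controller gains)
    w, c1, c2 -- inertia, cognitive and social coefficients (assumed textbook values)
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to the box
            val = f(pos[i])
            if val < pbest_val[i]:  # update personal best, then global best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

    To maximize robustness as in the paper, one would minimize the negative of the robustness measure. Candidates that fail the stability test would have to be rejected or penalized inside `f`.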

  20. VARSEL: Variable Selection for Multiple-Purpose Prediction Systems in the Absence of External Criteria.

    Science.gov (United States)

    Gould, R. Bruce; Christal, Raymond E.

    The absence of suitable external criteria is a recurrent problem for test, battery, and inventory developers in selecting items or tests for inclusion in final operational instruments. This report presents a computing algorithm developed for use when no adequate external selection criterion is available. The algorithm uses a multiple linear…

  1. Shape, sizing optimization and material selection based on mixed variables and genetic algorithm

    NARCIS (Netherlands)

    Tang, X.; Bassir, D.H.; Zhang, W.

    2010-01-01

    In this work, we explore the simultaneous design of material selection and structural optimization. As material selection turns out to be a discrete process that finds the optimal distribution of materials over the design domain, it cannot be performed with common gradient-based optimization

  2. Reader variability in QT measurement due to measurement error and variability in leads selection: a simulation study comparing 2-way vs. 3-way interaction ANOVA model.

    Science.gov (United States)

    Natekar, Mili; Karnad, Dilip R; Salvi, Vaibhav; Ramasamy, Arumugam; Kerkar, Vaibhav; Panicker, Gopi Krishna; Kothari, Snehal

    2014-01-01

    Reader variability (RV) results from measurement differences or from variability in the lead used for QT measurement; the latter is not reflected in conventional methods for estimating RV. The mean and SD of QT intervals in 12 leads of 100 ECGs, each measured twice, were used to simulate data sets with inter-reader RV of 5, 10, 15, 20, and 25 ms and intra-reader RV of 3, 6, 9, 12, and 15 ms. Six hundred twenty-five data sets were simulated such that 25 readers used different leads for Read1 and Read2 in 0, 10%, 20%, 30%, or 40% of ECGs. RV was estimated using ANOVA interaction models: a three-way model with reader, ECG and lead as factors, and a two-way model with reader and ECG as factors. Estimates from the three-way model accurately matched the inter- and intra-reader RV introduced during simulation, regardless of the percentage of ECGs with lead selection variability. The two-way model provides identical estimates when both reads use the same lead, but higher, more realistic estimates when measurements are made in different leads. © 2013.
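
The ANOVA models themselves are not reproduced here. A hedged sketch of the underlying idea: with two reads of each ECG, the within-pair variance estimates intra-reader variability, and any systematic QT difference between the leads used in the two reads ends up in that same estimate, which is why lead choice inflates it. The simulation settings (true QT mean 400 ms, SD 30 ms, a 20 ms lead shift) are hypothetical.

```python
import math
import random


def intra_reader_sd(read1: list, read2: list) -> float:
    """Pooled within-pair SD from duplicate reads of the same ECGs.

    Each pair contributes (r1 - r2)**2 / 2 to the within-pair sum of squares
    with one degree of freedom, so the variance estimate is sum(d**2) / (2n).
    """
    if len(read1) != len(read2) or not read1:
        raise ValueError("need two non-empty read lists of equal length")
    ss = sum((r1 - r2) ** 2 for r1, r2 in zip(read1, read2))
    return math.sqrt(ss / (2 * len(read1)))


def simulate_duplicate_reads(n_ecgs: int, sd_intra: float, lead_shift: float = 0.0,
                             frac_other_lead: float = 0.0, seed: int = 0):
    """Hypothetical simulation: each ECG has a true QT; both reads add
    N(0, sd_intra) noise, and in a fraction of ECGs the second read uses a
    lead whose QT differs by lead_shift ms."""
    rng = random.Random(seed)
    read1, read2 = [], []
    for _ in range(n_ecgs):
        qt = rng.gauss(400.0, 30.0)
        shift = lead_shift if rng.random() < frac_other_lead else 0.0
        read1.append(qt + rng.gauss(0.0, sd_intra))
        read2.append(qt + shift + rng.gauss(0.0, sd_intra))
    return read1, read2


if __name__ == "__main__":
    same = simulate_duplicate_reads(5000, 6.0)
    mixed = simulate_duplicate_reads(5000, 6.0, lead_shift=20.0, frac_other_lead=0.4)
    print(intra_reader_sd(*same), intra_reader_sd(*mixed))
```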

  3. Variable Selection for Modeling the Absolute Magnitude at Maximum of Type Ia Supernovae

    CERN Document Server

    Uemura, Makoto; Kawabata, S; Ikeda, Shiro; Maeda, Keiichi

    2015-01-01

    We discuss which set of explanatory variables is appropriate for predicting the absolute magnitude at maximum of Type Ia supernovae. For good prediction, the error for future data, called the "generalization error," should be small. We use cross-validation to control the generalization error and a LASSO-type estimator to choose the set of variables. This approach can be used even when the number of samples is smaller than the number of candidate variables. We applied our approach to the Berkeley supernova database. Candidate explanatory variables include normalized spectral data, variables describing lines, and previously proposed flux ratios, as well as the color and light-curve widths. As a result, we confirmed the established understanding of Type Ia supernovae: i) the absolute magnitude at maximum depends on the color and light-curve width; ii) the light-curve width depends on the strength of Si II. Recent studies have suggested adding more va...
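
A minimal pure-Python sketch of the general recipe described above (LASSO to choose variables, cross-validation to control the generalization error), assuming a plain L1 penalty, squared-error loss and a user-supplied grid of penalty values. It is not the authors' code, and the function names are illustrative. Coordinate descent works unchanged when there are more candidate variables than samples.

```python
import random


def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - b0 - Xb||^2 + lam*||b||_1.
    Columns are centred internally; returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]
    r = [v - my for v in y]  # residual for b = 0
    z = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant column: leave its coefficient at 0
            rho = sum(Xc[i][j] * r[i] for i in range(n)) / n + z[j] * b[j]
            new = soft_threshold(rho, lam) / z[j]
            if new != b[j]:
                delta = new - b[j]
                for i in range(n):
                    r[i] -= Xc[i][j] * delta  # keep the residual in sync
                b[j] = new
    b0 = my - sum(bj * mj for bj, mj in zip(b, mx))
    return b0, b


def cv_mse(X, y, lam, k=5):
    """k-fold cross-validated mean squared prediction error (interleaved folds)."""
    n = len(X)
    sse = 0.0
    for f in range(k):
        test = set(range(f, n, k))
        train = [i for i in range(n) if i not in test]
        b0, b = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            pred = b0 + sum(bj * xj for bj, xj in zip(b, X[i]))
            sse += (y[i] - pred) ** 2
    return sse / n


def select_variables(X, y, lams, k=5):
    """Pick the penalty with the smallest CV error, refit on all data, and
    return it with the indices of the non-zero coefficients."""
    best = min(lams, key=lambda lam: cv_mse(X, y, lam, k))
    _, b = lasso_fit(X, y, best)
    return best, [j for j, bj in enumerate(b) if bj != 0.0]


if __name__ == "__main__":
    rng = random.Random(1)  # synthetic data: only columns 0 and 2 matter
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
    y = [3 * row[0] - 2 * row[2] + rng.gauss(0, 0.1) for row in X]
    print(select_variables(X, y, [0.01, 0.05, 0.1, 0.5]))
```

In practice predictors are usually standardized first so that a single penalty treats them comparably; that step is omitted here for brevity.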

  4. Latent Variable Graphical Model Selection using Harmonic Analysis: Applications to the Human Connectome Project (HCP).

    Science.gov (United States)

    Kim, Won Hwa; Kim, Hyunwoo J; Adluru, Nagesh; Singh, Vikas

    2016-06-01

    A major goal of imaging studies such as the (ongoing) Human Connectome Project (HCP) is to characterize the structural network map of the human brain and identify its associations with covariates, such as genotype and risk factors, that correspond to an individual. But the set of image-derived measures and the set of covariates are both large, so we must first estimate a 'parsimonious' set of relations between the measurements. For instance, a Gaussian graphical model will show conditional independences between the random variables, which can then be used to set up specific downstream analyses. But most such data involve a large list of 'latent' variables that remain unobserved yet affect the 'observed' variables substantially. Accounting for such latent variables is not directly addressed by standard precision matrix estimation and is tackled via highly specialized optimization methods. This paper offers a unique harmonic analysis view of this problem. By casting the estimation of the precision matrix in terms of a composition of low-frequency latent variables and high-frequency sparse terms, we show how the problem can be formulated using a new wavelet-type expansion in non-Euclidean spaces. Our formulation poses the estimation problem in the frequency space and shows how it can be solved by a simple sub-gradient scheme. We provide a set of scientific results on ~500 scans from the recently released HCP data, where our algorithm recovers highly interpretable and sparse conditional dependencies between brain connectivity pathways and well-known covariates.
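
The conditional-independence statement above has a standard concrete form: for a Gaussian vector, variables i and j are independent given all the others exactly when entry (i, j) of the precision (inverse covariance) matrix P is zero, and their partial correlation is -P_ij / sqrt(P_ii P_jj). The sketch below, which does not implement the paper's wavelet-based method, checks this on a hypothetical three-variable chain X1 - X2 - X3.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation of each pair given all other variables:
    -P_ij / sqrt(P_ii * P_jj), with P the precision (inverse covariance)."""
    prec = invert(cov)
    n = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    rho = 0.5  # hypothetical chain X1 - X2 - X3 with AR(1)-type correlations
    cov = [[1.0, rho, rho ** 2], [rho, 1.0, rho], [rho ** 2, rho, 1.0]]
    pc = partial_correlations(cov)
    print(pc[0][1], pc[0][2])  # X1 and X3 are independent given X2
```

Although X1 and X3 are marginally correlated (0.25 here), their partial correlation vanishes, which is the kind of structure a sparse precision estimate is meant to expose.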

  5. FIRE: an SPSS program for variable selection in multiple linear regression analysis via the relative importance of predictors.

    Science.gov (United States)

    Lorenzo-Seva, Urbano; Ferrando, Pere J

    2011-03-01

    We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.

  6. A novel variability-based method for quasar selection: evidence for a rest frame ~54 day characteristic timescale

    CERN Document Server

    Graham, Matthew J; Drake, Andrew J; Mahabal, Ashish A; Chang, Melissa; Stern, Daniel; Donalek, Ciro; Glikman, Eilat

    2014-01-01

    We compare quasar selection techniques based on optical variability using data from the Catalina Real-time Transient Survey (CRTS). We introduce a new technique based on Slepian wavelet variance (SWV) that shows performance comparable to or better than structure functions and damped random walk models, but with fewer assumptions. Combining these methods with WISE mid-IR colors produces a highly efficient quasar selection technique, which we have validated spectroscopically. The SWV technique also identifies characteristic timescales in a time series, and we find a characteristic rest-frame timescale of ~54 days, confirmed in the light curves of ~18000 quasars from CRTS, SDSS and MACHO data and anticorrelated with absolute magnitude. This indicates a transition between damped random walk and $P(f) \propto f^{-1/3}$ behaviours and is the first strong indication that a damped random walk model may be too simplistic to describe optical quasar variability.
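
The SWV method is not reproduced here. For reference, a sketch of the structure-function baseline mentioned above, assuming the common definition as the mean squared magnitude difference per time-lag bin (some authors report its square root or subtract photometric noise), applied to a damped random walk simulated with the exact Ornstein-Uhlenbeck update. For a DRW with timescale tau and long-run SD sigma, this quantity rises as 2 sigma^2 (1 - exp(-lag / tau)) and flattens at long lags. All parameter values are hypothetical.

```python
import math
import random


def simulate_drw(times, tau, sigma, seed=0):
    """Damped random walk (Ornstein-Uhlenbeck) sampled exactly at sorted times;
    sigma is the long-run standard deviation, tau the damping timescale."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigma)]
    for t0, t1 in zip(times, times[1:]):
        decay = math.exp(-(t1 - t0) / tau)
        x.append(x[-1] * decay + rng.gauss(0.0, sigma * math.sqrt(1.0 - decay ** 2)))
    return x


def structure_function(times, mags, bin_edges):
    """Mean of (m_j - m_i)^2 over all pairs whose |t_j - t_i| falls in each
    [bin_edges[b], bin_edges[b+1]) bin; NaN for empty bins."""
    n_bins = len(bin_edges) - 1
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            lag = abs(times[j] - times[i])
            for b in range(n_bins):
                if bin_edges[b] <= lag < bin_edges[b + 1]:
                    sums[b] += (mags[j] - mags[i]) ** 2
                    counts[b] += 1
                    break
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


if __name__ == "__main__":
    days = [float(t) for t in range(600)]  # hypothetical daily sampling
    mags = simulate_drw(days, tau=30.0, sigma=0.2, seed=1)
    print(structure_function(days, mags, [1.0, 2.0, 200.0, 400.0]))
```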

  7. A stochastic analysis of terrain evaluation variables for path selection. [roving vehicle navigation]

    Science.gov (United States)

    Donohue, J. G.; Shen, C. N.

    1978-01-01

    A stochastic analysis was performed on the variables associated with the characteristics of the terrain encountered by a roving system with an autonomous navigation system. A laser rangefinder is employed to detect terrain features at ranges up to 75 m. Analytic expressions and a numerical scheme were developed to calculate the variance of data on these four variables: (1) body clearance, (2) in-path slope, (3) tilt slope, and (4) wheel deviation. The variance is due to noise in the range data. It was found that the standard deviation of these terrain variables is large enough to warrant the use of a safety margin to aid the roving vehicle in avoiding high risk areas.
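
The abstract does not give the paper's expressions for the four terrain variables. As a simplified, assumed model of how range noise propagates into one of them, take the in-path slope as the height difference between two terrain points a horizontal distance d apart, each height carrying independent noise of SD sigma_h. Then Var(slope) = 2 sigma_h^2 / d^2, which the sketch checks by simulation. The numbers are hypothetical.

```python
import math
import random


def slope_sd_analytic(sigma_h, d):
    """SD of the slope (h2 - h1) / d when both heights carry independent
    N(0, sigma_h^2) noise: sqrt(2) * sigma_h / d."""
    return math.sqrt(2.0) * sigma_h / d


def slope_sd_monte_carlo(true_h1, true_h2, sigma_h, d, trials=20000, seed=0):
    """Empirical SD of the same slope estimate from simulated noisy heights."""
    rng = random.Random(seed)
    slopes = [((true_h2 + rng.gauss(0.0, sigma_h)) - (true_h1 + rng.gauss(0.0, sigma_h))) / d
              for _ in range(trials)]
    mean = sum(slopes) / trials
    return math.sqrt(sum((s - mean) ** 2 for s in slopes) / (trials - 1))


if __name__ == "__main__":
    # Hypothetical: 5 cm height noise, points 1 m apart along the path.
    print(slope_sd_analytic(0.05, 1.0), slope_sd_monte_carlo(0.0, 0.1, 0.05, 1.0))
```

Under this simplified model, a safety margin on the slope threshold could be expressed as a multiple of this SD.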

  8. High-Dimensional Bayesian Clustering with Variable Selection: The R Package bclust

    Directory of Open Access Journals (Sweden)

    Vahid Partovi Nia

    2012-04-01

    The R package bclust is useful for clustering high-dimensional continuous data. The package uses a parametric spike-and-slab Bayesian model to downweight the effect of noise variables and to quantify the importance of each variable in agglomerative clustering. We take advantage of the existence of closed-form marginal distributions to estimate the model hyper-parameters using empirical Bayes, thereby yielding a fully automatic method. We discuss computational problems arising in implementation of the procedure and illustrate the usefulness of the package through examples.

  9. The selection of a mode of urban transportation: integrating psychological variables to discrete choice models

    OpenAIRE

    CÓRDOBA MAQUILÓN, JORGE E.; GONZÁLEZ-CALDERÓN, CARLOS A.; JOHN J. POSADA HENAO

    2012-01-01

    Using revealed-preference surveys and psychological questionnaires, a study was conducted to identify key psychological variables of behavior involved in the choice of a transport mode among a group of residents of the Área Metropolitana del Valle de Aburrá (Aburrá Valley Metropolitan Area). Random utility theory was used for the discrete choice models, and the theory of reasoned action to evaluate beliefs, and as the tool for analyzing the psychological variables the quest...

  10. Paper spray mass spectrometry and PLS-DA improved by variable selection for the forensic discrimination of beers.

    Science.gov (United States)

    Pereira, Hebert Vinicius; Amador, Victória Silva; Sena, Marcelo Martins; Augusti, Rodinei; Piccin, Evandro

    2016-10-12

    Paper spray mass spectrometry (PS-MS) combined with partial least squares discriminant analysis (PLS-DA) was applied for the first time in a forensic context for the fast and effective differentiation of beers. Eight brands of American standard lager beer produced by four breweries (141 samples from 55 batches) were studied with the aim of differentiating them according to market price. The three leading brands in the Brazilian beer market, which have been subject to fraud, were modeled as the higher-price class, while the five brands most used for counterfeiting were modeled as the lower-price class. Parameters affecting paper spray ionization were examined and optimized. The best MS signal stability and intensity were obtained in positive ion mode, with PS(+) mass spectra characterized by intense pairs of signals corresponding to sodium and potassium adducts of malto-oligosaccharides. Discrimination was apparent neither by visual inspection nor by principal component analysis (PCA). However, supervised classification models provided high rates of sensitivity and specificity. A PLS-DA model using full-scan mass spectra was improved by variable selection with ordered predictors selection (OPS), providing a 100% reliability rate and reducing the number of variables from 1701 to 60. This model was interpreted by identifying the fifteen variables with the most significant VIP (variable importance in projection) scores, which were therefore considered diagnostic ions for this type of beer counterfeit.
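
The VIP scores mentioned above are computed from a fitted PLS model. A sketch, assuming a two-class PLS-DA run as PLS1 regression on 0/1 class labels (one common way to set up two-class PLS-DA) and the widely used VIP formula. The OPS variable selection step is not reproduced, and the data are synthetic.

```python
import math
import random


def pls1_vip(X, y, n_components):
    """NIPALS PLS1 on mean-centred data, returning one VIP score per column.

    VIP_j = sqrt(p * sum_a SS_a * w_aj^2 / sum_a SS_a), where w_a is the
    unit-norm weight vector of component a and SS_a = q_a^2 * (t_a . t_a)
    is the part of the y sum of squares that component explains.
    """
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    E = [[row[j] - mx[j] for j in range(p)] for row in X]  # X residual
    f = [v - my for v in y]  # y residual
    weights, ss = [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break  # nothing left in y that X can explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]  # deflate y
        weights.append(w)
        ss.append(q * q * tt)
    total = sum(ss)
    if total == 0.0:
        raise ValueError("no component explains any variance in y")
    return [math.sqrt(p * sum(s * w[j] ** 2 for s, w in zip(ss, weights)) / total)
            for j in range(p)]


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [i % 2 for i in range(40)]  # synthetic two-class problem
    # Column 0 separates the classes; columns 1 and 2 are pure noise.
    X = [[2.0 * c + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)] for c in labels]
    print(pls1_vip(X, [float(c) for c in labels], n_components=2))
```

Because each weight vector has unit norm, the squared VIP scores always sum to the number of variables, which is why "VIP > 1" is often used as a rough importance cut-off.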

  11. Correlations of Back Strength with Selected Anthropometric Variables and Performance Tests in Indian Inter-University Male Field Hockey Players

    Directory of Open Access Journals (Sweden)

    S. Koley

    2017-01-01

    The purpose of this study was twofold: first, to estimate the back strength of Indian inter-university male field hockey players and, second, to examine its correlations with selected anthropometric variables and performance tests. To this end, nine anthropometric variables (height, weight, body mass index, percent body fat, knee height, length of femur, femur biepicondylar diameter, skeletal mass and back strength) and two performance tests (the sit-and-reach test and the slalom sprint and dribble test) were measured in 120 purposively selected Indian inter-university male hockey players aged 18–25 years, recruited at the inter-university competition held at Guru Nanak Dev University, Amritsar, India, in March 2014. An adequate number of controls (n = 119) were also recruited from the same place for comparison. The hockey players had higher mean values than their control counterparts in all variables except percent body fat and the slalom sprint and dribble test, with statistically significant differences between the groups (p ≤ 0.003–0.001). No significant correlations of back strength with any of the variables were found in the Indian inter-university male field hockey players. In conclusion, back strength may not be a useful indicator of performance in field hockey players.

  12. Prediction of class membership of biodiesels using chemometrics.

    Science.gov (United States)

    Mustafa, Zylia; Milina, Rumyana; Simeonova, Pavlina A; Tsakovski, Stefan L; Simeonov, Vasil D

    2015-01-01

    Recently, considerable scientific and technological attention has been paid to the creation of alternative energy sources, including biofuels. Assessing the quality of the biofuels produced and of the raw materials needed for their production is an important scientific challenge. One of the major sources for biodiesel production is plant oils (sunflower, rapeseed, palm, soya, etc.). Since plants are complex biological systems, it is not easy to find the specific chemical components responsible for their suitability as biodiesel sources. Plant sources of biofuel can be reliably characterized and classified only by multivariate statistical approaches (chemometrics). Chemometric expertise makes it possible not only to classify different biofuel sources into similarity classes but also to predict the class membership of chemically analyzed samples of unknown origin. The present study deals with predicting the class membership of several samples of unknown origin included in a large data set of fatty acid methyl ester (FAME) profiles of biodiesel plant sources. Using chromatographic FAME profiles of different plant biodiesel sources and applying the chemometric technique known as partial least squares discriminant analysis (PLS-DA), a pattern recognition procedure is developed to: I. model similarity classes of biodiesel plant sources from their FAME profiles, excluding the samples of unknown origin; II. correctly classify the samples of unknown origin into the previously defined classes of biodiesel sources (palm oil, soybean oil, peanut oil, rapeseed oil, sunflower oil and maize oil). The prediction is successfully achieved for all samples of previously unknown origin. This pattern recognition approach is applied for the first time to biodiesel classification and modeling tasks.

  13. Effectiveness of Shrinkage and Variable Selection Methods for the Prediction of Complex Human Traits using Data from Distantly Related Individuals

    Science.gov (United States)

    Pérez‐Rodríguez, Paulino; Veturi, Yogasudha; Simianer, Henner; de los Campos, Gustavo

    2015-01-01

    Summary Genome-wide association studies (GWAS) have detected large numbers of variants associated with complex human traits and diseases. However, the proportion of variance explained by GWAS-significant single nucleotide polymorphisms has usually been small. This has prompted interest in whole-genome regression (WGR) methods. However, there has been limited research on the factors that affect the prediction accuracy (PA) of WGRs when applied to human data from distantly related individuals. Here, using real human genotypes and simulated phenotypes, we examine how trait complexity, marker-quantitative trait loci (QTL) linkage disequilibrium (LD), and the model used affect the performance of WGRs. Our results indicated that the estimated rate of missing heritability depends on the extent of marker-QTL LD but was not greatly affected by trait complexity. Regarding PA, our results indicated that (a) under perfect marker-QTL LD, WGR can achieve moderately high prediction accuracy, and with simple genetic architectures variable selection methods outperform shrinkage procedures; and (b) under imperfect marker-QTL LD, variable selection methods can achieve reasonably good PA with simple or moderately complex genetic architectures; however, the PA of these methods deteriorated as trait complexity increased, and with highly complex traits both variable selection and shrinkage methods performed poorly. This was confirmed with an analysis of human height. PMID:25600682
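
The contrast between shrinkage and variable selection can be seen in the textbook special case of an orthonormal design (X'X = I), where both penalized estimates have closed forms in terms of the least-squares estimate. This is an illustration only: the whole-genome regression models compared in the paper are not reproduced, and the coefficients below are hypothetical.

```python
def ridge_orthonormal(b_ols, lam):
    """argmin ||y - X beta||^2 + lam * ||beta||^2 when X'X = I:
    every coefficient is shrunk by the same factor, none becomes zero."""
    return [b / (1.0 + lam) for b in b_ols]


def lasso_orthonormal(b_ols, lam):
    """argmin 0.5 * ||y - X beta||^2 + lam * ||beta||_1 when X'X = I:
    soft thresholding, which sets small effects exactly to zero."""
    out = []
    for b in b_ols:
        if abs(b) <= lam:
            out.append(0.0)
        else:
            out.append(b - lam if b > 0 else b + lam)
    return out


if __name__ == "__main__":
    b_ols = [2.0, 0.3, -0.1, 0.05]  # hypothetical: one large effect, three small
    print(ridge_orthonormal(b_ols, 0.5))
    print(lasso_orthonormal(b_ols, 0.5))
```

When a trait is driven by a few large effects, thresholding keeps them while discarding noise; when many small effects matter, thresholding also discards real signal, which gives some intuition for the pattern described in the summary above.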

  14. Input variable selection for water resources systems using a modified minimum redundancy maximum relevance (mMRMR) algorithm

    Science.gov (United States)

    Hejazi, Mohamad I.; Cai, Ximing

    2009-04-01

    Input variable selection (IVS) is a necessary step in modeling water resources systems. Neglecting this step may lead to unnecessary model complexity and reduced model accuracy. In this paper, we apply the minimum redundancy maximum relevance (MRMR) algorithm to identify the most relevant set of inputs for modeling a water resources system. We further introduce two modified versions of the MRMR algorithm (α-MRMR and β-MRMR), where α and β are correction factors found to increase and decrease, respectively, as power-law functions with the progress of the input selection algorithms and the increase in the number of selected input variables. We apply the proposed algorithms to 22 reservoirs in California to predict daily releases from a set of 121 potential input variables. Results indicate that the two proposed algorithms are good measures of model inputs, as reflected in enhanced model performance. The α-MRMR and β-MRMR values exhibit a strong negative correlation with model performance, as depicted by lower root-mean-square error (RMSE) values.
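
A sketch of greedy MRMR in its common difference form (relevance minus mean redundancy, both measured by mutual information), assuming discrete-valued inputs and plug-in mutual information estimates. The α and β correction factors of the paper are not reproduced, and the toy data are hypothetical: a_copy duplicates a, so once a is chosen, its redundancy cancels its relevance.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())


def mrmr(features, target, k):
    """Greedy MRMR, difference form: repeatedly add the feature maximizing
    I(f; target) minus the mean I(f; s) over already selected features s."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


if __name__ == "__main__":
    # Hypothetical toy inputs: the target depends on a and b; a_copy duplicates a.
    a = [0, 0, 1, 1] * 2
    b = [0, 1, 0, 1] * 2
    features = {"a": a, "a_copy": list(a), "b": b}
    target = [2 * x + z for x, z in zip(a, b)]
    print(mrmr(features, target, 2))
```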

  15. Firefly as a novel swarm intelligence variable selection method in spectroscopy.

    Science.gov (United States)

    Goodarzi, Mohammad; dos Santos Coelho, Leandro

    2014-12-10

    A critical step in multivariate calibration is wavelength selection, which is used to build models with better prediction performance when applied to spectral data. Many feature selection techniques have been developed to date. Among them, techniques based on swarm intelligence optimization are of particular interest because they mimic the collective behavior of animals and insects, for example finding the shortest path between a food source and the nest. Because such decisions are made by a crowd, the resulting models are more robust and the optimization is less prone to becoming trapped in local minima. This paper presents a novel feature selection approach for spectroscopic data that leads to more robust calibration models. The performance of the firefly algorithm, a swarm intelligence paradigm, was evaluated and compared with a genetic algorithm and particle swarm optimization. All three techniques were coupled with partial least squares (PLS) and applied to three spectroscopic data sets. They gave improved prediction results compared with a PLS model built using all wavelengths. Results show that the firefly algorithm, as a novel swarm paradigm, selects fewer wavelengths while the prediction performance of the resulting PLS model stays the same.
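
A sketch of the basic continuous firefly algorithm (Yang's attractiveness rule), minimizing a test function. This is an assumption-laden illustration, not the paper's implementation: for wavelength selection, each coordinate would additionally need to be mapped to an include/exclude decision and the cost would be a PLS cross-validation error, neither of which is shown. Parameter defaults are hypothetical.

```python
import math
import random


def firefly_minimize(cost_fn, dim, n=15, iters=60, beta0=1.0, gamma=0.05,
                     alpha=0.2, bounds=(-5.0, 5.0), seed=0):
    """Basic continuous firefly algorithm: each firefly moves toward every
    brighter (lower-cost) firefly with attractiveness beta0*exp(-gamma*r^2),
    plus a uniform random step scaled by alpha. Returns the best point seen."""
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    cost = [cost_fn(x) for x in xs]
    best = min(range(n), key=lambda i: cost[i])
    best_x, best_c = list(xs[best]), cost[best]
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:
                    r2 = sum((u - v) ** 2 for u, v in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    xs[i] = [min(hi, max(lo, u + beta * (v - u) + alpha * (rng.random() - 0.5)))
                             for u, v in zip(xs[i], xs[j])]
                    cost[i] = cost_fn(xs[i])
                    if cost[i] < best_c:
                        best_x, best_c = list(xs[i]), cost[i]
        alpha *= 0.97  # shrink the random step as the swarm settles
    return best_x, best_c


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)  # hypothetical test function
    print(firefly_minimize(sphere, dim=2))
```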

  16. Contributions of Selected Perinatal Variables to Seven-Year Psychological and Achievement Test Scores.

    Science.gov (United States)

    Henderson, N. B.; And Others

    Perinatal variables were used to predict 7-year outcome for 538 children, 32% Negro and 68% white. Mother's age, birthplace, education, occupation, marital status, neuropsychiatric status, family income, number supported, birth weight, one- and five-minute Apgar scores were regressed on 7-year Verbal, Performance and Full Scale IQ, Bender, Wide…

  17. Cortical Response Variability as a Developmental Index of Selective Auditory Attention

    Science.gov (United States)

    Strait, Dana L.; Slater, Jessica; Abecassis, Victor; Kraus, Nina

    2014-01-01

    Attention induces synchronicity in neuronal firing for the encoding of a given stimulus at the exclusion of others. Recently, we reported decreased variability in scalp-recorded cortical evoked potentials to attended compared with ignored speech in adults. Here we aimed to determine the developmental time course for this neural index of auditory…

  18. THE SELECTION OF A MODE OF URBAN TRANSPORTATION: INTEGRATING PSYCHOLOGICAL VARIABLES TO DISCRETE CHOICE MODELS

    Directory of Open Access Journals (Sweden)

    JORGE E. CÓRDOBA MAQUILÓN

    2011-01-01

    Using revealed-preference surveys and psychological questionnaires, a study was conducted to identify key psychological variables of behavior involved in the choice of a transport mode among a group of residents of the Área Metropolitana del Valle de Aburrá (Aburrá Valley Metropolitan Area). Random utility theory was used for the discrete choice models, and the theory of reasoned action to evaluate beliefs; the personality factor questionnaire (16PF) was used as the tool for analyzing the psychological variables. In addition to the revealed-preference surveys, two other surveys were administered: one on socioeconomic categories and another with latent indicators. This methodology integrates discrete choice models and latent variables, which makes it operational and quantifies the unobservable psychological variables. The most relevant result was that anxiety influences the choice of an urban transport mode, and it is shown that a physiological alteration, problems in perception, and beliefs can affect the decision-making process.
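
Under random utility theory with independent, identically Gumbel-distributed error terms, choice probabilities take the multinomial logit form sketched below. The study's integrated model also includes a measurement model for the latent psychological variables, which is not shown; the utility specification and all coefficients here are hypothetical.

```python
import math


def logit_probabilities(utilities):
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j), computed after
    subtracting the largest utility to avoid overflow."""
    v_max = max(utilities.values())
    expv = {mode: math.exp(v - v_max) for mode, v in utilities.items()}
    total = sum(expv.values())
    return {mode: e / total for mode, e in expv.items()}


def utility(asc, b_time, travel_time, b_anxiety=0.0, anxiety=0.0):
    """Hypothetical linear utility: alternative-specific constant, travel time,
    and a latent anxiety score entering with a mode-specific coefficient."""
    return asc + b_time * travel_time + b_anxiety * anxiety


if __name__ == "__main__":
    anxiety = 1.5  # hypothetical latent-variable score for one traveller
    v = {
        "car": utility(0.5, -0.04, 25, b_anxiety=0.3, anxiety=anxiety),
        "bus": utility(0.0, -0.04, 45, b_anxiety=-0.2, anxiety=anxiety),
        "metro": utility(0.2, -0.04, 35),
    }
    print(logit_probabilities(v))
```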

  19. Spatial distribution of heavy metals in Hong Kong's marine sediments and their human impacts: a GIS-based chemometric approach.

    Science.gov (United States)

    Zhou, Feng; Guo, Huaicheng; Hao, Zejia

    2007-09-01

    A geographic information system (GIS)-based chemometric approach was applied to investigate the spatial distribution patterns of heavy metals in marine sediments and to identify spatial human impacts on global and local scales. Twelve metals (Zn, V, Ni, Mn, Pb, Cu, Cd, Ba, Hg, Fe, Cr and Al) were surveyed twice annually at 59 sites in Hong Kong from 1998 to 2004. Cluster analysis classified the entire coastal area into three areas on a global scale, representing different pollution levels. Backward discriminant analysis, with 84.5% correct assignments, identified Zn, Pb, Cu, Cd, V, and Fe as significant variables affecting spatial variation on a local scale. Enrichment factors indicated that Cu, Cr, and Zn were derived from human impacts, while Al, Ba, Mn, V and Fe originated from rock weathering. Principal component analysis further subdivided the human impacts and their affected areas within each of the three areas, explaining 87%, 84% and 87% of the total variance, respectively. The primary anthropogenic sources in the three areas were (i) anti-fouling paint and domestic sewage; (ii) surface runoff, wastewater, vehicle emissions and marine transportation; and (iii) ship repainting, dental clinics, electronic/chemical industries and leaded fuel, respectively. Moreover, GIS-based spatial analysis facilitated chemometric methods.
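
Enrichment factors are usually computed by normalizing each metal to a conservative reference element and comparing the result with a background composition. The abstract does not name the reference element or background values; the sketch assumes Al as the normalizer (a common choice) and uses hypothetical concentrations. Values near 1 are commonly read as consistent with natural (crustal) sources, and clearly larger values as enrichment.

```python
def enrichment_factor(sample, background, element, reference="Al"):
    """EF = (C_x / C_ref)_sample / (C_x / C_ref)_background."""
    return (sample[element] / sample[reference]) / (
        background[element] / background[reference])


if __name__ == "__main__":
    # Hypothetical concentrations (mg/kg); not data from the Hong Kong survey.
    background = {"Al": 80000.0, "Cu": 20.0, "Fe": 40000.0}
    site = {"Al": 60000.0, "Cu": 60.0, "Fe": 30000.0}
    for metal in ("Cu", "Fe"):
        print(metal, round(enrichment_factor(site, background, metal), 2))
```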

  20. A primer to nutritional metabolomics by NMR spectroscopy and chemometrics

    DEFF Research Database (Denmark)

    Savorani, Francesco; Rasmussen, Morten Arendt; Mikkelsen, Mette Skau

    2013-01-01

    This paper outlines the advantages and disadvantages of using high throughput NMR metabolomics for nutritional studies with emphasis on the workflow and data analytical methods for generation of new knowledge. The paper describes one-by-one the major research activities in the interdisciplinary...... structures for multivariate pattern recognition methods and (3) NMR for providing a unique fingerprint of the lipoprotein status of the subject. For the first time in history, by combining NMR spectroscopy and chemometrics we are able to perform inductive nutritional research as a complement to the deductive...

  1. Fourier transform infrared spectroscopy and chemometrics for the characterization and discrimination of writing/photocopier paper types: Application in forensic document examinations

    Science.gov (United States)

    Kumar, Raj; Kumar, Vinay; Sharma, Vishal

    2017-01-01

    The aim of the present work is to explore the non-destructive application of the ATR-FTIR technique to the characterization and discrimination of paper samples, which could provide forensic aid in resolving legal cases. Twenty-four brands of paper were purchased from local markets in and around Chandigarh, India. All paper samples were subjected to ATR-FTIR analysis over the 400–4000 cm⁻¹ wavenumber range. Qualitative features and chemometric analysis of the obtained spectral data were used for characterization and discrimination. Characterization was achieved by matching peaks with standards of cellulose and inorganic fillers, the usual constituents of paper. Three IR regions, i.e. 400–2000 cm⁻¹, 2000–4000 cm⁻¹ and 400–4000 cm⁻¹, were selected for differentiation by chemometric analysis. Discrimination was achieved on the basis of three principal components, i.e. PC 1, PC 2 and PC 3. Maximum discrimination was obtained in the 2000–4000 cm⁻¹ range. Discriminating power was also calculated on the basis of qualitative features, and discrimination of the paper samples was found to be better with chemometric analysis than with qualitative features. The discriminating power achieved by chemometrics is 99.64%, which is the highest achieved by any group for this number of samples. The present results confirm that this study will be highly useful in forensic document examination in legal cases where the authenticity of a document is challenged. The results are fully analytical and therefore overcome the problems encountered in the traditional light/radiation scanning methods still routinely used by various questioned document laboratories.
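
The abstract does not state how discriminating power was computed. A common definition in forensic comparisons, assumed here, is the fraction of all sample pairs that can be told apart: DP = (N(N - 1)/2 - indistinguishable pairs) / (N(N - 1)/2). With 24 samples there are 276 pairs, so, for example, one indistinguishable pair would give 275/276 ≈ 99.64%. The count used below is illustrative.

```python
from math import comb


def discriminating_power(n_samples, indistinguishable_pairs):
    """Fraction of all n*(n-1)/2 sample pairs that can be told apart."""
    total = comb(n_samples, 2)
    return (total - indistinguishable_pairs) / total


if __name__ == "__main__":
    # Illustrative count only: 24 samples, one pair that cannot be separated.
    print(f"{100 * discriminating_power(24, 1):.2f}%")
```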

  2. Heterogeneous selection on a heritable temperament trait in a variable environment.

    Science.gov (United States)

    Quinn, John L; Patrick, Samantha C; Bouwhuis, Sandra; Wilkin, Teddy A; Sheldon, Ben C

    2009-11-01

    1. Temperament traits increasingly provide a focus for investigating the evolutionary ecology of behavioural variation. Here, we examine the underlying causes and selective consequences of individual variation in the temperament trait 'exploration behaviour in a novel environment' (EB, based on an 8-min assay) in a free-ranging population of a passerine bird, the great tit Parus major. 2. First, we conducted a quantitative genetic analysis on EB using a restricted maximum likelihood-based animal model with a long-term pedigree. Although repeatability was relatively high, EB was only moderately heritable, and permanent environment (V(PE)) effects contributed as much to phenotypic variance as additive genetic effects. 3. We then asked whether heterogeneous selection acted on EB at various temporal and spatial scales. Using estimates of lifetime reproductive success, we found evidence of weak negative directional selection acting on EB amongst females, driven by selection through recruitment, but not fecundity, in one of the four breeding years. There was no evidence of any selection on EB through survival. 4. Heterogeneous selection on EB within seasons was also observed amongst males through fecundity along two fine-scale environmental gradients: local breeding density and habitat quality; we are unaware of any previous equivalent demonstrations. 5. All of these analyses were repeated on a second measure of exploration behaviour (EB(2), measured during a 2-min assay) to facilitate comparison with other studies. EB and EB(2) were strongly correlated with one another at the genetic level, but only moderately correlated at the phenotypic level, and V(PE) was undetected in EB(2). Selection on EB(2) was similar to that on EB; we conclude that both traits are broadly equivalent from an evolutionary perspective. 6. Our analyses suggest that to the extent that the temperament trait 'exploration behaviour' is subject to natural selection in this population, this
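
The repeatability and heritability contrasted above are both ratios of variance components from the animal model. A sketch assuming the phenotypic variance splits into additive genetic (V_A), permanent environment (V_PE) and residual (V_R) parts only; real models often include further components, and the numbers are hypothetical.

```python
def repeatability(v_a, v_pe, v_r):
    """R = (V_A + V_PE) / V_P, the share of variance due to consistent
    between-individual differences."""
    return (v_a + v_pe) / (v_a + v_pe + v_r)


def heritability(v_a, v_pe, v_r):
    """Narrow-sense h^2 = V_A / V_P."""
    return v_a / (v_a + v_pe + v_r)


if __name__ == "__main__":
    # Hypothetical components with V_PE as large as V_A, as described above.
    print(repeatability(0.25, 0.25, 0.5), heritability(0.25, 0.25, 0.5))
```

Because V_PE enters R but not h^2, a trait can be fairly repeatable while being only moderately heritable.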

  3. Active Learning to Overcome Sample Selection Bias: Application to Photometric Variable Star Classification

    CERN Document Server

    Richards, Joseph W; Brink, Henrik; Miller, Adam A; Bloom, Joshua S; Butler, Nathaniel R; James, J Berian; Long, James P; Rice, John

    2011-01-01

    Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because a) standard assumptions for machine-learned model selection procedures break down and b) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting (IW), co-training (CT), and active learning (AL). We argue that AL, where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up, i...
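
Importance weighting, the first remedy listed, reweights each training example by the density ratio w(x) = p_test(x) / p_train(x) so that a weighted training loss mimics the test distribution. A minimal one-dimensional sketch, assuming histogram density estimates with hypothetical bin edges (not the paper's estimator):

```python
def importance_weights(train_x, test_x, edges):
    """Histogram estimate of w(x) = p_test(x) / p_train(x) for a 1-D feature,
    evaluated at each training point. Points outside the bins get weight 0."""
    n_bins = len(edges) - 1

    def bin_of(x):
        for b in range(n_bins):
            if edges[b] <= x < edges[b + 1]:
                return b
        return None

    def density(xs):
        counts = [0] * n_bins
        for x in xs:
            b = bin_of(x)
            if b is not None:
                counts[b] += 1
        return [c / len(xs) for c in counts]

    p_train, p_test = density(train_x), density(test_x)
    weights = []
    for x in train_x:
        b = bin_of(x)
        # p_train[b] > 0 here because x itself falls in bin b.
        weights.append(p_test[b] / p_train[b] if b is not None else 0.0)
    return weights


if __name__ == "__main__":
    # Hypothetical: training data concentrated at low x, test data at high x.
    print(importance_weights([0.5, 0.5, 0.5, 1.5], [0.5, 1.5, 1.5, 1.5], [0.0, 1.0, 2.0]))
```

Test-set regions with no training points receive no weight at all, so reweighting alone cannot address failure mode b) above.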

  4. Target selection of classical pulsating variables for space-based photometry

    Science.gov (United States)

    Plachy, E.; Molnar, L.; Szabo, R.; Kolenberg, K.; Banyai, E.

    2016-05-01

    In a few years the Kepler and TESS missions will provide ultra-precise photometry for thousands of RR Lyrae and hundreds of Cepheid stars. In the extended Kepler mission all targets are proposed in the Guest Observer (GO) Program, while the TESS space telescope will work with full frame images and a ~15-16th mag brightness limit with the possibility of short cadence measurements for a limited number of pre-selected objects. This paper highlights some details of the enormous and important work of the target selection process carried out by the members of Working Group 7 (WG#7) of the Kepler and TESS Asteroseismic Science Consortium.

  5. Target selection of classical pulsating variables for space-based photometry

    CERN Document Server

    Plachy, E; Szabó, R; Kolenberg, K; Bányai, E

    2016-01-01

    In a few years the Kepler and TESS missions will provide ultra-precise photometry for thousands of RR Lyrae and hundreds of Cepheid stars. In the extended Kepler mission all targets are proposed in the Guest Observer (GO) Program, while the TESS space telescope will work with full frame images and a ~15-16th mag brightness limit with the possibility of short cadence measurements for a limited number of pre-selected objects. This paper highlights some details of the enormous and important work of the target selection process carried out by the members of Working Group 7 (WG#7) of the Kepler and TESS Asteroseismic Science Consortium.

  6. Identifying market segments in consumer markets: variable selection and data interpretation

    OpenAIRE

    Tonks, D G

    2004-01-01

    Market segmentation is often articulated as a process that displays the recognised features of classical rationalism, but in part convention, convenience, prior experience and the overarching impact of rhetoric will influence, if not determine, the outcomes of a segmentation exercise. Particular examples of this process are addressed critically in this paper, which concentrates on the issues of variable choice for multivariate approaches to market segmentation and also the methods used fo...

  7. An alternative approach to approximate entropy threshold value (r) selection: application to heart rate variability and systolic blood pressure variability under postural challenge.

    Science.gov (United States)

    Singh, A; Saini, B S; Singh, D

    2016-05-01

    This study presents an alternative approach to selecting the approximate entropy (ApEn) threshold value (r). The traditional ApEn algorithm has two limitations: (1) an undefined conditional probability (CPu) occurs when no template match is found, and (2) it uses a crisp tolerance (radius) threshold r. To overcome these limitations, CPu is replaced by an optimum bias setting ε_opt, found by varying ε from 1/(N − m) to 1 in increments of 0.05, where N is the length of the series and m is the embedding dimension. Furthermore, an alternative approach for selecting r, based on binning the distance values obtained by template matching to calculate ApEn_bin, is presented. ApEn_max, ApEn_chon and ApEn_bin are observed to converge for ε_opt = 0.6 in 50 realizations (n = 50) of random-number series of length N = 300. Similar analysis suggests ε_opt = 0.65 and ε_opt = 0.45 for 50 realizations each of fractional Brownian motion and MIX(P) series (Lu et al. in J Clin Monit Comput 22(1):23-29, 2008). ε_opt = 0.5 is suggested for heart rate variability (HRV) and systolic blood pressure variability (SBPV) signals obtained from 50 young healthy subjects in the supine and upright positions. It is observed that (1) ApEn_bin of HRV is lower than that of SBPV, (2) ApEn_bin of HRV increases from supine to upright due to vagal inhibition, and (3) ApEn_bin of SBPV decreases from supine to upright due to sympathetic activation. Moreover, a merit of ApEn_bin is that it provides an alternative to the cumbersome ApEn_max procedure.
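    For orientation, the classical ApEn(m, r) that the abstract sets out to modify can be sketched in a few lines of Python. This is a minimal sketch of the traditional definition only (Chebyshev distance between templates, crisp tolerance r, self-matches counted); it does not reproduce the authors' ε_opt bias setting or the binning-based ApEn_bin, and the function name approximate_entropy is hypothetical.

```python
import math
from typing import Sequence


def approximate_entropy(x: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """Classical ApEn(m, r): crisp tolerance r, self-matches counted.

    Hypothetical helper for illustration; not the modified ApEn_bin
    or epsilon-bias variant described in the abstract.
    """
    n = len(x)
    if n <= m + 1:
        raise ValueError("series too short for the chosen embedding dimension")

    def phi(k: int) -> float:
        # All overlapping templates (embedding vectors) of length k.
        templates = [x[i:i + k] for i in range(n - k + 1)]
        count = len(templates)
        total = 0.0
        for a in templates:
            # A template "matches" when the Chebyshev (max-abs) distance is <= r.
            # Counting the self-match keeps every ratio > 0, which is the classical
            # way to avoid an undefined log, at the cost of a known bias.
            matches = sum(
                1 for b in templates
                if max(abs(p - q) for p, q in zip(a, b)) <= r
            )
            total += math.log(matches / count)
        return total / count

    # ApEn is the drop in average log-likelihood of a match when the
    # template length grows from m to m + 1.
    return phi(m) - phi(m + 1)
```

    A common rule of thumb, assumed here rather than taken from the paper, sets r to 0.2 times the standard deviation of the series, e.g. approximate_entropy(rr, m=2, r=0.2 * statistics.pstdev(rr)) for a hypothetical list rr of RR intervals. Regular series give values near zero; irregular ones give larger values.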

  8. X-ray spectral variability of LINERs selected from the Palomar sample

    CERN Document Server

    Hernández-García, L; Masegosa, J; Márquez, I

    2014-01-01

    Variability is a general property of active galactic nuclei (AGN). In X-rays, the way in which these changes occur is not yet clear. In the particular case of low-ionisation nuclear emission-line region (LINER) nuclei, variations on timescales of months to years have been found for some objects, but the main driver of these changes is still an open question. The main purpose of this work is to investigate X-ray variability in LINERs, including the main driver of such variations, and to search for possible differences between type 1 and type 2 objects. We use the 18 LINERs in the Palomar sample with data retrieved from the Chandra and/or XMM-Newton archives, corresponding to observations gathered at different epochs. All the spectra for the same object are fitted simultaneously in order to study long-term variations. The nature of the variability patterns is studied by allowing different parameters to vary during the spectral fit. Whenever possible, short term variations from the analysis of the light curves and UV variabil...

  9. COPD phenotypes on computed tomography and its correlation with selected lung function variables in severe patients

    Directory of Open Access Journals (Sweden)

    da Silva SMD

    2016-03-01

    Silvia Maria Doria da Silva, Ilma Aparecida Paschoal, Eduardo Mello De Capitani, Marcos Mello Moreira, Luciana Campanatti Palhares, Mônica Corso Pereira. Pneumology Service, Department of Internal Medicine, School of Medical Sciences, State University of Campinas (UNICAMP), Campinas, São Paulo, Brazil. Background: Computed tomography (CT) phenotypic characterization helps in understanding the clinical diversity of chronic obstructive pulmonary disease (COPD) patients, but its clinical relevance and its relationship with functional features are not clarified. Volumetric capnography (VC) uses the principle of gas washout and analyzes the pattern of CO2 elimination as a function of expired volume. The main variables analyzed were end-tidal concentration of carbon dioxide (ETCO2), slope of phase 2 (Slp2), and slope of phase 3 (Slp3) of the capnogram, the curve which represents the total amount of CO2 eliminated by the lungs during each breath. Objective: To investigate, in a group of patients with severe COPD, whether phenotypic analysis by CT could identify different subsets of patients, and whether there was an association between CT findings and functional variables. Subjects and methods: Sixty-five patients with COPD GOLD III–IV were admitted for clinical evaluation, high-resolution CT, and functional evaluation (spirometry, 6-minute walk test [6MWT], and VC). The presence and profusion of tomography findings were evaluated, and later, the patients were identified as having the emphysema (EMP) or airway disease (AWD) phenotype. EMP and AWD groups were compared; tomography findings scores were evaluated versus spirometric, 6MWT, and VC variables. Results: Bronchiectasis was found in 33.8% and peribronchial thickening in 69.2% of the 65 patients. Structural findings of airways had no significant correlation with spirometric variables. Air trapping and EMP were strongly correlated with VC variables, but in opposite directions. There was some overlap between the EMP and AWD

  10. Characterization of Machine Variability and Progressive Heat Treatment in Selective Laser Melting of Inconel 718

    Science.gov (United States)

    Prater, Tracie; Tilson, Will; Jones, Zack

    2015-01-01

    The absence of an economy of scale in spaceflight hardware makes additive manufacturing an immensely attractive option for propulsion components. As additive manufacturing techniques are increasingly adopted by government and industry to produce propulsion hardware in human-rated systems, significant development efforts are needed to establish these methods as reliable alternatives to conventional subtractive manufacturing. One of the critical challenges facing powder bed fusion techniques in this application is variability between the machines used to perform builds. Even with robust process controls in place, two machines operating at identical parameters with equivalent base materials can produce specimens with slightly different material properties. The machine variability study presented here evaluates 60 specimens of identical geometry built using the same parameters. Thirty samples were produced on machine 1 (M1) and the other 30 on machine 2 (M2). Each 30-sample set was further subdivided into three subsets of 10 specimens each to assess the effect of progressive heat treatment on machine variability. The three post-processing categories were: stress relief; stress relief followed by hot isostatic pressing (HIP); and stress relief followed by HIP and then heat treatment per AMS 5664. Each specimen (a round, smooth tensile specimen) was mechanically tested per ASTM E8. Two formal statistical techniques, hypothesis testing for equivalency of means and one-way analysis of variance (ANOVA), were applied to characterize the impact of machine variability and heat treatment on six material properties: tensile stress, yield stress, modulus of elasticity, fracture elongation, and reduction of area. This work represents the type of development effort that is critical as NASA, academia, and the industrial base work collaboratively to establish a path to certification for additively manufactured parts. For future...
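    The one-way ANOVA named in the abstract compares between-group and within-group variability through the ratio F = MS_between / MS_within. A minimal pure-Python sketch follows; the function name one_way_anova_f and the toy numbers in the usage note are hypothetical and are not the study's tensile data. The groups could correspond, for instance, to the machine and post-processing subsets described in the abstract.

```python
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Hypothetical helper for illustration only.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of the group means around the grand mean,
    # weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        ss_within += sum((x - mean_g) ** 2 for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

    For example, one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) gives F = 27.0 with 2 and 6 degrees of freedom. To obtain a p-value as well, scipy.stats.f_oneway computes the same F statistic from the group samples.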

  11. GENOTYPIC VARIABILITY ESTIMATES OF AGRONOMIC TRAITS FOR SELECTION IN A SWEETPOTATO (IPOMOEA BATATAS) POLYCROSS POPULATION IN PAPUA NEW GUINEA

    Directory of Open Access Journals (Sweden)

    Boney Wera

    2015-07-01

    Successful crop breeding programs incorporating agronomic and consumer-preferred traits can be achieved by recognizing the existence and degree of variability among sweetpotato (Ipomoea batatas (L.) Lam.) genotypes. Understanding genetic variability, genotypic and phenotypic correlation and inheritance among agronomic traits is fundamental to the improvement of any crop. The study was carried out with the objective of estimating genotypic variability in yield-related traits of highland sweetpotato in a polycross population in Papua New Guinea. A total of 8 genotypes of sweetpotato derived from the polycross were evaluated in two cycles of replicated field experiments. Analysis of variance was computed to contrast the variability within the selected genotypes based on high-yielding, β-carotene-rich, orange-fleshed sweetpotato. The results revealed significant differences among the genotypes. Genotypic coefficient of variation (GCV%) was lower than phenotypic coefficient of variation (PCV%) for all traits studied. Relatively high genetic variance, along with high heritability and expected genetic advance, was observed in NMTN and ABYield. Harvest index (HI), scab and gall mite damage scores had heritabilities of 67%, 66% and 37%, respectively. Marketable tuber yield (MTYield) and total tuber yield (TTYield) had lower genetic variance, low heritability and low genetic advance. There is a need to investigate correlated inheritance among these traits. As the results indicate, selecting directly for yield improvement in a polycross population may not be very efficient. It can therefore be concluded that the variability for tuber yield within sweetpotato genotypes collected from the polycross population at Aiyura Research Station is low and the scope for its yield improvement is narrow.
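    The coefficients of variation, heritability and genetic advance quoted in this abstract are conventionally derived from variance components estimated by the analysis of variance. Under the usual textbook definitions (assumed here; the abstract does not state its exact formulas), with x̄ the trait mean, σ²_g, σ²_e and σ²_p the genotypic, error and phenotypic variances, and k the standardized selection differential:

```latex
\begin{aligned}
\sigma_p^2 &= \sigma_g^2 + \sigma_e^2 \\
\mathrm{GCV}\,(\%) &= 100\,\sigma_g / \bar{x} \\
\mathrm{PCV}\,(\%) &= 100\,\sigma_p / \bar{x} \\
H^2 &= \sigma_g^2 / \sigma_p^2 \\
\mathrm{GA} &= k\,\sigma_p\,H^2, \qquad k = 2.06 \text{ at 5\% selection intensity}
\end{aligned}
```

    Because σ²_e ≥ 0 implies σ_p ≥ σ_g, GCV can never exceed PCV under these definitions, which is consistent with GCV% being lower than PCV% for every trait; the size of the gap reflects the environmental share of the phenotypic variance.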

  12. The Relationship between Selected Body Composition Variables and Muscular Endurance in Women

    Science.gov (United States)

    Esco, Michael R.; Olson, Michele S.; Williford, Henry N.

    2010-01-01

    The primary purpose of this study was to determine if muscular endurance is affected by referenced waist circumference groupings, independent of body mass and subcutaneous abdominal fat, in women. This study also explored whether selected body composition measures were associated with muscular endurance. Eighty-four women were measured for height,…

  13. Heterogeneous selection on a heritable temperament trait in a variable environment

    NARCIS (Netherlands)

    Quinn, John L.; Patrick, Samantha C.; Bouwhuis, Sandra; Wilkin, Teddy A.; Sheldon, Ben C.

    2009-01-01

    Temperament traits increasingly provide a focus for investigating the evolutionary ecology of behavioural variation. Here, we examine the underlying causes and selective consequences of individual variation in the temperament trait 'exploration behaviour in a novel environment' (EB, based on an

  14. Empirically Driven Variable Selection for the Estimation of Causal Effects with Observational Data

    Science.gov (United States)

    Keller, Bryan; Chen, Jianshen

    2016-01-01

    Observational studies are common in educational research, where subjects self-select or are otherwise non-randomly assigned to different interventions (e.g., educational programs, grade retention, special education). Unbiased estimation of a causal effect with observational data depends crucially on the assumption of ignorability, which specifies…

  15. An Investigation of the Relation Between the Developmental Parabolic Curve and Selected Personality Variables.

    Science.gov (United States)

    Flugsrud, Marcia R.

    This study is designed to determine whether data obtained cross-sectionally from a sample of subjects in the middle childhood range on selected personality characteristics could be well described by a concave parabolic curve and thus linked to the closure behaviour elicited from the subjects. Specifically, the investigation seeks to determine if…

  16. Heterogeneous selection on a heritable temperament trait in a variable environment

    NARCIS (Netherlands)

    Quinn, John L.; Patrick, Samantha C.; Bouwhuis, Sandra; Wilkin, Teddy A.; Sheldon, Ben C.

    2009-01-01

    Temperament traits increasingly provide a focus for investigating the evolutionary ecology of behavioural variation. Here, we examine the underlying causes and selective consequences of individual variation in the temperament trait 'exploration behaviour in a novel environment' (EB, based on an 8

  17. The relationship between mineral contents, particle matter and bottom ash distribution during pellet combustion: molar balance and chemometric analysis.

    Science.gov (United States)

    Jeguirim, Mejdi; Kraiem, Nesrine; Lajili, Marzouk; Guizani, Chamseddine; Zorpas, Antonis; Leva, Yann; Michelin, Laure; Josien, Ludovic; Limousy, Lionel

    2017-03-21

    This paper aims to identify the correlation between the mineral contents of agropellets and the particle matter and bottom ash characteristics during combustion in domestic boilers. Four agrifood residues with high mineral contents, namely grape marc (GM), tomato waste (TW), exhausted olive mill solid waste (EOMSW) and olive mill wastewater (OMWW), were selected. Seven different pellets were then produced from pure residues, from their mixtures, or by blending them with sawdust. The physico-chemical properties of the produced pellets were analysed using different analytical techniques, with particular attention paid to their mineral contents. Combustion tests were performed in a 12-kW domestic boiler. The particle matter (PM) emission was characterised through particle number and mass quantification for different particle sizes. The bottom ash composition and size distribution were also characterised. Molar balance and chemometric analyses were performed to identify the correlation between the mineral contents and the PM and bottom ash characteristics. The analyses indicate that K, Na, S and Cl are released partially or completely during the combustion tests, whereas Ca, Mg, Si, P, Al, Fe and Mn are retained in the bottom ash. The chemometric analyses indicate that, in addition to the operating conditions and the pellet ash contents, K and Si concentrations have a significant effect on PM emissions as well as on the agglomeration of bottom ash.

  18. Chemometric evaluation of near infrared, fourier transform infrared, and Raman spectroscopic models for the prediction of nimodipine polymorphs.

    Science.gov (United States)

    Siddiqui, Akhtar; Rahman, Ziyaur; Sayeed, Vilayat A; Khan, Mansoor A

    2013-11-01

    The objective of this study was to assess the performance of chemometric models in predicting the proportion of recrystallized nimodipine polymorphs from cosolvent formulations. Mixtures of the two polymorphs, ranging from 100% to 0% (w/w) polymorph I, were prepared and characterized spectroscopically using Fourier transform infrared spectroscopy (FTIR), near-infrared spectroscopy (NIR), and Raman spectroscopy. Instrumental responses were treated to construct multivariate calibration models using principal component regression (PCR) and partial least squares regression approaches. Treated data gave better model fits than untreated data, with a higher coefficient of determination (R²) and lower root mean square error (RMSE) and standard error (SE). Multiple scattering correction and standard normal variate exhibited higher R² and lower RMSE and SE values than the second derivative. Goodness of fit for FTIR and NIR data (R² ∼ 0.99) was better than for Raman data (R² ∼ 0.95). Furthermore, the models were applied to the recrystallized polymorphs obtained by storing nimodipine-cosolvent formulations at selected stability conditions. The relative composition of the polymorphs differed with storage conditions. NIR chemical imaging of a recrystallized nimodipine sample at 15°C qualitatively corroborated the model-based prediction of the two polymorphs. These studies therefore strongly suggest the potential utility of chemometric models in predicting nimodipine polymorphs.

  19. NMR and Chemometric Characterization of Vacuum Residues and Vacuum Gas Oils from Crude Oils of Different Origin

    Directory of Open Access Journals (Sweden)

    Jelena Parlov Vuković

    2015-03-01

    NMR spectroscopy in combination with statistical methods was used to study vacuum residues and vacuum gas oils from 32 crude oils of different origin. Two chemometric methods were applied. First, principal component analysis on complete spectra was used to classify the samples, and a clear distinction between vacuum residues and vacuum light and heavy gas oils was obtained. To quantitatively predict the composition of asphaltenes, principal component regression models were built using areas of resonance signals spanned by 11 frequency bins of the 1H NMR spectra. The first 5 principal components accounted for more than 94% of the variation in the input data set, and the coefficient of determination for the correlation between measured and predicted values was R² = 0.7421. Although this value is not significant, it shows the underlying linear dependence in the data. Pseudo-two-dimensional DOSY NMR experiments were used to assess the composition and structural properties of asphaltenes in a selected crude oil and its vacuum residue on the basis of their different hydrodynamic behavior and translational diffusion coefficients. DOSY spectra showed the presence of several asphaltene aggregates differing in size and in the interactions they formed. The results show that NMR techniques in combination with chemometrics are very useful for analyzing vacuum residues and vacuum gas oils. Furthermore, we expect that our ongoing investigation of asphaltenes from crude oils of different origin will elucidate in more detail the composition, structure and properties of these complex molecular systems.

  20. Use of Vis/NIRS for the determination of sugar content of cola soft drinks based on chemometric methods

    Science.gov (United States)

    Liu, Fei; He, Yong

    2008-03-01

    Three different chemometric methods were used to determine the sugar content of cola soft drinks using visible and near-infrared spectroscopy (Vis/NIRS). Four varieties of cola were prepared; 180 samples (45 for each variety) were selected for the calibration set and 60 samples (15 for each variety) for the validation set. Savitzky-Golay smoothing, standard normal variate (SNV) and Savitzky-Golay first-derivative transformation were applied to pre-process the spectral data. The first eleven principal components (PCs) extracted by partial least squares (PLS) analysis were used as the inputs of a back-propagation neural network (BPNN) and a least squares support vector machine (LS-SVM) model. The BPNN model with optimal structural parameters and the LS-SVM model with a radial basis function (RBF) kernel were then applied to build regression models, which were compared with PLS regression. The correlation coefficient (r), root mean square error of prediction (RMSEP) and bias for prediction were 0.971, 1.259 and -0.335 for PLS; 0.986, 0.763 and -0.042 for BPNN; and 0.978, 0.995 and -0.227 for LS-SVM, respectively. All three methods achieved high, satisfactory precision. The results indicate that Vis/NIR spectroscopy combined with chemometric methods can be used as a high-precision method for determining the sugar content of cola soft drinks.
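    The three validation statistics reported in this abstract can be computed from paired measured and predicted sugar contents. The sketch below assumes the common conventions r = Pearson correlation, RMSEP = sqrt(mean((ŷ − y)²)) over the validation set, and bias = mean(ŷ − y); some authors define bias with the opposite sign, and the paper's exact conventions are not stated in the abstract. The function name prediction_metrics is hypothetical.

```python
import math
from typing import Sequence, Tuple


def prediction_metrics(
    measured: Sequence[float], predicted: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (r, RMSEP, bias) for a validation set.

    Hypothetical helper; bias is taken as mean(predicted - measured).
    """
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least two points")

    mean_y = sum(measured) / n
    mean_p = sum(predicted) / n

    # Pearson correlation between measured and predicted values.
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(measured, predicted))
    var_y = sum((y - mean_y) ** 2 for y in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_y * var_p)

    # Prediction residuals, signed as predicted minus measured.
    residuals = [p - y for y, p in zip(measured, predicted)]
    rmsep = math.sqrt(sum(e * e for e in residuals) / n)
    bias = sum(residuals) / n
    return r, rmsep, bias
```

    Under this sign convention, a negative bias means the model under-predicts on average. Note that r alone cannot reveal a constant offset: predictions shifted by a fixed amount still give r = 1, which is why RMSEP and bias are reported alongside it.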

  1. Pharmacophore identification by molecular modeling and chemometrics: The case of HMG-CoA reductase inhibitors

    Science.gov (United States)

    Cosentino, U.; Moro, G.; Pitea, D.; Scolastico, S.; Todeschini, R.; Scolastico, C.

    1992-02-01

    A methodology based on molecular modeling and chemometrics is applied to identify the geometrical pharmacophore and the stereoelectronic requirements for activity in a series of inhibitors of 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase, an enzyme involved in cholesterol biosynthesis. These inhibitors present two common structural features: a 3,5-dihydroxyheptanoic acid moiety, which mimics the active portion of the natural substrate HMG-CoA, and a lipophilic region carrying both polar and bulky groups. A total of 432 minimum-energy conformations of 11 homologous compounds showing different levels of biological activity are calculated by the molecular mechanics MM2 method. Five atoms are selected as representatives of the relevant fragments of these compounds, and three interatomic distances, selected from 10 by means of principal component analysis (PCA), are used to describe the three-dimensional arrangement of these atoms. A cluster analysis performed on the whole set of conformations described by these three distances allows the selection of one cluster whose centroid represents a geometrical model for the HMG-CoA reductase pharmacophore; the conformations it includes are candidate binding conformations. To refine the geometrical model and gain better insight into the requirements for the activity of these inhibitors, the molecular electrostatic potential (MEP) distributions are determined by the MNDO semiempirical method.

  2. Application of chemometric methods to differential scanning calorimeter (DSC) to estimate nimodipine polymorphs from cosolvent system.

    Science.gov (United States)

    Siddiqui, Akhtar; Rahman, Ziyaur; Khan, Mansoor A

    2015-06-01

    The focus of this study was to evaluate the applicability of chemometrics to differential scanning calorimetry (DSC) data for evaluating nimodipine polymorphs. Multivariate calibration models were built using DSC data from known mixtures of the nimodipine modifications. Linear baseline correction was used to reduce dispersion in the thermograms. Principal component analysis of the treated and untreated data explained 96% and 89% of the data variability, respectively. Score and loading plots correlated the variability between samples with changes in the proportion of nimodipine modifications. The R² values for principal component regression (PCR) and partial least squares regression (PLS) were 0.91 and 0.92, respectively. The root mean square errors of calibration and validation for the treated samples in PCR and PLS were lower than those for the untreated samples. These models were applied to samples recrystallized from a cosolvent system, which indicated different proportions of modifications in the mixtures obtained by placing samples under different storage conditions. The model was able to predict the nimodipine modifications with a known margin of error. Therefore, these models can be used as a quality control tool to rapidly determine the nimodipine modification in an unknown mixture.

  3. Chemometric techniques in oil classification from oil spill fingerprinting.

    Science.gov (United States)

    Ismail, Azimah; Toriman, Mohd Ekhwan; Juahir, Hafizan; Kassim, Azlina Md; Zain, Sharifuddin Md; Ahmad, Wan Kamaruzaman Wan; Wong, Kok Fah; Retnam, Ananthy; Zali, Munirah Abdul; Mokhtar, Mazlin; Yusri, Mohd Ayub

    2016-10-15

    The extended use of GC-FID and GC-MS in oil spill fingerprinting and matching is important for classifying oils from oil spill sources collected in various areas of Peninsular Malaysia and Sabah (East Malaysia). Oil spill fingerprinting from GC-FID and GC-MS coupled with chemometric techniques (discriminant analysis and principal component analysis) is used as a diagnostic tool to classify the types of oil polluting the water. Oil spill compounds in water from actual oil spill sites are clustered and discriminated into four groups, viz. diesel, Heavy Fuel Oil (HFO), Mixture Oil containing Light Fuel Oil (MOLFO) and Waste Oil (WO), according to the similarity of their intrinsic chemical properties. Principal component analysis (PCA), with a total variance of 85.34%, demonstrates that diesel, HFO, MOLFO and WO are types of oil or oil products from complex oil mixtures, identified with various anthropogenic activities related to either intentional release or accidental discharge of oil into the environment. Our results show that the use of chemometric techniques is important in providing independent validation for classifying the types of spilled oil in investigations of oil spill pollution in Malaysia. This, in turn, would save cost and time in identifying oil spill sources.

  4. The relationship between selected variables and customer loyalty within an optometric practice environment

    Directory of Open Access Journals (Sweden)

    T. Van Vuuren

    2012-12-01

    Purpose: The purpose of the research that informed this article was to examine the relationship between customer satisfaction, trust, supplier image, commitment and customer loyalty within an optometric practice environment. Problem investigated: Optometric businesses need to adapt their strategies to enhance loyalty, as customer satisfaction is not enough to ensure loyalty and customer retention. An understanding of the variables influencing loyalty could help businesses within the optometric service environment to retain their customers and become more profitable. Methodology: The methodological approach followed was exploratory and quantitative in nature. The sample consisted of 357 customers who visited the practice twice or more over the previous six years. A structured questionnaire with a five-point Likert scale was fielded to gather the data. Descriptive and multiple regression analyses were used to analyse the results. Collinearity statistics and Pearson's correlation coefficient were also calculated to determine which independent variable has the largest influence on customer loyalty. Findings and implications: The main finding is that customer satisfaction had the highest correlation with customer loyalty. The other independent variables, however, also appear to significantly influence customer loyalty within an optometric practice environment. The implication is that optometric practices need to focus on customer satisfaction, trust, supplier image and commitment when addressing the improvement of customer loyalty. Originality and value of the research: The article contributes to the improvement of customer loyalty within a service business environment, which could help facilitate larger market share, higher customer retention and greater profitability for the business over the long term.

  5. Unimodal transform of variables selected by interval segmentation purity for classification tree modeling of high-dimensional microarray data.

    Science.gov (United States)

    Du, Wen; Gu, Ting; Tang, Li-Juan; Jiang, Jian-Hui; Wu, Hai-Long; Shen, Guo-Li; Yu, Ru-Qin

    2011-09-15

    As a greedy search algorithm, classification and regression tree (CART) easily overfits when modeling microarray gene expression data. A straightforward solution is to filter out irrelevant genes by identifying significant ones. Because some significant genes with multimodal expression patterns, which exhibit systematic differences among within-class samples, are difficult to identify with existing methods, a strategy of unimodal transform of variables selected by interval segmentation purity (UTISP) for CART modeling is proposed. First, significant genes exhibiting varied expression patterns can be properly identified by a variable selection method based on interval segmentation purity. Then, a unimodal transform is applied to provide unimodal featured variables for CART modeling via feature extraction. Because significant genes with complex expression patterns can be properly identified and unimodal features extracted in advance, this strategy potentially improves the performance of CART in combating overfitting or underfitting when modeling microarray data. The strategy is demonstrated using two microarray data sets. The results reveal that UTISP-based CART outperforms k-nearest neighbors and CARTs coupled with other gene identification strategies, indicating that UTISP-based CART holds great promise for microarray data analysis.

  6. The effect of aquatic plyometric training with and without resistance on selected physical fitness variables among volleyball players

    Directory of Open Access Journals (Sweden)

    K. KAMALAKKANNAN

    2011-06-01

    The purpose of this study is to analyze the effect of aquatic plyometric training with and without the use of weights on selected physical fitness variables among volleyball players. To achieve this purpose, 36 physically active undergraduate volleyball players between 18 and 20 years of age volunteered as participants. The participants were randomly assigned to three groups of 12 each: a control group (CG), an aquatic plyometric training with weight group (APTWG), and an aquatic plyometric training without weight group (APTWOG). The subjects of the control group were not exposed to any training. Both experimental groups underwent their respective experimental treatment for 12 weeks, 3 days per week, with a single session on each day. Speed, endurance, and explosive power were measured as the dependent variables. Thirty-six days of experimental treatment were conducted for all the groups, and pre- and post-test data were collected. The collected data were analyzed using analysis of covariance (ANCOVA) followed by Scheffé's post hoc test. The results revealed significant differences between groups on all the selected dependent variables. This study demonstrated that aquatic plyometric training can be an effective means of improving speed, endurance, and explosive power in volleyball players.

  7. Selected topics in the classical theory of functions of a complex variable

    CERN Document Server

    Heins, Maurice

    2014-01-01

    Elegant and concise, this text is geared toward advanced undergraduate students acquainted with the theory of functions of a complex variable. The treatment presents such students with a number of important topics from the theory of analytic functions that may be addressed without erecting an elaborate superstructure. These include some of the theory's most celebrated results, which seldom find their way into a first course. After a series of preliminaries, the text discusses properties of meromorphic functions, the Picard theorem, and harmonic and subharmonic functions. Subsequent topics incl

  8. Effect of Integrated Yoga Module on Selected Psychological Variables among Women with Anxiety Problem.

    Science.gov (United States)

    Parthasarathy, S; Jaiganesh, K; Duraisamy

    2014-01-01

    The implementation of yogic practices has proven benefits in both organic and psychological diseases. Forty-five women with anxiety, selected by a random sampling method, were divided into three groups. Experimental group I was subjected to asanas, relaxation and pranayama, while Experimental group II was subjected to an integrated yoga module. The control group did not receive any intervention. Anxiety was measured with Taylor's Manifest Anxiety Scale before and after treatment. Frustration was measured with the Reaction to Frustration Scale. All data were entered into an Excel spreadsheet and analysed with SPSS 16 software using analysis of covariance (ANCOVA). Selected yoga practices and asanas decreased anxiety and frustration scores, but treatment with the integrated yoga module resulted in a significant reduction of anxiety and frustration. In conclusion, the practice of asanas and yoga decreased anxiety in women, and yoga as an integrated module significantly improved anxiety scores in young women with proven anxiety without any ill effects.

  9. The influence of selected socio-demographic variables on symptoms occurring during the menopause

    OpenAIRE

    Marta Makara-Studzińska; Karolina Kryś-Noszczyką; Grzegorz Jakiel

    2015-01-01

    Introduction: It is considered that the lifestyle conditioned by socio-demographic or socio-economic factors determines the health condition of people to the greatest extent. The aim of this study is to evaluate the influence of selected socio-demographic factors on the kinds of symptoms occurring during menopause. Material and methods: The study group consisted of 210 women aged 45 to 65, not using hormone replacement therapy, staying at healthcare centers for rehabilitation treatment...

  10. Neuronal Intra-Individual Variability Masks Response Selection Differences between ADHD Subtypes—A Need to Change Perspectives

    Directory of Open Access Journals (Sweden)

    Annet Bluschke

    2017-06-01

    Due to the high intra-individual variability in attention deficit/hyperactivity disorder (ADHD), there may be considerable bias in knowledge about the altered neurophysiological processes underlying executive dysfunctions in patients with different ADHD subtypes. When aiming to establish dimensional cognitive-neurophysiological constructs representing symptoms of ADHD, as suggested by the Research Domain Criteria initiative, it is crucial to consider such processes independently of variability. We examined patients with the predominantly inattentive subtype (attention deficit disorder, ADD) and the combined subtype of ADHD (ADHD-C) in a flanker task measuring conflict control. Groups were matched for task performance. Besides using classic event-related potential (ERP) techniques and source localization, neurophysiological data were also analyzed using residue iteration decomposition (RIDE) to statistically account for intra-individual variability, and S-LORETA to estimate the sources of the activations. The analysis of classic ERPs related to conflict monitoring revealed no differences between patients with ADD and ADHD-C. When individual variability was accounted for, clear differences became apparent in the RIDE C-cluster (analogous to the P3 ERP component). While patients with ADD distinguished between compatible and incompatible flanker trials early on, patients with ADHD-C seemed to employ more cognitive resources overall. These differences are reflected in inferior parietal areas. The study demonstrates differences in neuronal mechanisms related to response selection processes between ADD and ADHD-C which, according to source localization, arise from the inferior parietal cortex. Importantly, these differences could only be detected when accounting for intra-individual variability. The results imply that it is very likely that differences in neurophysiological processes between ADHD subtypes are underestimated and have not been recognized because intra

  11. Photometric study of selected cataclysmic variables II. Time-series photometry of nine systems

    CERN Document Server

    Papadaki, C; Stanishev, V; Boumis, P; Akras, S; Sterken, C

    2008-01-01

    We present time-series photometry of nine cataclysmic variables: EI UMa, V844 Her, V751 Cyg, V516 Cyg, GZ Cnc, TY Psc, V1315 Aql, ASAS J002511+1217.12 and LN UMa. The observations were conducted at various observatories, covering 170 hours and comprising 7,850 data points in total. For the majority of targets we confirm previously reported periodicities, and for some of them we give their spectroscopic orbital periods for the first time. For those dwarf-nova systems observed in both quiescence and outburst, the increase in brightness was followed by a decrease in the amount of flickering. Quasi-periodic oscillations were either discovered or confirmed. For the eclipsing system V1315 Aql we covered 9 eclipses and obtained a refined orbital ephemeris. We find that no change in the orbital period of this system has occurred over its long observational baseline. V1315 Aql also shows eclipses of variable depth.
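    A refined orbital ephemeris of an eclipsing binary is commonly a linear ephemeris, T(E) = T0 + P * E, fitted by least squares to observed mid-eclipse times against cycle number E; the O-C (observed minus calculated) residuals then show whether the period has changed. The sketch below assumes exactly that linear model. It is not the authors' reduction, the function names are hypothetical, and the timings are synthetic numbers rather than the V1315 Aql data.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def fit_linear_ephemeris(cycles: Sequence[int], times: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of the linear ephemeris T(E) = T0 + P * E.

    cycles: eclipse cycle numbers E (at least two distinct values)
    times:  observed mid-eclipse times in days (e.g. HJD)
    Returns (T0, P): reference epoch and orbital period in days.
    """
    if len(cycles) != len(times) or len(cycles) < 2:
        raise ValueError("need at least two matching (cycle, time) pairs")
    e_mean = fmean(cycles)
    t_mean = fmean(times)
    sxx = sum((e - e_mean) ** 2 for e in cycles)
    if sxx == 0:
        raise ValueError("cycle numbers must not all be equal")
    sxy = sum((e - e_mean) * (t - t_mean) for e, t in zip(cycles, times))
    period = sxy / sxx
    t0 = t_mean - period * e_mean
    return t0, period


def o_minus_c(cycles: Sequence[int], times: Sequence[float], t0: float, period: float) -> List[float]:
    """Observed-minus-calculated residuals; a systematic trend would suggest a period change."""
    return [t - (t0 + period * e) for e, t in zip(cycles, times)]


# Synthetic timings (not real observations), generated from T0 = 2450000.0 d, P = 0.14 d
example_cycles = [0, 10, 25, 40, 57]
example_times = [2450000.0 + 0.14 * e for e in example_cycles]
t0, period = fit_linear_ephemeris(example_cycles, example_times)
residuals = o_minus_c(example_cycles, example_times, t0, period)
print(f"T0 = {t0:.5f}, P = {period:.6f} d, max |O-C| = {max(abs(r) for r in residuals):.2e} d")
```

    With real timings, O-C residuals scattered around zero with no trend are consistent with the "no period change" conclusion; a parabolic O-C pattern would instead point to a changing period.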

  12. Joint High-Dimensional Bayesian Variable and Covariance Selection with an Application to eQTL Analysis

    KAUST Repository

    Bhadra, Anindya

    2013-04-22

    We describe a Bayesian technique to (a) perform a sparse joint selection of significant predictor variables and significant inverse covariance matrix elements of the response variables in a high-dimensional linear Gaussian sparse seemingly unrelated regression (SSUR) setting and (b) perform an association analysis between the high-dimensional sets of predictors and responses in such a setting. To search the high-dimensional model space, where both the number of predictors and the number of possibly correlated responses can be larger than the sample size, we demonstrate that a marginalization-based collapsed Gibbs sampler, in combination with spike and slab type of priors, offers a computationally feasible and efficient solution. As an example, we apply our method to an expression quantitative trait loci (eQTL) analysis on publicly available single nucleotide polymorphism (SNP) and gene expression data for humans where the primary interest lies in finding the significant associations between the sets of SNPs and possibly correlated genetic transcripts. Our method also allows for inference on the sparse interaction network of the transcripts (response variables) after accounting for the effect of the SNPs (predictor variables). We exploit properties of Gaussian graphical models to make statements concerning conditional independence of the responses. Our method compares favorably to existing Bayesian approaches developed for this purpose. © 2013, The International Biometric Society.

  13. [The significance of selected psychopathological and personality variables in the course of allergic and non-allergic asthma].

    Science.gov (United States)

    Czyż, Piotr; Furgał, Mariusz; Nowobilski, Roman; de Barbaro, Bogdan; Pulka, Grażyna

    2014-01-01

    The aim of this study was to carry out a comparative analysis of selected psychopathological and personality variables in patients with allergic and non-allergic asthma, and to determine the significance and strength of these variables in the clinical picture of both forms of the disease. In all patients, a structured anamnesis, basic spirometry and dyspnea measurement were carried out. The level of anxiety was determined using Spielberger's questionnaire. The intensity of depression was evaluated with Beck's Inventory. Neuroticism and extroversion-introversion were assessed with Eysenck's Inventory. The I-E scale was used to determine the perception of the locus of control. No significant differences in psychopathological and personality variables were found between the two types of asthma. Gender differentiated the patients with respect to psychopathology. The intensity of extroversion correlated with the duration of the disease. In the case of neuroticism, the clinical form of the disease was associated with a blurring of the differences between genders. The intensity of dyspnea and the spirometric results correlated with the psychological background of the disease. Although no significant differences in psychopathology and personality dimensions were found between patients with allergic and non-allergic asthma, psychological variables are associated with the course of asthma in adults.

  14. Variable selection methods in PLS regression - a comparison study on metabolomics data

    DEFF Research Database (Denmark)

    Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach

    Partial least squares regression (PLSR) has been applied to various fields such as psychometrics, consumer science, econometrics and process control. Recently it has been applied to metabolomics data sets (GC/LC-MS, NMR) and has proven to be very powerful in situations with many variables ... for the purpose of reducing over-fitting problems and providing useful interpretation tools. It has excellent possibilities for giving a graphical overview of sample and variation patterns. It can handle collinearity efficiently and makes it possible to use different highly correlated data sets in one ... Integrating Omics data. Statistical Applications in Genetics and Molecular Biology, 7:Article 35, 2008. 2. Martens H and Martens M. Modified Jack-knife estimation of parameter uncertainty in bilinear modelling by partial least squares regression (PLSR). Food Quality and Preference, 11:5-16, 2000....

  15. Forecasting macroeconomic variables using neural network models and three automated model selection techniques

    DEFF Research Database (Denmark)

    Kock, Anders Bredahl; Teräsvirta, Timo

    2016-01-01

    When forecasting with neural network models one faces several problems, all of which influence the accuracy of the forecasts. First, neural networks are often hard to estimate due to their highly nonlinear structure. To alleviate the problem, White (2006) presented a solution (QuickNet) that converts the specification and nonlinear estimation problem into a linear model selection and estimation problem. We shall compare its performance to that of two other procedures building on the linearization idea: the Marginal Bridge Estimator and Autometrics. Second, one must decide whether forecasting...

  16. A DNA-based system for selecting and displaying the combined result of two input variables

    DEFF Research Database (Denmark)

    Liu, Huajie; Wang, Jianbang; Song, S

    2015-01-01

    Oligonucleotide-based technologies for biosensing or bio-regulation produce huge amounts of rich high-dimensional information. There is a consequent need for flexible means to combine diverse pieces of such information to form useful derivative outputs, and to display those immediately. Here we demonstrate this capability in a DNA-based system that takes two input numbers, represented in DNA strands, and returns the result of their multiplication, writing this as a number in a display. Unlike a conventional calculator, this system operates by selecting the result from a library of solutions rather...

  17. Conflict Management Styles of Selected Managers and Their Relationship With Management and Organization Variables

    Directory of Open Access Journals (Sweden)

    Concepcion Martires

    1990-12-01

    This study sought to determine the relationship between the conflict management styles of managers and certain management and organization factors. A total of 462 top, middle and lower managers from 72 companies participated in the study, which used the Thomas-Kilmann Conflict Mode Instrument. A microcomputer and a statistical software package were used to compute the statistics. The majority of the managers in the 17 types of organization included in the study use the collaborative mode of managing conflict. This finding is congruent with past studies of managers in commercial banks, service, manufacturing, trading, advertising, appliance, investment houses and overseas recruitment industries, and shows a high degree of objectivity and of assertiveness regarding both their own personal goals and other people's concerns. The second most dominant style, compromising, indicates a desire to share and to search for solutions that satisfy all conflicting parties. This finding is highly consistent with the strong Filipino value of smooth interpersonal relationships (SIR), as reflected and discussed in numerous studies of Filipino values. The chi-square tests showed independence between the managers' conflict management styles and each of the variables sex, civil status, position level at work, work experience, type of corporation and number of subordinates. This result is again congruent with past studies conducted in the Philippines. The past and present findings may imply that conflict management mode is a highly personal style that does not depend on any of the variables included in the study. However, the chi-square tests show that conflict management style does depend on the manager's age and educational attainment.

  18. Chemometric analysis of ligand receptor complementarity: identifying Complementary Ligands Based on Receptor Information (CoLiBRI).

    Science.gov (United States)

    Oloff, Scott; Zhang, Shuxing; Sukumar, Nagamani; Breneman, Curt; Tropsha, Alexander

    2006-01-01

    We have developed a novel structure-based chemoinformatics approach to search for Complementary Ligands Based on Receptor Information (CoLiBRI). CoLiBRI is based on the representation of both receptor binding sites and their respective ligands in a space of universal chemical descriptors. The binding site atoms involved in the interaction with ligands are identified by means of a computational geometry technique known as Delaunay tessellation, as applied to X-ray characterized ligand-receptor complexes. TAE/RECON multiple chemical descriptors are calculated independently for each ligand as well as for its active site atoms. The representation of both ligands and active sites using chemical descriptors allows the application of well-known chemometric techniques in order to correlate chemical similarities between active sites and their respective ligands. We have established a protocol to map patterns of nearest neighbor active site vectors in a multidimensional TAE/RECON space onto those of their complementary ligands and vice versa. This protocol affords the prediction of a virtual complementary ligand vector in the ligand chemical space from the position of a known active site vector. This prediction is followed by chemical similarity calculations between this virtual ligand vector and those calculated for molecules in a chemical database to identify real compounds most similar to the virtual ligand. Consequently, the knowledge of the receptor active site structure affords straightforward and efficient identification of its complementary ligands in large databases of chemical compounds using rapid chemical similarity searches. Conversely, starting from the ligand chemical structure, one may identify possible complementary receptor cavities as well. We have applied the CoLiBRI approach to a data set of 800 X-ray characterized ligand-receptor complexes in the PDBbind database. Using a k nearest neighbor (kNN) pattern recognition approach and variable selection
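    The mapping step described above (predicting a virtual ligand vector from the position of an active-site vector, then running a similarity search over a compound database) can be illustrated with a simple k-nearest-neighbour sketch. This is not the published CoLiBRI protocol: the two-dimensional descriptor vectors are made-up stand-ins for TAE/RECON descriptors, the virtual ligand is assumed to be the plain average of the ligands paired with the k nearest active sites, Euclidean distance is an assumed similarity measure, and all function and compound names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def virtual_ligand(query_site: Vector, known_sites: Sequence[Vector],
                   known_ligands: Sequence[Vector], k: int = 3) -> List[float]:
    """Predict a 'virtual' ligand vector for query_site.

    Assumption for illustration: the prediction is the unweighted mean of the
    ligand vectors paired with the k known active sites nearest to query_site.
    """
    if len(known_sites) != len(known_ligands) or not known_sites:
        raise ValueError("need a non-empty, paired list of sites and ligands")
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(known_sites)),
                   key=lambda i: euclidean(query_site, known_sites[i]))
    nearest = order[:k]
    dim = len(known_ligands[0])
    return [sum(known_ligands[i][d] for i in nearest) / len(nearest) for d in range(dim)]


def rank_database(virtual: Vector, database: Dict[str, Vector]) -> List[str]:
    """Compound names ordered from most to least similar (smallest distance first)."""
    return sorted(database, key=lambda name: euclidean(virtual, database[name]))


# Toy 2-D descriptors (hypothetical stand-ins for TAE/RECON descriptors)
sites = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
ligands = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [9.0, 9.0]]
database = {"cmpd_A": [1.3, 1.4], "cmpd_B": [8.0, 8.5], "cmpd_C": [2.5, 0.5]}

v = virtual_ligand([0.2, 0.2], sites, ligands, k=3)
print(v, rank_database(v, database))
```

    The reverse direction mentioned in the record (from a ligand to candidate receptor cavities) would use the same two functions with the roles of the site and ligand lists swapped.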

  19. Genetic variability and selection for laticiferous system characters in Hevea brasiliensis

    Directory of Open Access Journals (Sweden)

    Paulo de Souza Gonçalves

    2005-09-01

    Six laticiferous system characters were investigated in 22 three-year-old half-sib rubber tree [Hevea brasiliensis (Willd. ex Adr. de Juss.) Muell.-Arg.] progenies, evaluated at three sites (Votuporanga, Pindorama and Jaú, all in São Paulo State, Brazil). The traits examined were: average rubber yield (Pp), average bark thickness (Bt), number of latex vessel rings (Lv), average distance between consecutive latex vessel rings (Dc), density of latex vessels per 5 mm per ring averaged over all rings (Dd) and diameter of the latex vessels (Di). The joint analysis showed that site effect and progeny x site interaction were significant for all traits except Lv. Estimates of individual heritabilities across the three sites were high for Bt; moderate for Lv, Pp and Dc; low for Dd; and very low for Di. Genetic correlations in the joint analysis showed high positive correlations between Pp and the other traits. Selecting the best five progenies would result in a genetic gain of 24.91% for Pp, while selecting the best two plants within a progeny would result in a Pp genetic gain of 30.98%.

  20. Chemometric classification techniques as a tool for solving problems in analytical chemistry.

    Science.gov (United States)

    Bevilacqua, Marta; Nescatelli, Riccardo; Bucci, Remo; Magrì, Andrea D; Magrì, Antonio L; Marini, Federico

    2014-01-01

    Supervised pattern recognition (classification) techniques, i.e., the family of chemometric methods whose aim is the prediction of a qualitative response on a set of samples, represent a very important assortment of tools for solving problems in several areas of applied analytical chemistry. This paper describes the theory behind the chemometric classification techniques most frequently used in analytical chemistry together with some examples of their application to real-world problems.

  1. Joint variable and rank selection for parsimonious estimation of high dimensional matrices

    CERN Document Server

    Bunea, Florentina; Wegkamp, Marten

    2011-01-01

    This article is devoted to optimal dimension reduction methods for sparse, high dimensional multivariate response regression models. Both the number of responses and that of the predictors may exceed the sample size. Sometimes viewed as complementary, predictor selection and rank reduction are the most popular strategies for obtaining lower dimensional approximations of the parameter matrix in such models. We show in this article that important gains in prediction accuracy can be obtained by considering them jointly. For this, we first motivate a new class of sparse multivariate regression models, in which the coefficient matrix has both low rank and zero rows or can be well approximated by such a matrix. Then, we introduce estimators that are based on penalized least squares, with novel penalties that impose simultaneous row and rank restrictions on the coefficient matrix. We prove that these estimators indeed adapt to the unknown matrix sparsity and have fast rates of convergence. We support our theoretica...

  2. SEASONAL VARIABILITY OF SELECTED NUTRIENTS IN THE WATERS OF LAKES NIEPRUSZEWSKIE, PAMIATKOWSKIE AND STRYKOWSKIE

    Directory of Open Access Journals (Sweden)

    Anna Zbierska

    2016-09-01

    The paper presents an evaluation of seasonal and long-term changes in selected nutrients in three lakes of the Poznań Lakeland. The lakes were selected because of the high risk of pollution from agricultural and residential areas. Water samples were taken at 6 control points in spring, summer and autumn from 2004 to 2014. The trophic status of the lakes was evaluated on the basis of nutrient concentrations (nitrates, nitrites, ammonium, nitrogen and phosphorus) and indicators of eutrophication. The studies showed that nutrient concentrations varied greatly both between years and between seasons over the analysed period, especially in Lakes Niepruszewskie and Pamiątkowskie. The main problem is the high concentration of nitrates, which generally showed an upward trend until 2013, especially in spring. This may indicate that measures restricting runoff pollution from agricultural sources have not been fully effective. On the other hand, a marked downward trend in NH4 concentrations from 2004 to 2014, especially after 2007, indicates a gradual improvement in wastewater management. Moreover, the seasonal variation in NH4 concentrations differed from that of NO3 and NO2: the highest values were recorded in autumn and the lowest in summer. Nutrient concentrations and eutrophication indexes reached high values in all analysed lakes, indicating a eutrophic or hypertrophic state. The high N:P ratio indicates that the lakes had a large surplus of nitrogen, making phosphorus the productivity-limiting factor.

  3. Effects of preparation variables of supported-cobalt catalysts on the selective hydrogenation of α,β-unsaturated aldehydes

    Energy Technology Data Exchange (ETDEWEB)

    Nitta, Yuriko; Hiramatsu, Yoshifumi; Imanaka, Toshinobu (Osaka Univ., Toyonaka (Japan))

    1990-11-01

    The effects of starting salts, supports, the added amount of Na2CO3 and other precipitation variables on the catalytic properties of supported cobalt catalysts were studied for the hydrogenation of cinnamaldehyde and crotonaldehyde using TGA, XRD and XPS. The catalysts prepared from cobalt chloride always exhibited high selectivities to unsaturated alcohols, irrespective of the support employed. The amount of surface chlorine remaining after H2 reduction of the Co/SiO2 precursors prepared from cobalt chloride decreased with increasing amount of Na2CO3 added as the precipitant, and both activity and selectivity reached maxima at a surface Cl/Co ratio of around 0.2. The enhanced selectivity of the catalyst prepared from cobalt chloride was explained by the effects of residual chlorine in both the H2-reduction stage and the reaction stage; the former leads to a favorable crystallite size distribution (CDS) of cobalt, and the latter suppresses the hydrogenation of the C=C double bond. The differences in activity and selectivity among the various supported catalysts prepared from cobalt nitrate were discussed on the basis of differences in the strength of the metal-support interaction, which leads to different CDSs of cobalt in these catalysts.

  4. Simultaneous optimization of variables influencing selectivity and elution strength in micellar liquid chromatography. Effect of organic modifier and micelle concentration.

    Science.gov (United States)

    Strasters, J K; Breyer, E D; Rodgers, A H; Khaledi, M G

    1990-07-06

    Previously, the simultaneous enhancement of separation selectivity and elution strength was reported in micellar liquid chromatography (MLC) using hybrid eluents of water, organic solvent and micelles. The practical implication of this phenomenon is that better separations can be achieved in shorter analysis times by using the hybrid eluents. Since both micelle concentration and the volume fraction of organic modifier influence selectivity and solvent strength, only an investigation of the effects of varying these parameters simultaneously will disclose the full separation capability of the method. In other words, the sequential solvent optimization approach commonly used in reversed-phase liquid chromatography, which adjusts solvent strength first and then improves selectivity, is inefficient for MLC with hybrid eluents. This is illustrated in this paper with two examples: the optimization of selectivity in the separation of a mixture of phenols, and the optimization of a resolution-based criterion for the separation of a number of amino acids and small peptides. The large number of variables involved in the separation process in MLC necessitates a structured approach to developing practical applications of this technique. A regular change in retention behavior is observed with variation of the surfactant concentration and the concentration of organic modifier, which enables successful prediction of retention times. Consequently, interpretive optimization strategies such as the iterative regression method are applicable.

  5. The Relationships Between Selected Organizational Variables and ATM Technology Adoption in Campus Networking

    Science.gov (United States)

    Yao, Engui

    1998-06-01

    ATM (Asynchronous Transfer Mode) is an emerging technology in computer networking, which in turn is the physical medium of information systems and networking/telecommunication systems. The technology offers universities the potential to build their networks around the future vision of uniting voice, data and video communications on ATM-based equipment. A review of the literature revealed minimal evidence on whether the size, type, financial factors and information processing maturity of a university affect its adoption of high-tech innovations. No research of this nature has been undertaken on ATM adoption in any institution of higher learning, nor has any been found for other organizations. Such evidence is needed by university administrators, information systems managers and LAN managers to understand their universities better, whether or not they have adopted ATM, and to evaluate their current administrative, academic, financial and campus networking situations. The purpose of this study was to determine the relationships between ATM adoption and four organizational variables: university size, type, finances and information processing maturity. A further purpose was to identify the current status of ATM adoption in campus networking in the United States. Logistic regression was used for the statistical data analysis. The results provide evidence that ATM adoption in campus networking is significantly related to university size, type, finances and information processing maturity.

  6. Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection.

    Science.gov (United States)

    Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad

    2014-11-01

    Feature rankings are often used for supervised dimension reduction, especially when the discriminating power of each feature is of interest, the dimensionality of the dataset is extremely high, or computational power is too limited for more complicated methods. In practice, it is recommended to start dimension reduction with simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While it benefits from the capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study of the bias and stability of this feature ranking method. We study whether the classifiers influence the SVC rankings or whether the discriminative power of the features themselves has the dominant impact on the final rankings. We show that the common intuition of using the same classifier for feature ranking and final classification does not always yield the best prediction performance. We then study whether heterogeneous classifier ensembles provide less biased rankings and whether they improve final classification performance. Furthermore, we calculate the empirical loss in prediction performance incurred by using the same classifier for SVC feature ranking and final classification, relative to the optimal choices.
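    To make the SVC idea concrete, the sketch below ranks features by the held-out accuracy of a classifier that sees only one feature at a time. The classifier is an assumed stand-in chosen for brevity: a one-feature decision stump (best single threshold and direction, fitted on training data). The study itself compares several real classifiers and ensembles; the data and function names here are hypothetical.

```python
from typing import List, Sequence, Tuple


def _stump_predict(values: Sequence[float], threshold: float, polarity: int) -> List[int]:
    """polarity=+1 predicts class 1 when value > threshold; polarity=-1 predicts class 1 when value <= threshold."""
    if polarity == 1:
        return [int(v > threshold) for v in values]
    return [int(v <= threshold) for v in values]


def _accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def fit_stump(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, int]:
    """Choose the threshold (among observed values) and polarity with the best training accuracy."""
    if not values:
        raise ValueError("empty feature column")
    best_threshold, best_polarity, best_acc = values[0], 1, -1.0
    for threshold in sorted(set(values)):
        for polarity in (1, -1):
            acc = _accuracy(_stump_predict(values, threshold, polarity), labels)
            if acc > best_acc:
                best_threshold, best_polarity, best_acc = threshold, polarity, acc
    return best_threshold, best_polarity


def svc_rank(X_train, y_train, X_test, y_test) -> List[Tuple[int, float]]:
    """Rank feature indices by held-out accuracy of a classifier that sees one feature only."""
    scores = []
    for j in range(len(X_train[0])):
        train_col = [row[j] for row in X_train]
        test_col = [row[j] for row in X_test]
        threshold, polarity = fit_stump(train_col, y_train)
        scores.append((j, _accuracy(_stump_predict(test_col, threshold, polarity), y_test)))
    # Highest accuracy first; ties broken by feature index so the ranking is deterministic
    return sorted(scores, key=lambda s: (-s[1], s[0]))


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is mostly noise
X_train = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.8, 2.0], [0.9, 5.5], [0.7, 1.5]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[0.15, 1.2], [0.85, 4.8], [0.25, 5.2], [0.75, 1.8]]
y_test = [0, 1, 0, 1]
print(svc_rank(X_train, y_train, X_test, y_test))
```

    Swapping the stump for a different single-feature classifier can reorder the ranking, which is the kind of classifier-induced bias the record investigates.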

  7. Spatial and temporal variability of microbes in selected soils at the Nevada Test Site

    Energy Technology Data Exchange (ETDEWEB)

    Angerer, J.P.; Winkel, V.K.; Ostler, W.K.; Hall, P.F.

    1993-12-31

    Large areas encompassing almost 800 hectares on the Nevada Test Site, Nellis Air Force Range and the Tonopah Test Range are contaminated with plutonium. Decontamination of plutonium from these sites may involve removal of plants and almost 370,000 cubic meters of soil. The soil may be subjected to a series of processes to remove plutonium. After decontamination, the soils will be returned to the site and revegetated. There is a paucity of information on the spatial and temporal distribution of microbes in soils of the Mojave and Great Basin Deserts. Therefore, this study was initiated to determine the biomass and diversity of microbes in soils prior to decontamination. Soils were collected to a depth of 10 cm along each of five randomly located 30-m transects at each of four sites. To ascertain spatial differences, soils were collected from beneath major shrubs and from associated interspaces. Soils were collected every three to four months to determine temporal (seasonal) differences in microbial parameters. Soils from beneath shrubs generally had greater active fungi and bacteria, and greater non-amended respiration than soils from interspaces. Temporal variability also was found; total and active fungi, and non-amended respiration were correlated with soil moisture at the time of sampling. Information from this study will aid in determining the effects of plutonium decontamination on soil microorganisms, and what measures, if any, will be required to restore microbial populations during revegetation of these sites.

  8. Microwave-assisted of dispersive liquid-liquid microextraction and spectrophotometric determination of uranium after optimization based on Box-Behnken design and chemometrics methods

    Science.gov (United States)

    Niazi, Ali; Khorshidi, Neda; Ghaemmaghami, Pegah

    2015-01-01

    In this study, an analytical procedure based on microwave-assisted dispersive liquid-liquid microextraction (MA-DLLME) and spectrophotometry coupled with chemometric methods is proposed for the determination of uranium. In the proposed method, 4-(2-pyridylazo)resorcinol (PAR) is used as the chelating agent, and chloroform and ethanol are selected as the extraction and dispersive solvents. The optimization strategy uses two-level full factorial designs. Results of the two-level full factorial design (2^4), based on an analysis of variance, demonstrated that the pH, the concentration of PAR, and the amounts of dispersive and extraction solvents are statistically significant. Optimal conditions for three variables (pH, concentration of PAR, amount of dispersive and extraction solvents) were obtained using a Box-Behnken design. Under the optimum conditions, the calibration graphs are linear in the range 20.0-350.0 ng mL-1, with a detection limit of 6.7 ng mL-1 (3δB/slope), and the enrichment factor of the method for uranium reached 135. The relative standard deviation (R.S.D.) is 1.64% (n = 7, c = 50 ng mL-1). Partial least squares (PLS) modeling was used for multivariate calibration of the spectrophotometric data. Orthogonal signal correction (OSC) was used for preprocessing the data matrices, and the prediction results of the model with and without OSC were statistically compared. The MA-DLLME-OSC-PLS method is presented for the first time in this study. The root mean square errors of prediction (RMSEP) for uranium determination using the PLS and OSC-PLS models were 4.63 and 0.98, respectively. This procedure allows the determination of uranium in synthetic and real samples, such as waste water, with good reliability.
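    Two figures of merit quoted in this record follow standard definitions: the detection limit as three times the standard deviation of replicate blank signals divided by the calibration slope, and RMSEP as the root mean square of prediction errors on an independent test set. The sketch below computes both from made-up numbers; it does not reproduce the paper's data, and estimating the slope from a univariate least-squares calibration line is an assumption for illustration (the paper's PLS and OSC-PLS models are multivariate). Function names are hypothetical.

```python
import math
from statistics import fmean, stdev
from typing import Sequence


def calibration_slope(conc: Sequence[float], signal: Sequence[float]) -> float:
    """Ordinary least-squares slope of signal versus concentration (univariate calibration line)."""
    c_mean, s_mean = fmean(conc), fmean(signal)
    sxx = sum((c - c_mean) ** 2 for c in conc)
    sxy = sum((c - c_mean) * (s - s_mean) for c, s in zip(conc, signal))
    return sxy / sxx


def detection_limit(blank_signals: Sequence[float], slope: float) -> float:
    """LOD = 3 * sample standard deviation of replicate blank signals / calibration slope."""
    return 3 * stdev(blank_signals) / slope


def rmsep(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error of prediction over an independent test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need non-empty, equal-length sequences")
    return math.sqrt(fmean([(p - t) ** 2 for t, p in zip(y_true, y_pred)]))


# Illustrative numbers only (not the paper's measurements); concentrations in ng mL-1
conc = [20.0, 50.0, 100.0, 200.0, 350.0]
signal = [0.002 * c + 0.01 for c in conc]  # idealised linear response
blanks = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010, 0.012]
slope = calibration_slope(conc, signal)
print(f"slope = {slope:.4f}, LOD = {detection_limit(blanks, slope):.2f} ng mL-1")
print(f"RMSEP = {rmsep([50.0, 100.0], [48.0, 103.0]):.3f} ng mL-1")
```

    Because RMSEP is expressed in the units of the predicted concentration, the 4.63 versus 0.98 comparison in the record can be read directly as an improvement in typical prediction error.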

  9. Assessing saffron (Crocus sativus L.) adulteration with plant-derived adulterants by diffuse reflectance infrared Fourier transform spectroscopy coupled with chemometrics.

    Science.gov (United States)

    Petrakis, Eleftherios A; Polissiou, Moschos G

    2017-01-01

    Saffron, the dried red stigmas of the plant Crocus sativus L., is well-known as one of the most important and expensive spices worldwide. It is thus highly susceptible to fraudulent practices that employ, among others, plant-derived adulterants. This study presents an application of diffuse reflectance infrared Fourier transform spectroscopy (DRIFTS) and chemometric techniques for evaluating adulteration of saffron with six characteristic adulterants of plant origin, i.e. C. sativus stamens, calendula, safflower, turmeric, buddleja, and gardenia. The proposed method involved a three-step process for the detection of adulteration as well as for the identification and quantification of adulterants. Partial least squares discriminant analysis (PLS-DA) was applied to perform authentication of saffron based on mid-infrared fingerprints (4000-600 cm-1), resulting in 99% correct classification of pure saffron and saffron adulterated at 5-20% (w/w) levels. Adulterant identification in positive samples was performed with high sensitivity and specificity by a six-class PLS-DA model, with spectroscopic data from the region 2000-600 cm-1. Subsequently, partial least squares (PLS) regression models were built for the quantification of each adulterant. By using synergy interval PLS (siPLS) for variable selection, models with improved performance were developed, with detection limits ranging from 1.0% to 3.1% (w/w). The results obtained illustrate that this strategy based on DRIFTS has the potential to complement existing methodologies for the rapid and cost-effective assessment of typical saffron frauds.

  10. Microwave-assisted of dispersive liquid-liquid microextraction and spectrophotometric determination of uranium after optimization based on Box-Behnken design and chemometrics methods.

    Science.gov (United States)

    Niazi, Ali; Khorshidi, Neda; Ghaemmaghami, Pegah

    2015-01-25

    In this study, an analytical procedure based on microwave-assisted dispersive liquid-liquid microextraction (MA-DLLME) and spectrophotometry coupled with chemometric methods is proposed for the determination of uranium. In the proposed method, 4-(2-pyridylazo)resorcinol (PAR) is used as the chelating agent, and chloroform and ethanol are selected as the extraction and dispersive solvents, respectively. The optimization strategy is carried out using a two-level full factorial design. The results of the two-level full factorial design (2(4)), based on analysis of variance, demonstrated that pH, the concentration of PAR, and the amounts of dispersive and extraction solvents are statistically significant. Optimal conditions for three variables (pH, concentration of PAR, amount of dispersive and extraction solvents) are obtained using a Box-Behnken design. Under the optimum conditions, the calibration graphs are linear in the range of 20.0-350.0 ng mL(-1), with a detection limit of 6.7 ng mL(-1) (3δB/slope), and the enrichment factor of this method for uranium reached 135. The relative standard deviation (R.S.D.) is 1.64% (n=7, c=50 ng mL(-1)). Partial least squares (PLS) modeling was used for multivariate calibration of the spectrophotometric data. Orthogonal signal correction (OSC) was used to preprocess the data matrices, and the prediction results of the models with and without OSC were compared statistically. The MA-DLLME-OSC-PLS method is presented for the first time in this study. The root mean square errors of prediction (RMSEP) for uranium determination using the PLS and OSC-PLS models were 4.63 and 0.98, respectively. This procedure allows the determination of uranium in synthetic and real samples, such as waste water, with good reliability. Copyright © 2014. Published by Elsevier B.V.
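
    The detection limit and RMSEP quoted above are simple to compute once a calibration line and predictions exist. The sketch below assumes the common definition LOD = 3 s_B / slope, with s_B the sample standard deviation of replicate blank readings, and the usual RMSEP formula; all numbers are hypothetical and are not the study's data.

```python
import numpy as np


def calibration_line(conc, signal):
    """Least-squares slope and intercept of a univariate calibration line."""
    slope, intercept = np.polyfit(conc, signal, 1)
    return slope, intercept


def detection_limit(blank_signals, slope):
    """LOD = 3 * s_B / slope, with s_B the sample SD of replicate blank readings."""
    return 3.0 * np.std(blank_signals, ddof=1) / slope


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical values: concentrations in ng/mL, absorbance-like signals,
# and seven blank replicates.
conc = np.array([20.0, 50.0, 100.0, 200.0, 350.0])
signal = 0.002 * conc + 0.01
blanks = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010, 0.008]

slope, intercept = calibration_line(conc, signal)
lod = detection_limit(blanks, slope)
print(f"slope = {slope:.4f}, LOD = {lod:.2f} ng/mL")      # LOD = 2.58 ng/mL
print(rmsep([50.0, 100.0, 150.0], [51.0, 99.0, 152.0]))   # about 1.414
```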

  11. The influence of selected socio-demographic variables on symptoms occurring during the menopause.

    Science.gov (United States)

    Makara-Studzińska, Marta; Kryś-Noszczyka, Karolina; Jakiel, Grzegorz

    2015-03-01

    It is considered that lifestyle, as conditioned by socio-demographic or socio-economic factors, is the main determinant of people's health. The aim of this study is to evaluate the influence of selected socio-demographic factors on the kinds of symptoms occurring during menopause. The study group consisted of 210 women aged 45 to 65 who were not using hormone replacement therapy and were undergoing rehabilitation treatment at healthcare centers. The study was carried out in 2013-2014 in the Silesian, Podlaskie and Lesser Poland voivodeships. The tools consisted of the authors' own survey questionnaire and the Menopause Rating Scale (MRS). The most common symptom among the women studied was depressed mood, from the group of psychological symptoms, followed by physical and mental fatigue and discomfort associated with muscle and joint pain. Symptom intensity was greatest among women with the lowest level of education, those reporting an average or poor financial situation, and unemployed women. An alarmingly high number of psychological symptoms was reported by the menopausal women, particularly those of low socio-economic status. Having a career seems to reduce the risk of psychological symptoms. There is an urgent need for health promotion and prophylaxis among menopausal women and, in many cases, for specialist psychological assistance.

  12. Query Large Scale Microarray Compendium Datasets Using a Model-Based Bayesian Approach with Variable Selection

    Science.gov (United States)

    Hu, Ming; Qin, Zhaohui S.

    2009-01-01

    In microarray gene expression data analysis, it is often of interest to identify genes that share similar expression profiles with a particular gene, such as a key regulatory protein. Multiple studies have used various correlation measures to identify co-expressed genes. While these measures work well for small datasets, the heterogeneity introduced by increasing sample size inevitably reduces their sensitivity and specificity, because most co-expression relationships do not extend to all experimental conditions. With the rapid increase in the size of microarray datasets, identifying functionally related genes from large and diverse microarray gene expression datasets is a key challenge. We develop a model-based gene expression query algorithm built under the Bayesian model selection framework. It is capable of detecting co-expression profiles under a subset of samples/experimental conditions. In addition, it allows linearly transformed expression patterns to be recognized and is robust against sporadic outliers in the data. Both features are critically important for increasing the power of identifying co-expressed genes in large-scale gene expression datasets. Our simulation studies suggest that this method outperforms existing query tools based on correlation coefficients or mutual information. When we apply this new method to the Escherichia coli microarray compendium data, it identifies a majority of known regulons as well as novel potential target genes of numerous key transcription factors. PMID:19214232
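
    The argument that pooling heterogeneous conditions dilutes co-expression can be checked numerically. The sketch below uses synthetic data (not the E. coli compendium) and numpy's Pearson correlation: a hypothetical target gene follows a linear transform of the query gene in the first 30 of 200 conditions and is unrelated elsewhere. It does not implement the authors' Bayesian query model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_conditions, n_coexpressed = 200, 30

query = rng.normal(size=n_conditions)     # expression of the query gene
target = rng.normal(size=n_conditions)    # unrelated to the query by default
# In the first 30 conditions the target follows a linear transform of the query.
target[:n_coexpressed] = (2.0 * query[:n_coexpressed] + 1.0
                          + 0.1 * rng.normal(size=n_coexpressed))

r_subset = np.corrcoef(query[:n_coexpressed], target[:n_coexpressed])[0, 1]
r_all = np.corrcoef(query, target)[0, 1]
print(f"subset r = {r_subset:.2f}, all-condition r = {r_all:.2f}")
```

    The relationship is nearly perfect within the co-expressed subset, but the correlation computed over all conditions is weak, which is why a query method that can focus on a subset of conditions is useful.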

  13. Status of police officers with regard to selected cardio-respiratory and body compositional fitness variables.

    Science.gov (United States)

    Stamford, B A; Weltman, A; Moffatt, R J; Fulco, C

    1978-01-01

    Physical performance and body composition characteristics of members (n = 75) and recruits (n = 61) of the Louisville Police Department (total n = 136) were assessed. Members were randomly selected males aged 20 to 55 years whose ranks ranged from the newest inductee up to and including the Chief of Police. Members aged 20 to 29 years assigned to active duty possessed average cardio-respiratory fitness (Vo2max). With age, cardio-respiratory fitness decreased and body weight and body fatness progressively increased. Male and female recruits entering basic training also demonstrated average cardio-respiratory fitness. Significant (P < .05) increases in Vo2max for males and females and decreases in body fatness (males) were found following 4 months of physically rigorous recruit training. Fifteen of the male recruits who completed training were retested after 1 year of active duty. During active duty, physical activity was limited to job requirements, with no additional physical training imposed. Cardio-respiratory fitness and body fatness reverted to pre-training levels. It was concluded that the physical demands of police work are too low to maintain physical fitness.

  14. The influence of selected socio-demographic variables on symptoms occurring during the menopause

    Directory of Open Access Journals (Sweden)

    Marta Makara-Studzińska

    2015-02-01

    Introduction: It is considered that lifestyle, as conditioned by socio-demographic or socio-economic factors, is the main determinant of people's health. The aim of this study is to evaluate the influence of selected socio-demographic factors on the kinds of symptoms occurring during menopause. Material and methods: The study group consisted of 210 women aged 45 to 65 who were not using hormone replacement therapy and were undergoing rehabilitation treatment at healthcare centers. The study was carried out in 2013-2014 in the Silesian, Podlaskie and Lesser Poland voivodeships. The tools consisted of the authors’ own survey questionnaire and the Menopause Rating Scale (MRS). Results: The most common symptom among the women studied was depressed mood, from the group of psychological symptoms, followed by physical and mental fatigue and discomfort associated with muscle and joint pain. Symptom intensity was greatest among women with the lowest level of education, those reporting an average or poor financial situation, and unemployed women. Conclusions: An alarmingly high number of psychological symptoms was reported by the menopausal women, particularly those of low socio-economic status. Having a career seems to reduce the risk of psychological symptoms. There is an urgent need for health promotion and prophylaxis among menopausal women and, in many cases, for specialist psychological assistance.

  15. Variability in prefrontal hemodynamic response during exposure to repeated self-selected music excerpts, a near-infrared spectroscopy study.

    Science.gov (United States)

    Moghimi, Saba; Schudlo, Larissa; Chau, Tom; Guerguerian, Anne-Marie

    2015-01-01

    Music-induced brain activity modulations in areas involved in emotion regulation may be useful in achieving therapeutic outcomes. Clinical applications of music may involve prolonged or repeated exposures to music. However, the variability of the observed brain activity patterns in repeated exposures to music is not well understood. We hypothesized that multiple exposures to the same music would elicit more consistent activity patterns than exposure to different music. In this study, the temporal and spatial variability of the cerebral prefrontal hemodynamic response was investigated across multiple exposures to self-selected musical excerpts in 10 healthy adults. The hemodynamic changes were measured using near-infrared spectroscopy over the prefrontal cortex and represented by instantaneous phase values. Based on the spatial and temporal characteristics of these hemodynamic changes, we defined a consistency index to represent variability across these domains. The consistency index across repeated exposures to the same piece of music was compared with the consistency index of prefrontal activity from randomly matched non-identical musical excerpts. Consistency indexes differed significantly between identical and non-identical musical excerpts when a subset of repetitions was compared. When all four exposures were compared, no significant difference was observed between the consistency indexes of randomly matched non-identical musical excerpts and those of repetitions of the same musical excerpt. This observation suggests only partial consistency between repeated exposures to the same musical excerpt, which may stem from the role of the prefrontal cortex in regulating other cognitive and emotional processes.
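
    Instantaneous phase is normally taken from the analytic signal of a recording. The sketch below computes it with an FFT-based Hilbert transform (the construction used by scipy.signal.hilbert) and scores two signals with a phase-locking value. The phase-locking value is a stand-in chosen for illustration; it is not the consistency index defined by the authors, and the signals are synthetic.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal (same construction as scipy.signal.hilbert)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def instantaneous_phase(x):
    """Phase (radians) of the analytic signal of the mean-removed series."""
    x = np.asarray(x, dtype=float)
    return np.angle(analytic_signal(x - x.mean()))


def phase_consistency(x1, x2):
    """Phase-locking value in [0, 1]; 1 means a constant phase lag between x1 and x2."""
    dphi = instantaneous_phase(x1) - instantaneous_phase(x2)
    return float(np.abs(np.mean(np.exp(1j * dphi))))


# Synthetic "exposures": a 0.1 Hz oscillation sampled at 10 Hz for 60 s.
rng = np.random.default_rng(2)
t = np.arange(600) / 10.0
exposure_1 = np.sin(2 * np.pi * 0.1 * t)
exposure_2 = np.sin(2 * np.pi * 0.1 * t + 0.5) + 0.1 * rng.normal(size=t.size)
unrelated = rng.normal(size=t.size)
same = phase_consistency(exposure_1, exposure_2)
different = phase_consistency(exposure_1, unrelated)
print(same, different)
```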

  16. Adaptation Strategies to Combating Climate Variability and Extremity among Farmers in Selected Farm Settlements in Oyo State, Nigeria

    Directory of Open Access Journals (Sweden)

    BOROKINI T.I

    2014-09-01

    The adverse effects of climate variability and extremes on agriculture in Africa have been widely reported. This calls for adaptive strategies in farming to reduce vulnerability and ensure food security. This study was therefore conducted to evaluate farmers' awareness of climate variability and their adaptation strategies in four selected farm settlements in Oyo State, Nigeria. Structured questionnaires were administered to 120 farmers using a stratified random sampling method. The results showed very high awareness of climate variability among the farmers. However, the majority of the farmers acquired their land by lease, and most still use local farm tools. Sole cropping, mixed cropping and crop rotation were the practices most commonly used. The farmers reported the prevalence of crop pests and diseases, flooding, disappearance of bimodal rainfall, increased temperature and drought in their farmlands, leading to increased poverty, higher production costs and poor crop harvests, as evidence of harsh climatic conditions. Adaptation strategies used by the farmers were changing planting dates, planting new varieties, intercropping and alternative income-generating activities. The farmers are encouraged to acquire more efficient farming systems and equipment and to strongly consider other adaptation strategies, such as agricultural insurance, agroforestry, water conservation methods, soil conservation farming, irrigation farming, organic farming and mechanized farming. Furthermore, land tenure policies that could constrain the farmers should be reviewed, and the farmers should be given proper training.

  17. A conceptual framework for selecting the most appropriate variables for measuring hospital efficiency with a focus on Iranian public hospitals.

    Science.gov (United States)

    Afzali, Hossein Haji Ali; Moss, John R; Mahmood, Mohammad Afzal

    2009-05-01

    Over the past few decades, there has been increasing interest in the measurement of hospital efficiency in developing countries, including Iran. While the choice of measurement methods in hospital efficiency assessment has been widely argued in the literature, few authors have offered a framework to specify variables that reflect different hospital functions, the quality of the process of care and the effectiveness of hospital services. Yet without knowledge of hospital objectives and all relevant functions, efficiency studies run the risk of making biased comparisons, particularly against hospitals that provide higher-quality services requiring more resources. Through an in-depth investigation of the multi-product nature of hospitals, various hospital functions and the values of various stakeholders (patients, staff and community), with a focus on Iranian public hospitals, this study proposes a conceptual framework for selecting the most appropriate variables for measuring hospital efficiency using frontier-based techniques. This paper contributes to hospital efficiency studies by proposing a conceptual framework and incorporating a broader set of variables for Iran. This can enhance the validity of hospital efficiency studies using frontier-based methods in developing countries.

  18. Habitat Heterogeneity Variably Influences Habitat Selection by Wild Herbivores in a Semi-Arid Tropical Savanna Ecosystem

    Science.gov (United States)

    Muposhi, Victor K.; Gandiwa, Edson; Chemura, Abel; Bartels, Paul; Makuza, Stanley M.; Madiri, Tinaapi H.

    2016-01-01

    An understanding of the habitat selection patterns of wild herbivores is critical for adaptive management, particularly for ecosystem management and wildlife conservation in semi-arid savanna ecosystems. We tested the following predictions: (i) surface water availability, habitat quality and human presence have a strong influence on the spatial distribution of wild herbivores in the dry season, (ii) habitat suitability for large herbivores would be higher than for medium-sized herbivores in the dry season, and (iii) the spatial extent of suitable habitats for wild herbivores will differ between years, i.e., 2006 and 2010, in Matetsi Safari Area, Zimbabwe. MaxEnt modeling was used to determine the habitat suitability of large and medium-sized herbivores. MaxEnt modeling of habitat suitability for large herbivores using the environmental variables was successful for the selected species in 2006 and 2010, except for elephant (Loxodonta africana) in 2010. Overall, the probability of occurrence of large herbivores was mostly influenced by distance from rivers. Distance from roads explained much of the variability in the probability of occurrence of medium-sized herbivores. The overall predicted area for large and medium-sized herbivores did not differ. Large herbivores may not necessarily utilize larger habitat patches than medium-sized herbivores, owing to the habitat-homogenizing effect of water provisioning. The effect of surface water availability, proximity to riverine ecosystems and roads on the habitat suitability of large and medium-sized herbivores in the dry season was highly variable and could therefore change from one year to another. We recommend adaptive management initiatives aimed at ensuring dynamic water supply in protected areas through temporal closure and/or opening of water points to promote heterogeneity of wildlife habitats. PMID:27680673

  19. A volatolomic approach for studying plant variability: the case of selected Helichrysum species (Asteraceae).

    Science.gov (United States)

    Giuliani, Claudia; Lazzaro, Lorenzo; Calamassi, Roberto; Calamai, Luca; Romoli, Riccardo; Fico, Gelsomina; Foggi, Bruno; Mariotti Lippi, Marta

    2016-10-01

    The species of Helichrysum sect. Stoechadina (Asteraceae) are well-known for their secondary metabolite content and their characteristic aromatic bouquets. In the wild, populations exhibit wide phenotypic plasticity, which makes the circumscription of species and infraspecific ranks difficult. Previous investigations of the Helichrysum italicum complex focused on a possible phytochemical typification based on hydrodistilled essential oils. The aims of this paper are threefold: (i) characterizing the volatile profiles of different populations, testing (ii) how these profiles vary across populations and (iii) how phytochemical diversity may contribute to solving taxonomic problems. Nine selected Helichrysum populations, belonging to the H. italicum complex, Helichrysum litoreum and Helichrysum stoechas, were investigated. H. stoechas was chosen as an outgroup for validating the method. After collection in the wild, plants were cultivated under standard growing conditions for over one year. Annual leafy shoots were screened in the post-blooming period for emissions of volatile organic compounds (VOCs) by headspace solid phase microextraction coupled with gas chromatography and mass spectrometry (HS-SPME-GC/MS). The VOC composition analysis revealed a total of 386 different compounds, with terpenes being the most represented compound class. Statistical data processing allowed the identification of the indicator compounds that differentiate the individual populations, revealing the influence of geographical provenance in determining the volatile profiles. These results suggested the potential use of VOCs as valuable diacritical characters for discriminating the Helichrysum populations. In addition, the cross-validation analysis suggested the potential of this volatolomic study for discriminating Helichrysum species and subspecies, highlighting a general congruence with the current taxonomic treatment of the genus. The consistency
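
    The abstract does not name the statistic used to find indicator compounds. One common choice in community ecology is the Dufrêne and Legendre indicator value (IndVal): the product of specificity (share of a compound's mean abundance that falls in one group) and fidelity (share of that group's samples in which the compound is detected). The numpy sketch below assumes a samples x compounds abundance matrix and integer population labels; it illustrates that statistic on toy data and is not the authors' pipeline.

```python
import numpy as np


def indval(abundance, groups):
    """Return an (n_groups, n_compounds) array of IndVal scores in [0, 1]."""
    abundance = np.asarray(abundance, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    mean_by_group = np.array([abundance[groups == g].mean(axis=0) for g in labels])
    totals = mean_by_group.sum(axis=0)
    # Specificity: share of the compound's mean abundance that falls in group g.
    specificity = np.divide(mean_by_group, totals,
                            out=np.zeros_like(mean_by_group), where=totals > 0)
    # Fidelity: share of group g's samples in which the compound is detected.
    fidelity = np.array([(abundance[groups == g] > 0).mean(axis=0) for g in labels])
    return specificity * fidelity


# Toy data: three populations x three samples, three compounds.
abundance = [
    [5, 0, 1], [4, 0, 1], [6, 0, 1],   # population 0: compound 0 only here
    [0, 3, 1], [0, 2, 1], [0, 4, 1],   # population 1: compound 1 only here
    [0, 0, 1], [0, 0, 1], [0, 0, 1],   # population 2: only the shared compound
]
groups = [0, 0, 0, 1, 1, 1, 2, 2, 2]
scores = indval(abundance, groups)
print(scores.round(3))
```

    Compounds 0 and 1 score 1.0 for their own population, while compound 2, emitted equally everywhere, scores only 1/3 in each group.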

  20. Chemometrics approach to substrate development, case: semisyntetic cheese

    DEFF Research Database (Denmark)

    Nielsen, Per Væggemose; Hansen, Birgitte Vedel

    1998-01-01

    In several cases a well-defined, robust and easily reproducible substrate that meets specific requirements is needed. This is the case in studies of fungal growth and metabolism on specific products as affected by environmental conditions or processing factors, or isolation of product specific fungi...... from food production facilities. The chemometrics approach to substrate development is illustrated by the development of a semisynthetic cheese substrate. Growth, colour formation and mycotoxin production of 6 cheese-related fungi were studied on 9 types of natural cheeses and 24 synthetic cheese...... substrates and compared using principal component analysis (PCA). The synthetic cheese substrates contained various amounts of Ca, K, Mg, Na, P, Fe, Cu, Zn, lactate, lactose and casein. In this manner a robust, well-defined and easily prepared laboratory cheese substrate was developed for Penicillium commune...

  1. Chemometrics approach to substrate development, case: semisyntetic cheese

    DEFF Research Database (Denmark)

    Nielsen, Per Væggemose; Hansen, Birgitte Vedel

    1998-01-01

    from food production facilities. The chemometrics approach to substrate development is illustrated by the development of a semisynthetic cheese substrate. Growth, colour formation and mycotoxin production of 6 cheese-related fungi were studied on 9 types of natural cheeses and 24 synthetic cheese......, the most frequently occurring contaminant on semi-hard cheese. Growth experiments on the substrate were repeatable and reproducible. The substrate was also suitable for the starter P. camemberti. Mineral elements in cheese were shown to have a strong effect on growth, mycotoxin production and colour...... formation of fungi. For P. roqueforti, P. discolor, P. verrucosum and Aspergillus versicolor the substrate was less suitable as a model cheese substrate, which indicates great variation in the nutritional demands of the fungi. Substrates suitable for studies of specific cheese types were found for P. roqueforti...

  2. A study of adulteration in gasoline samples using flame emission spectroscopy and chemometrics tools.

    Science.gov (United States)

    de Paulo, Jaqueline M; Mendes, Gisele; Barros, José E M; Barbeira, Paulo J S

    2012-12-21

    This work presents a low-cost system based on Flame Emission Spectroscopy (FES) that enables the prediction of fuel adulteration. The spectral data acquired using FES were combined with chemometric tools, Partial Least Squares Discriminant Analysis (PLS-DA) and Partial Least Squares (PLS), to predict gasoline adulteration with different solvents. Brazilian gasoline samples adulterated with turpentine, thinner, kerosene, rubber solvent and ethanol were classified with a PLS-DA model built using five latent variables (LV), with an accumulated variance of 100% on X and 76.78% on Y. The combination of these techniques discriminated a distinct group for each of the studied adulterants. Following classification, gasoline samples adulterated with the same solvents at contents from 1 to 50% (v/v) were analyzed by FES, and multivariate calibration curves were used to predict the contents of the respective solvents. The combination of FES and PLS allowed the determination of gasoline adulterants with small calibration and validation errors, lower than those reported in the literature for other spectroscopic techniques.
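
    In its simplest form, PLS-DA is PLS regression onto class-membership codes followed by a decision rule. The numpy sketch below handles only two classes (codes +1/-1, threshold 0), whereas the study separated five adulterants. It uses a hand-written NIPALS PLS1, 2 latent variables, and synthetic "spectra" with an extra band in one class; all of these are assumptions made for illustration.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Fit a single-response PLS model with NIPALS; return (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        c = (yr @ t) / tt
        Xr = Xr - np.outer(t, p)
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q), x_mean, y_mean


def plsda_fit(X, labels, n_comp=2):
    """Two-class PLS-DA: regress +1/-1 class codes on X with PLS1."""
    codes = np.where(np.asarray(labels) == 1, 1.0, -1.0)
    return pls1_fit(X, codes, n_comp)


def plsda_predict(model, X):
    """Assign class 1 when the predicted code is positive, else class 0."""
    coef, x_mean, y_mean = model
    return ((X - x_mean) @ coef + y_mean > 0).astype(int)


rng = np.random.default_rng(3)
labels = np.repeat([0, 1], 60)
X = rng.normal(size=(120, 50))       # 120 synthetic "spectra", 50 channels
X[labels == 1, 20:25] += 2.0         # class 1 carries an extra emission band
train = np.r_[0:40, 60:100]
test = np.r_[40:60, 100:120]
model = plsda_fit(X[train], labels[train])
accuracy = np.mean(plsda_predict(model, X[test]) == labels[test])
print(f"test accuracy = {accuracy:.2f}")
```

    For more than two classes, the usual extension regresses a one-hot code matrix with multi-response PLS (PLS2) and assigns each sample to the class with the largest predicted value.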

  3. Identification and Quantitation of Melamine in Milk by Near-Infrared Spectroscopy and Chemometrics

    Directory of Open Access Journals (Sweden)

    Tong Wu

    2016-01-01

    Melamine is a nitrogen-rich substance that has been illegally used to increase the apparent protein content of food products such as milk. It is therefore imperative to develop sensitive and reliable analytical methods to determine melamine in human foods. Current analytical methods for melamine are mainly chromatography-based; they are time-consuming and expensive and require complex pretreatment and well-trained technicians. The present paper investigated the feasibility of using near-infrared (NIR) spectroscopy and chemometrics to identify and quantify melamine in liquid milk. A total of 75 samples were prepared. Uninformative variable elimination-partial least squares (UVE-PLS) and partial least squares-discriminant analysis (PLS-DA) were used to construct quantitative and qualitative models, respectively. Based on the ratio of performance to standard deviation (RPD), the UVE-PLS model with 3 components gave a better solution. The PLS-DA model achieved an accuracy of 100% and outperformed the optimal reference model, soft independent modeling of class analogy (SIMCA). This method can serve as a potential tool for rapid screening of melamine in milk products.
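
    RPD is commonly computed as the standard deviation of the reference values in the validation set divided by the prediction error. The sketch below assumes that definition with RMSEP as the error term (some authors use SEP, the bias-corrected standard error, instead); the values are hypothetical melamine levels in arbitrary units.

```python
import numpy as np


def rpd(y_ref, y_pred):
    """Ratio of performance to deviation: SD(reference) / RMSEP."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    rmsep = np.sqrt(np.mean((y_ref - y_pred) ** 2))
    return float(np.std(y_ref, ddof=1) / rmsep)


y_ref = [0.0, 0.5, 1.0, 1.5, 2.0]     # hypothetical reference values
y_pred = [0.1, 0.4, 1.1, 1.4, 2.1]    # hypothetical model predictions
print(round(rpd(y_ref, y_pred), 2))   # 7.91
```

    A higher RPD means the prediction error is small relative to the natural spread of the samples, which is why it is used to compare calibration models such as those with different numbers of components.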

  4. A comparative chemometric study for water quality expertise of the Athenian water reservoirs.

    Science.gov (United States)

    Farmaki, Eleni G; Thomaidis, Nikolaos S; Simeonov, Vasil; Efstathiou, Constantinos E

    2012-12-01

    The aim of the present study is to compare unsupervised and supervised pattern recognition techniques for the quality assessment and classification of the reservoirs that supply domestic and industrial water to the city of Athens, Greece. A new optimization strategy for sampling, monitoring, and water management is proposed. From October 2006 to April 2007, 89 samples were collected from the three water reservoirs (Iliki, Mornos, and Marathon), and 13 parameters (metals and metalloids) were determined analytically. In general, all the elements were found to fluctuate at very low levels, especially in Mornos, the main water reservoir of Athens. Iliki and Marathon showed values that were elevated relative to Mornos but still below the legislative limits. Unsupervised multivariate techniques (factor analysis/principal component analysis and cluster analysis) and supervised ones (discriminant analysis and classification trees) were applied to the data set, and their classification abilities were compared. All the chemometric techniques successfully revealed the critical variables and described the similarities and dissimilarities among the sampling points, emphasizing the individual characteristics of every sample and revealing the sources of elements in the region. New data from subsequent samplings (November and December 2007) were used to validate the supervised techniques. Finally, water management strategies were proposed concerning the sampling points and representative parameters.
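
    Cluster analysis of element concentrations usually starts from autoscaled data, so that elements measured on different scales carry equal weight. The numpy sketch below autoscales a samples x parameters matrix and runs naive average-linkage agglomerative clustering on Euclidean distances. The linkage choice, the stopping rule (a fixed number of clusters) and the toy data standing in for three reservoirs are assumptions, not the study's settings.

```python
import numpy as np


def autoscale(X):
    """Centre each column and divide by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def average_linkage(X, n_clusters):
    """Merge the two clusters with the smallest mean pairwise distance until n_clusters remain."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                  # b > a, so index a stays valid
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Hypothetical samples (rows) x three element concentrations (columns).
X = [[1.0, 10.0, 0.10], [1.1, 11.0, 0.12], [0.9, 9.5, 0.11],
     [5.0, 50.0, 0.50], [5.2, 52.0, 0.52], [4.8, 49.0, 0.49],
     [1.0, 50.0, 0.90], [1.1, 51.0, 0.95], [0.95, 49.0, 0.92]]
labels = average_linkage(autoscale(X), n_clusters=3)
print(labels)
```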

  5. Targeted and non-targeted detection of lemon juice adulteration by LC-MS and chemometrics.

    Science.gov (United States)

    Wang, Zhengfang; Jablonski, Joseph E

    2016-01-01

    Economically motivated adulteration (EMA) of lemon juice was detected by LC-MS and principal component analysis (PCA). Twenty-two batches of freshly squeezed lemon juice were adulterated by adding an aqueous solution containing 5% citric acid and 6% sucrose to pure lemon juice to obtain 30%, 60% and 100% lemon juice samples. Their total titratable acidities, °Brix and pH values were measured, and all the lemon juice samples were then subjected to LC-MS analysis. Concentrations of hesperidin and eriocitrin, major phenolic components of lemon juice, were quantified. PCA score plots of the LC-MS datasets were used to preview the classification of pure and adulterated lemon juice samples. Results showed large inherent variability in the chemical properties among the 22 batches of 100% lemon juice samples. Measuring or quantifying one or several chemical properties (targeted detection) was not effective in detecting lemon juice adulteration. However, by using the LC-MS datasets, including both chromatographic and mass spectrometric information, 100% lemon juice samples were successfully differentiated from adulterated samples containing 30% lemon juice in the PCA score plot. LC-MS coupled with chemometric analysis can complement existing methods for detecting juice adulteration.
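
    A PCA score plot like the ones described here can be computed from the singular value decomposition of the mean-centred data matrix. The numpy sketch below assumes rows are samples (e.g. juice batches) and columns are LC-MS features; the first two score columns are the plot coordinates. The toy data, in which "adulterated" rows have five diluted marker features, are hypothetical.

```python
import numpy as np


def pca_scores(X, n_comp=2):
    """Return (scores, explained-variance ratios) for the first n_comp components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_comp] * s[:n_comp]          # sample coordinates on each PC
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_comp]


rng = np.random.default_rng(4)
pure = 5.0 + rng.normal(size=(10, 20))           # 10 authentic batches x 20 features
adulterated = 5.0 + rng.normal(size=(10, 20))
adulterated[:, :5] -= 4.0                         # diluted marker features
scores, explained = pca_scores(np.vstack([pure, adulterated]))
print(scores[:, 0].round(2), explained.round(2))
```

    The sign of each component is arbitrary in SVD, so the pure samples may land on either side of PC1; only the separation between the groups is meaningful.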

  6. Chemometric study of Andalusian extra virgin olive oils Raman spectra: Qualitative and quantitative information.

    Science.gov (United States)

    Sánchez-López, E; Sánchez-Rodríguez, M I; Marinas, A; Marinas, J M; Urbano, F J; Caridad, J M; Moalem, M

    2016-08-15

    Authentication of extra virgin olive oil (EVOO) is an important topic for the olive oil industry. Fraudulent practices in this sector are a major problem affecting both producers and consumers. This study analyzes the capability of FT-Raman spectroscopy combined with chemometric treatments to predict fatty acid contents (quantitative information), using gas chromatography as the reference technique, and to classify diverse EVOOs by harvest year, olive variety, geographical origin and Andalusian PDO (qualitative information). PLS components summarizing the spectral information were introduced progressively to determine the optimal number. For the estimation of fatty acid composition, the lowest error (both in fitting and prediction) corresponded to MUFA, followed by SAFA and PUFA, although the errors were close to zero in all cases. As regards the qualitative variables, discriminant analysis correctly classified 94.3%, 84.0%, 89.0% and 86.6% of samples by harvest year, olive variety, geographical origin and PDO, respectively. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. Random frog: an efficient reversible jump Markov Chain Monte Carlo-like approach for variable selection with applications to gene selection and disease classification.

    Science.gov (United States)

    Li, Hong-Dong; Xu, Qing-Song; Liang, Yi-Zeng

    2012-08-31

    The identification of disease-relevant genes is a challenge in microarray-based disease diagnosis, where the sample size is often limited. Among established methods, reversible jump Markov Chain Monte Carlo (RJMCMC) methods have proven quite promising for variable selection. However, the design and application of an RJMCMC algorithm requires, for example, special criteria for prior distributions. Also, simulation from the joint posterior distributions of models is computationally expensive and may even be mathematically intractable. These disadvantages may limit the application of RJMCMC algorithms. Therefore, algorithms are needed that possess the advantages of RJMCMC methods and are also efficient and easy to follow for selecting disease-associated genes. Here we report an RJMCMC-like method, called random frog, that possesses the advantages of RJMCMC methods and is much easier to implement. Using the colon and estrogen gene expression datasets, we show that random frog is effective in identifying discriminating genes. The top two ranked genes are Z50753 and U00968 for the colon dataset and Y10871_at and Z22536_at for the estrogen dataset. (The source code, released under the GNU General Public License Version 2.0, is freely available to non-commercial users at: http://code.google.com/p/randomfrog/.)
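
    The abstract describes random frog only at a high level. The numpy sketch below is a simplified random walk over variable subsets in that spirit, not the published algorithm. At each step the subset size is perturbed, variables are dropped or added at random, and the proposal is scored by the cross-validated error of an ordinary least-squares fit (a stand-in for the PLS model normally used). Worse proposals are accepted with probability eta * old_error / new_error, and variables are ranked by how often they appear in the visited subsets. The acceptance rule, scoring model, step sizes and data are all assumptions.

```python
import numpy as np


def cv_rmse(X, y, cols, n_folds=5):
    """k-fold cross-validated RMSE of an OLS fit (with intercept) on the given columns."""
    idx = np.arange(len(y))
    sq_err = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        A = np.column_stack([np.ones(train.size), X[np.ix_(train, cols)]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        B = np.column_stack([np.ones(test.size), X[np.ix_(test, cols)]])
        sq_err += np.sum((y[test] - B @ beta) ** 2)
    return np.sqrt(sq_err / len(y))


def random_frog_like(X, y, n_iter=2000, start_size=5, eta=0.1, seed=0):
    """Return per-variable selection frequencies from a random walk over subsets."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    current = rng.choice(p, size=start_size, replace=False)
    current_err = cv_rmse(X, y, current)
    counts = np.zeros(p)
    for _ in range(n_iter):
        # Propose a subset size near the current one (at least 1, at most p).
        spread = max(1.0, 0.3 * current.size)
        size = int(np.clip(round(rng.normal(current.size, spread)), 1, p))
        if size < current.size:                       # drop random variables
            proposal = rng.choice(current, size=size, replace=False)
        elif size > current.size:                     # add random outside variables
            outside = np.setdiff1d(np.arange(p), current)
            extra = rng.choice(outside, size=size - current.size, replace=False)
            proposal = np.concatenate([current, extra])
        else:
            proposal = current
        err = cv_rmse(X, y, proposal)
        # Always accept an improvement; accept a worse subset with small probability.
        if err <= current_err or rng.random() < eta * current_err / err:
            current, current_err = proposal, err
        counts[current] += 1
    return counts / n_iter


rng = np.random.default_rng(5)
X = rng.normal(size=(60, 20))                  # 60 samples, 20 candidate variables
y = 3.0 * X[:, 3] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=60)
freq = random_frog_like(X, y)
print(np.argsort(freq)[::-1][:5])              # most frequently selected first
```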

  8. An adaptive technique for multiscale approximate entropy (MAEbin) threshold (r) selection: application to heart rate variability (HRV) and systolic blood pressure variability (SBPV) under postural stress.

    Science.gov (United States)

    Singh, Amritpal; Saini, Barjinder Singh; Singh, Dilbag

    2016-06-01

    Multiscale approximate entropy (MAE) is used to quantify the complexity of a time series as a function of time scale τ. The approximate entropy (ApEn) tolerance threshold 'r' is selected by one of three approaches: (1) arbitrary selection within the recommended range of 0.1-0.25 times the standard deviation of the time series; (2) finding the maximum ApEn (ApEnmax), i.e., the point where self-matches start to prevail over other matches, and choosing the corresponding 'r' (rmax) as the threshold; or (3) computing rchon by empirically finding the relation between rmax, the SD1/SD2 ratio and N using curve fitting, where SD1 and SD2 are the short-term and long-term variability of the time series, respectively. None of these methods is a gold standard for selecting 'r'. In our previous study [1], an adaptive procedure for selecting 'r' was proposed for approximate entropy (ApEn). In this paper, it is extended to multiple time scales using MAEbin and multiscale cross-MAEbin (XMAEbin). We applied this to simulations, i.e., 50 realizations (n = 50) of random number series, fractional Brownian motion (fBm) and MIX (P) [1] series with data length N = 300, and to short-term recordings of HRV and SBPV under postural stress from supine to standing. MAEbin and XMAEbin analysis was performed on laboratory-recorded data of 50 healthy young subjects experiencing postural stress from supine to upright. The study showed that (i) ApEnbin of HRV is higher than that of SBPV in the supine position but lower than that of SBPV in the upright position; (ii) ApEnbin of HRV decreases from 1.7324 ± 0.112 (mean ± SD) in the supine position to 1.4916 ± 0.108 upright, due to vagal inhibition; (iii) ApEnbin of SBPV increases from 1.5535 ± 0.098 in the supine position to 1.6241 ± 0.101 upright, due to sympathetic activation; (iv) individual and cross complexities of RRi and systolic blood pressure (SBP) series depend on the time scale under consideration; (v) XMAEbin calculated using ApEnmax is correlated with cross-MAE calculated using ApEn (0.1-0.26) in steps of 0
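
    For reference, single-scale approximate entropy with embedding dimension m and tolerance r = k x SD can be computed directly from Pincus's definition, counting self-matches, as the difference Φ_m(r) - Φ_(m+1)(r). The numpy sketch below implements that definition and a brute-force grid search for the tolerance that maximizes ApEn (approach (2) above). It does not implement the authors' MAEbin or XMAEbin procedures; the grid and the test signals are assumptions.

```python
import numpy as np


def _phi(x, m, r):
    """Mean log fraction of length-m templates within Chebyshev distance r (self-matches included)."""
    n = x.size - m + 1
    templates = np.array([x[i:i + m] for i in range(n)])
    dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=-1)
    return np.mean(np.log(np.mean(dist <= r, axis=1)))


def apen(x, m=2, k=0.2):
    """Approximate entropy with tolerance r = k * SD(x)."""
    x = np.asarray(x, dtype=float)
    r = k * np.std(x)
    return _phi(x, m, r) - _phi(x, m + 1, r)


def r_max(x, m=2, ks=None):
    """Grid search for the tolerance factor k that maximises ApEn (the ApEn_max criterion)."""
    ks = np.arange(0.02, 1.0, 0.02) if ks is None else np.asarray(ks, dtype=float)
    values = np.array([apen(x, m, k) for k in ks])
    i = int(np.argmax(values))
    return float(ks[i]), float(values[i])


rng = np.random.default_rng(6)
noise = rng.normal(size=300)                      # irregular series, N = 300
wave = np.sin(2 * np.pi * np.arange(300) / 25)    # regular series
print(apen(noise), apen(wave))                    # the regular series scores lower
print(r_max(noise))
```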

  9. Study of the structure-activity relationship for theoretical molecular descriptors using density functional theory and chemometric methods in cannabinoid metabolites

    Science.gov (United States)

    Silva, Tânia B. E.; Pereira, Mariano A.; Malta, Valéria S.; Bento, Edson S.; San-Miguel, Miguel A.; Ziolli, Roberta L.; Martins, João B. L.; Sih, Andre; Taft, Carlton A.

    A set of 30 cannabinoid metabolites has been investigated using a combination of electronic and chemometric methods. Density functional calculations have been carried out to obtain optimized geometries, energies, and selected molecular properties. These molecular descriptors take into account steric effects, electronic properties, and chemical reactivity. The use of statistical methods, including principal component analysis (PCA), hierarchical cluster analysis (HCA), nonhierarchical cluster analysis (K-means), K-nearest neighbor (KNN) and artificial neural networks (ANN), made it possible to classify the compounds into psychoactive, moderately psychoactive and psychoinactive groups, in good agreement with experimental evidence.

  10. Application of chemometric analysis based on physicochemical and chromatographic data for the differentiation origin of plant protection products containing chlorpyrifos.

    Science.gov (United States)

    Miszczyk, Marek; Płonka, Marlena; Bober, Katarzyna; Dołowy, Małgorzata; Pyka, Alina; Pszczolińska, Klaudia

    2015-01-01

    The aim of this study was to investigate the similarities and dissimilarities between pesticide samples in the form of emulsifiable concentrate (EC) formulations containing chlorpyrifos as the active ingredient, coming from different sources (i.e., shops and wholesalers) and belonging to various series. The results obtained by headspace gas chromatography-mass spectrometry, together with selected physicochemical properties of the examined pesticides (pH, density, stability, active ingredient content and water content), were compared using two chemometric methods. The applicability of simple cluster analysis and principal component analysis of the obtained data for differentiating the examined plant protection products coming from different sources was confirmed. This approach would be advantageous in the routine control of product originality and in the detection of counterfeit pesticides among commercially available pesticides containing chlorpyrifos as an active ingredient.

  11. Chemometric expertise of the quality of groundwater sources for domestic use.

    Science.gov (United States)

    Spanos, Thomas; Ene, Antoaneta; Simeonova, Pavlina

    2015-01-01

    In the present study, 49 representative sites were selected for the collection of water samples from central water supplies at different geographical locations in the region of Kavala, Northern Greece. Ten physicochemical parameters (pH, electric conductivity, nitrate, chloride, sodium, potassium, total alkalinity, total hardness, bicarbonate and calcium) were analyzed monthly from January 2010 to December 2010. Chemometric methods (cluster analysis, principal components analysis and source apportioning by principal components regression) were used to mine and interpret the monitoring data. Clustering of the chemical indicators yields two major clusters, related to water hardness and to the mineral components (impacted by sea, bedrock and acidity factors). The sampling locations are separated into three major clusters corresponding to the spatial distribution of the sites: coastal, lowland and semi-mountainous. Principal components analysis reveals two latent factors responsible for the data structure, which also indicate the sources determining the groundwater quality of the region (conditionally named the "mineral" factor and the "water hardness" factor). The apportionment approach shows the contribution of each identified source to the total concentration of each chemical parameter. The mean values of the studied physicochemical parameters were found to be within the limits given in the 98/83/EC Directive, and the water samples are appropriate for human consumption. The results of this study provide an overview of the hydrogeological profile of the water supply system of the studied area.

  12. Tracing the origin of beer samples by NMR and chemometrics: Trappist beers as a case study.

    Science.gov (United States)

    Mannina, Luisa; Marini, Federico; Antiochia, Riccarda; Cesa, Stefania; Magrì, Antonio; Capitani, Donatella; Sobolev, Anatoly P

    2016-10-01

    An NMR and chemometric analytical approach to classify beers according to their brand identity was developed within the European TRACE project (FP6-2003-FOOD-2-A, contract number: 0060942). Rochefort 8 Trappist beers (47 samples), other Trappist beers (76 samples) and non-Trappist beers (110 samples) were analyzed by 1H NMR spectroscopy. Selected NMR signals were measured and used to build classification models. Three classification problems were considered, namely Trappist versus non-Trappist, Rochefort versus non-Rochefort, and Rochefort 8 versus non-Rochefort 8. In all three cases, both a discriminant and a modeling approach were followed, using partial least squares discriminant analysis (PLS-DA) and soft independent modeling of class analogies (SIMCA), respectively, leading to very high classification accuracy as evaluated by external validation. Information on chemical composition was also obtained: Trappist beers contain higher amounts of formic and pyruvic acids and lower amounts of acetic acid and alanine than non-Trappist ones. Rochefort beers also turned out to have a higher content of propanol and isopentanol than non-Rochefort samples. Finally, Rochefort 8 shows the highest content of pyruvic acid and the lowest content of gallic, fumaric and acetic acids, adenosine, uridine, 2-phenylethanol, GABA, and alanine.

  13. Discrimination of Brazilian propolis according to the seasoning using chemometrics and machine learning based on UV-Vis scanning data.

    Science.gov (United States)

    Tomazzoli, Maíra Maciel; Pai Neto, Remi Dal; Moresco, Rodolfo; Westphal, Larissa; Zeggio, Amélia Regina Somensi; Specht, Leandro; Costa, Christopher; Rocha, Miguel; Maraschin, Marcelo

    2015-10-21

    Propolis is a chemically complex biomass produced by honeybees (Apis mellifera) from plant resins mixed with salivary enzymes, beeswax, and pollen. The biological activities described for propolis have also been identified for the resin of the donor plants, but a major challenge for standardizing the chemical composition and biological effects of propolis remains a better understanding of the influence of seasonality on the chemical constituents of that raw material. Since propolis quality depends, among other variables, on the local flora, which is strongly influenced by (a)biotic factors over the seasons, unraveling the effect of the harvest season on the chemical profile of propolis is an issue of recognized importance. Fast, cheap, and robust analytical techniques seem to be the best choice for large-scale quality control processes in the most demanding markets, e.g., human health applications. Accordingly, UV-Visible (UV-Vis) scanning spectrophotometry was applied to hydroalcoholic extracts (HE) of seventy-three propolis samples collected over the seasons in 2014 (summer, spring, autumn, and winter) and 2015 (summer and autumn) in Southern Brazil. Machine learning and chemometric techniques were then applied to the UV-Vis dataset to gain insight into the effect of seasonality on the claimed chemical heterogeneity of propolis samples, as determined by changes in the flora of the geographic region under study. Descriptive and classification models were built following a chemometric approach, i.e., principal component analysis (PCA) and hierarchical clustering analysis (HCA), supported by scripts written in the R language. The UV-Vis profiles combined with chemometric analysis allowed a typical pattern to be identified in propolis samples collected in the summer. Importantly, the discrimination based on PCA could be improved by using the dataset of the fingerprint region of phenolic compounds (λ = 280-400 nm), suggesting that besides the biological activities of those

  14. Relation of desert pupfish abundance to selected environmental variables in natural and manmade habitats in the Salton Sea basin

    Science.gov (United States)

    Martin, B.A.; Saiki, M.K.

    2005-01-01

    We assessed the relation between abundance of desert pupfish, Cyprinodon macularius, and selected biological and physicochemical variables in natural and manmade habitats within the Salton Sea Basin. Field sampling in a natural tributary, Salt Creek, and three agricultural drains captured eight species including pupfish (1.1% of the total catch), the only native species encountered. According to Bray-Curtis resemblance functions, fish species assemblages differed mostly between Salt Creek and the drains (i.e., the three drains had relatively similar species assemblages). Pupfish numbers and environmental variables varied among sites and sample periods. Canonical correlation showed that pupfish abundance was positively correlated with abundance of western mosquitofish, Gambusia affinis, and negatively correlated with abundance of porthole livebearers, Poeciliopsis gracilis, tilapias (Sarotherodon mossambica and Tilapia zillii), longjaw mudsuckers, Gillichthys mirabilis, and mollies (Poecilia latipinna and Poecilia mexicana). In addition, pupfish abundance was positively correlated with cover, pH, and salinity, and negatively correlated with sediment factor (a measure of sediment grain size) and dissolved oxygen. Pupfish abundance was generally highest in habitats where water quality extremes (especially high pH and salinity, and low dissolved oxygen) seemingly limited the occurrence of nonnative fishes. This study also documented evidence of predation by mudsuckers on pupfish. These findings support the contention of many resource managers that pupfish populations are adversely influenced by ecological interactions with nonnative fishes. © Springer 2005.
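    The Bray-Curtis resemblance mentioned above compares sites by their species abundances. A minimal Python sketch, using hypothetical catch counts rather than data from the study, computes the dissimilarity as the sum of absolute differences divided by the sum of all abundances; the corresponding similarity is 1 minus this value.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts of three species at two sites.
site_a = [10, 0, 5]
site_b = [0, 20, 5]
print(bray_curtis(site_a, site_b))      # 0.75
print(1 - bray_curtis(site_a, site_b))  # 0.25 (similarity)
```

Returning 0.0 when both sites are empty is an arbitrary convention chosen here to avoid division by zero.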

  15. Variability in prefrontal hemodynamic response during exposure to repeated self-selected music excerpts, a near-infrared spectroscopy study.

    Directory of Open Access Journals (Sweden)

    Saba Moghimi

    Music-induced brain activity modulations in areas involved in emotion regulation may be useful in achieving therapeutic outcomes. Clinical applications of music may involve prolonged or repeated exposures to music. However, the variability of the observed brain activity patterns in repeated exposures to music is not well understood. We hypothesized that multiple exposures to the same music would elicit more consistent activity patterns than exposure to different music. In this study, the temporal and spatial variability of cerebral prefrontal hemodynamic response was investigated across multiple exposures to self-selected musical excerpts in 10 healthy adults. The hemodynamic changes were measured using prefrontal cortex near infrared spectroscopy and represented by instantaneous phase values. Based on spatial and temporal characteristics of these observed hemodynamic changes, we defined a consistency index to represent variability across these domains. The consistency index across repeated exposures to the same piece of music was compared to the consistency index corresponding to prefrontal activity from randomly matched non-identical musical excerpts. Consistency indexes were significantly different for identical versus non-identical musical excerpts when comparing a subset of repetitions. When all four exposures were compared, no significant difference was observed between the consistency indexes of randomly matched non-identical musical excerpts and the consistency index corresponding to repetitions of the same musical excerpts. This observation suggests the existence of only partial consistency between repeated exposures to the same musical excerpt, which may stem from the role of the prefrontal cortex in regulating other cognitive and emotional processes.

  16. Managing anthelmintic resistance-Variability in the dose of drug reaching the target worms influences selection for resistance?

    Science.gov (United States)

    Leathwick, Dave M; Luo, Dongwen

    2017-08-30

    The concentration profile of anthelmintic reaching the target worms in the host can vary between animals even when administered doses are tailored to individual liveweight at the manufacturer's recommended rate. Factors contributing to variation in drug concentration include weather, breed of animal, formulation and the route by which drugs are administered. The implications of this variability for the development of anthelmintic resistance were investigated using Monte-Carlo simulation. A model framework was established in which 100 animals each received a single drug treatment. The 'dose' of drug allocated to each animal (i.e. the concentration-time profile of drug reaching the target worms) was sampled at random from a distribution of doses with mean m and standard deviation s. For each animal, the dose of drug was used in conjunction with pre-determined dose-response relationships, representing single-gene and polygenic inheritance, to calculate efficacy against susceptible and resistant genotypes. These data were then used to calculate the overall change in resistance gene frequency in the worm population as a result of the treatment. Values for m and s were varied to reflect differences in both mean dose and the variability in dose, and 100,000 simulations were run for each combination. The resistance gene frequency in the population after treatment increased as m decreased and as s increased. This occurred for both single-gene and polygenic models and for different levels of dominance (survival under treatment) of the heterozygote genotype(s). The results indicate that factors which result in lower and/or more variable concentrations of active reaching the target worms are more likely to select for resistance. The potential of different routes of anthelmintic administration to play a role in the development of anthelmintic resistance is discussed. Copyright © 2017 Elsevier B.V. All rights reserved.
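    The dose-sampling step of such a Monte-Carlo framework can be sketched as follows. This is a deliberately simplified, hypothetical Python illustration: each animal's dose (relative to the label rate) is drawn from a normal distribution with mean m and standard deviation s, and the code only counts how often the dose falls below an assumed threshold at which resistant worms would survive. It does not reproduce the study's dose-response relationships or gene-frequency calculations.

```python
import random


def fraction_underdosed(m, s, threshold, n_animals=100, n_sims=1000, seed=1):
    """Monte-Carlo share of animals whose sampled dose falls below `threshold`.

    Doses are relative to the label rate (1.0) and drawn from Normal(m, s);
    negative draws are truncated at zero. `threshold` is a hypothetical value.
    """
    rng = random.Random(seed)
    below = 0
    for _ in range(n_sims):
        for _ in range(n_animals):
            dose = max(0.0, rng.gauss(m, s))
            if dose < threshold:
                below += 1
    return below / (n_sims * n_animals)


for m, s in [(1.0, 0.1), (1.0, 0.4), (0.9, 0.1)]:
    print(m, s, fraction_underdosed(m, s, threshold=0.8))
```

With a shared seed, lowering m or raising s increases the under-dosed share, which matches the direction of the abstract's conclusion; turning that share into a change in resistance gene frequency would require the genotype-specific dose-response curves, which are not modeled here.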

  17. Use of fuzzy chromatography mass spectrometric (FCMS) fingerprinting and chemometric analysis for differentiation of whole-grain and refined wheat (T. aestivum) flour.

    Science.gov (United States)

    Geng, Ping; Zhang, Mengliang; Harnly, James M; Luthria, Devanand L; Chen, Pei

    2015-10-01

    A fuzzy chromatography mass spectrometric (FCMS) fingerprinting method combined with chemometric analysis has been established for rapid discrimination of whole-grain flour (WF) from refined wheat flour (RF). Bran, germ, endosperm, and WF from three local cultivars, or purchased from a grocery store, were studied. The state of refinement (whole vs. refined) of wheat flour was differentiated successfully by use of principal-components analysis (PCA) and soft independent modeling of class analogy (SIMCA), despite potential confounding introduced by wheat class (red vs. white; hard vs. soft) or source (different brands). Twelve discriminatory variables were putatively identified. Among these, dihexoside, trihexoside, apigenin glycosides, and citric acid had the highest peak intensity in germ. Variable line plots indicated that phospholipids were more abundant in endosperm. Samples of RF and WF from three cultivars (Hard Red, Hard White, and Soft White) were physically mixed to furnish 20, 40, 60, and 80 % WF for each cultivar. SIMCA was able to discriminate between 100 %, 80 %, 60 %, 40 %, and 20 % WF and 100 % RF. Partial least-squares (PLS) regression was used to predict RF-to-WF ratios in the mixed samples; the relative prediction errors of the PLS models were less than 6 %. Graphical abstract: workflow for targeting discriminatory compounds by use of FCMS and chemometric analysis.

  18. Response surface methodology based on central composite design as a chemometric tool for optimization of dispersive-solidification liquid-liquid microextraction for speciation of inorganic arsenic in environmental water samples.

    Science.gov (United States)

    Asadollahzadeh, Mehdi; Tavakoli, Hamed; Torab-Mostaedi, Meisam; Hosseini, Ghaffar; Hemmati, Alireza

    2014-06-01

    Dispersive-solidification liquid-liquid microextraction (DSLLME) coupled with electrothermal atomic absorption spectrometry (ETAAS) was developed for preconcentration and determination of inorganic arsenic (III, V) in water samples. At pH 1, As(III) formed a complex with ammonium pyrrolidine dithiocarbamate (APDC) and was extracted into fine droplets of 1-dodecanol (extraction solvent), which were dispersed with ethanol (disperser solvent) into the water sample solution. After extraction, the organic phase was separated by centrifugation and solidified by transferring it into an ice bath. The solidified solvent was transferred to a conical vial and melted quickly at room temperature. As(III) was determined in the melted organic phase, while As(V) remained in the aqueous layer. Total inorganic As was determined after reduction of the pentavalent forms of arsenic with sodium thiosulphate and potassium iodide, and As(V) was calculated as the difference between the concentration of total inorganic As and As(III). The variables of interest in the DSLLME method, such as the volumes of extraction solvent and disperser solvent, pH, concentration of APDC (chelating agent), extraction time and salt effect, were optimized with the aid of chemometric approaches. First, in screening experiments, a fractional factorial design (FFD) was used to select the variables that significantly affected the extraction procedure. Afterwards, the significant variables were optimized using response surface methodology (RSM) based on a central composite design (CCD). Under the optimum conditions, the proposed method was successfully applied to the determination of inorganic arsenic in different environmental water samples and a certified reference material (NIST SRM 1643e).
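    The run layout of a central composite design can be generated in coded units. The Python sketch below is a generic, hedged illustration rather than the study's actual design: it assumes a full 2^k factorial core, axial points at the rotatable distance α = (2^k)^(1/4), and a hypothetical six center points, and it maps coded levels to real settings using a hypothetical center and half-range for each factor.

```python
import itertools


def central_composite_design(k, n_center=6, alpha=None):
    """Coded runs of a CCD: 2**k factorial + 2k axial + n_center center points."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # rotatable axial distance
    factorial = [list(p) for p in itertools.product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + axial + center


def decode(coded, center, half_range):
    """Convert coded levels to real settings: center + coded * half_range."""
    return [c + x * h for x, c, h in zip(coded, center, half_range)]


runs = central_composite_design(3)
print(len(runs))  # 20 = 8 factorial + 6 axial + 6 center
print(decode([1.0, -1.0, 0.0], [100.0, 4.0, 5.0], [20.0, 2.0, 1.0]))  # [120.0, 2.0, 5.0]
```

Which factors enter the CCD is decided by the preceding screening step; the measured responses at these runs would then typically be fitted with a second-order polynomial to locate the optimum.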

  19. Rapid detection of Listeria monocytogenes in milk using confocal micro-Raman spectroscopy and chemometric analysis.

    Science.gov (United States)

    Wang, Junping; Xie, Xinfang; Feng, Jinsong; Chen, Jessica C; Du, Xin-jun; Luo, Jiangzhao; Lu, Xiaonan; Wang, Shuo

    2015-07-02

    Listeria monocytogenes is a facultatively anaerobic, Gram-positive, rod-shaped foodborne bacterium that causes the invasive infection listeriosis in susceptible populations. Rapid and high-throughput detection of this pathogen in dairy products is critical, as milk and other dairy products have been implicated as food vehicles in several outbreaks. Here we evaluated confocal micro-Raman spectroscopy (785 nm laser) coupled with chemometric analysis to distinguish six closely related Listeria species, including L. monocytogenes, in both liquid media and milk. Raman spectra of different Listeria species and other bacteria (i.e., Staphylococcus aureus, Salmonella enterica and Escherichia coli) were collected to create two independent databases for detection in media and milk, respectively. Unsupervised chemometric models, including principal component analysis and hierarchical cluster analysis, were applied to differentiate L. monocytogenes from other Listeria species and other bacteria. To further evaluate the performance and reliability of the unsupervised chemometric analyses, supervised chemometrics were performed, including two discriminant analyses (DA) and soft independent modeling of class analogies (SIMCA). Analysis of the Raman spectra with the two DA-based chemometric models gave average identification accuracies of 97.78% and 98.33% for L. monocytogenes in media, and 95.28% and 96.11% in milk, respectively. SIMCA analysis also gave satisfactory average classification accuracies (over 93% in both media and milk). This Raman spectroscopy-based detection of L. monocytogenes in media and milk can be completed within a few hours and requires no extensive sample preparation. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Chemometrics-assisted solid-state characterization of pharmaceutically relevant materials. Polymorphic substances.

    Science.gov (United States)

    Calvo, Natalia L; Maggio, Rubén M; Kaufman, Teodoro S

    2017-06-13

    Current regulations require the proper characterization of pharmaceutically relevant solid systems. Chemometrics comprises a range of valuable tools suitable for processing large amounts of data and extracting valuable information hidden in their structure. This review details the results of the fruitful association between analytical techniques and chemometric methods, focusing on those that help to gain insight into drug polymorphism as an important aspect of the solid state of bulk drugs and drug products. Hence, the combination of Raman, terahertz, mid- and near-infrared spectroscopies, as well as instrumental signals resulting from X-ray powder diffraction, 13C solid-state nuclear magnetic resonance spectroscopy and thermal methods, with qualitative and quantitative chemometric methodologies is examined. The main issues reviewed concerning pharmaceutical drug polymorphism include the use of chemometrics-based approaches to perform polymorph classification and assignment of polymorphic identity, as well as the determination of given polymorphs in simple mixtures and complex systems. Aspects such as the solvation/desolvation of solids, phase transformation, crystallinity and recrystallization from the amorphous state are also discussed. A brief perspective of the field for the near future is provided, based on the developments of the last decade and the current state of the art of analytical instrumentation and chemometric methodologies. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Two-dimensional correlation spectroscopy (2D-COS) variable selection for near-infrared microscopy discrimination of meat and bone meal in compound feed.

    Science.gov (United States)

    Lü, Chengxu; Chen, Longjian; Yang, Zengling; Liu, Xian; Han, Lujia

    2014-01-01

    This article presents a novel method for combining auto-peak and cross-peak information for sensitive variable selection in synchronous two-dimensional correlation spectroscopy (2D-COS). This variable selection method is then applied to the near-infrared (NIR) microscopy discrimination of meat and bone meal (MBM). This is of considerable practical value because MBM is currently banned in compound feed for ruminant animals. For the 2D-COS analysis, a set of NIR spectroscopy data of compound feed samples (adulterated with varying concentrations of MBM) was pretreated using standard normal variate and detrending (SNVD) and then mapped to the 2D-COS synchronous matrix. In the auto-peak analysis, 12 main sensitive variables were identified at 6852, 6388, 6320, 5788, 5600, 5244, 4900, 4768, 4572, 4336, 4256, and 4192 cm-1. Each of these variables was assigned to a specific spectral structure and chemical component. In the cross-peak analysis, these variables were divided into two groups, each containing six sensitive variables. This grouping resulted in a correlation between the spectral variables that was in accordance with the chemical-component content of the MBM and compound feed. These sensitive variables were then used to build a NIR microscopy discrimination model, which yielded 97% correct classification. Moreover, this method detected the presence of MBM at concentrations below 1% in an adulterated compound feed sample. The concentration-dependent 2D-COS-based variable selection method developed in this study has the unique advantages of (1) introducing an interpretive aspect into variable selection, (2) substantially reducing the complexity of the computations, (3) enabling the transferability of the results to discriminant analysis, and (4) enabling efficient compression of spectral data.
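    The synchronous mapping described above follows Noda's generalized 2D correlation, in which the synchronous spectrum is Φ = Yᵀ·Y / (m - 1), where Y holds the dynamic spectra (each variable mean-centered across the m samples ordered by the perturbation, here MBM concentration), and auto-peaks lie on the diagonal. The NumPy sketch below is a hedged illustration under stated assumptions: detrending is taken as removal of a second-order polynomial baseline after SNV, and variables are simply ranked by auto-peak intensity. It does not implement the authors' combined auto-peak/cross-peak grouping.

```python
import numpy as np


def snv_detrend(spectra, wavenumbers, order=2):
    """SNV-scale each spectrum (row), then subtract a fitted polynomial baseline."""
    x = np.asarray(spectra, dtype=float)
    x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, ddof=1, keepdims=True)
    out = np.empty_like(x)
    for i, row in enumerate(x):
        coeffs = np.polyfit(wavenumbers, row, order)
        out[i] = row - np.polyval(coeffs, wavenumbers)
    return out


def synchronous_2dcos(spectra):
    """Synchronous 2D correlation matrix; rows are samples ordered by perturbation."""
    y = np.asarray(spectra, dtype=float)
    dynamic = y - y.mean(axis=0)  # dynamic spectra: center each variable
    return dynamic.T @ dynamic / (y.shape[0] - 1)


def top_autopeaks(sync, wavenumbers, n=12):
    """Wavenumbers of the n largest auto-peaks (diagonal of the synchronous map)."""
    diag = np.diag(sync)
    idx = np.argsort(diag)[::-1][:n]
    return [float(wavenumbers[i]) for i in idx]


# Hypothetical toy data: one band grows with adulterant concentration.
conc = np.array([0.0, 1.0, 2.0, 3.0])
wn = np.array([6852.0, 5788.0, 4336.0])
spectra = 1.0 + np.outer(conc, [0.0, 1.0, 0.1])
sync = synchronous_2dcos(spectra)
print(top_autopeaks(sync, wn, n=1))  # [5788.0]
```

On real data, the SNVD-pretreated spectra (full wavenumber grid) would be passed to `synchronous_2dcos`; the toy example skips pretreatment because three variables are too few for a quadratic baseline.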

  2. Near infrared spectroscopy combined with chemometrics for growth stage classification of cannabis cultivated in a greenhouse from seized seeds

    Science.gov (United States)

    Borille, Bruna Tassi; Marcelo, Marcelo Caetano Alexandre; Ortiz, Rafael Scorsatto; Mariotti, Kristiane de Cássia; Ferrão, Marco Flôres; Limberger, Renata Pereira

    2017-02-01

    Cannabis sativa L. (cannabis, Cannabaceae), popularly called marijuana, is one of the oldest plants known to man and the most widely used illicit drug worldwide. It has also been the subject of increasing scientific and political discussion due to its medicinal properties. In recent years in Brazil, the form of cannabis drug trafficking has been changing, and the Brazilian Federal Police has exponentially increased the number of seizures of cannabis seeds sent by mail. This new form of trafficking prompted the study of seized cannabis seeds germinated in a greenhouse using NIR spectroscopy combined with chemometrics. The plants were cultivated in a homemade greenhouse under controlled conditions. At three different growth periods (5.5 weeks, 7.5 weeks and 10 weeks), they were harvested, dried, ground and directly analyzed. iPCA was used to select the best NIR spectral range (4000-4375 cm-1) in order to develop unsupervised and supervised methods. PCA and HCA showed a good separation between the three groups of cannabis samples at different growth stages. PLS-DA and SVM-DA classified the samples with good sensitivity and specificity; for SVM-DA classification, both were equal to unity. This separation may be due to the correlation between cannabinoid and volatile compound concentrations and the growth of the cannabis plant. Therefore, the growth stage of cannabis can be predicted by NIR spectroscopy and chemometric tools in the early stages of indoor cannabis cultivation.

  3. Discrimination of sugarcane according to cultivar by 1H NMR and chemometric analyses

    Energy Technology Data Exchange (ETDEWEB)

    Alves Filho, Elenilson G.; Silva, Lorena M.A.; Choze, Rafael; Liao, Luciano M. [Laboratorio de Ressonancia Magnetica Nuclear, Instituto de Quimica, Universidade Federal de Goias (UFG), Goiania, GO (Brazil); Honda, Neli K.; Alcantara, Glaucia B. [Departamento de Quimica, Universidade Federal de Mato Grosso do Sul (UFMS), Campo Grande, MS (Brazil)

    2012-07-01

    Several technologies for the development of new sugarcane cultivars have mainly focused on increasing productivity and disease resistance. Sugarcane cultivars are usually identified by the organography of the leaves and stems, the analysis of peroxidase and esterase isoenzyme activities and the total soluble protein as well as soluble solid content. Nuclear magnetic resonance (NMR) associated with chemometric analysis has proven to be a valuable tool for cultivar assessment. Thus, this article describes the potential of chemometric analysis applied to 1H high resolution magic angle spinning (HRMAS) and solution NMR for the investigation of sugarcane cultivars. For this purpose, leaves from eight different cultivars of sugarcane were investigated by 1H NMR spectroscopy in combination with chemometric analysis. The approach proved to be a useful tool for distinguishing and classifying different sugarcane cultivars, as well as for assessing differences in their chemical composition. (author)

  4. Chromatography methods and chemometrics for determination of milk fat adulterants

    Science.gov (United States)

    Trbović, D.; Petronijević, R.; Đorđević, V.

    2017-09-01

    Milk and milk-based products are among the leading food categories according to reported cases of food adulteration. Authentication problems exist in all areas of the food industry, and adequate control methods are required to evaluate the authenticity of milk and milk products in the dairy industry. Gas chromatography (GC) analysis of triacylglycerol (TAG) or fatty acid (FA) profiles of milk fat (MF), in combination with multivariate statistical data processing, has been used to detect adulteration of milk and dairy products with foreign fats. The adulteration of milk and butter is a major issue for the dairy industry. The major adulterants of MF are vegetable oils (soybean, sunflower, groundnut, coconut, palm and peanut oil) and animal fats (cow tallow and pork lard). Multivariate analysis enables adulterated MF to be distinguished from authentic MF while taking into account many analytical factors. Various multivariate analysis methods have been proposed to quantitatively detect levels of adulterant non-MFs, with multiple linear regression (MLR) seemingly the most suitable. There is a need for increased use of chemometric data analysis to detect adulterated MF in foods and for its expanded use in routine quality assurance testing.

  5. Chemometrics of differentially expressed proteins from colorectal cancer patients

    Institute of Scientific and Technical Information of China (English)

    Lay-Chin Yeoh; Saravanan Dharmaraj; Boon-Hui Gooi; Manjit Singh; Lay-Harn Gam

    2011-01-01

    AIM: To evaluate the usefulness of differentially expressed proteins from colorectal cancer (CRC) tissues for differentiating cancer and normal tissues. METHODS: A proteomic approach was used to identify the differentially expressed proteins between CRC and normal tissues. The proteins were extracted using Tris buffer and thiourea lysis buffer (TLB) for extraction of aqueous soluble and membrane-associated proteins, respectively. Chemometrics, namely principal component analysis (PCA) and linear discriminant analysis (LDA), were used to assess the usefulness of these proteins for identifying the cancerous state of tissues. RESULTS: The differentially expressed proteins identified were 37 aqueous soluble proteins in Tris extracts and 24 membrane-associated proteins in TLB extracts. Based on the protein spot intensities on 2D-gel images, PCA with an eigenvalue > 1 criterion was used to reduce the number of principal components (PCs) to 12 and seven for the Tris and TLB extracts, respectively, and six PCs from each extract were subsequently used for LDA. The LDA classification for the Tris extract showed that 82.7% of the original samples were correctly classified, and 82.7% of the cross-validated samples were also correctly classified. The LDA for the TLB extract showed that 78.8% of the original samples and 71.2% of the cross-validated samples were correctly classified. CONCLUSION: The classification of CRC tissues by PCA and LDA provided a promising distinction between normal and cancer types. These methods can possibly be used for identification of potential biomarkers among the differentially expressed proteins identified.
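    The eigenvalue > 1 rule (Kaiser criterion) used above to choose the number of PCs can be sketched with NumPy. This hedged example assumes the spot intensities are autoscaled, so the eigenvalues come from the correlation matrix; the simulated data are hypothetical, and the subsequent LDA step is not shown.

```python
import numpy as np


def kaiser_pca(x):
    """PCA on autoscaled data; keep components whose eigenvalue exceeds 1."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # autoscale each variable
    corr = np.cov(z, rowvar=False)                     # equals the correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)            # returned in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    scores = z @ eigvecs[:, keep]                      # PC scores for the retained PCs
    return eigvals, scores


# Hypothetical data: 50 samples, 6 variables driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(50, 2))
x = np.repeat(factors, 3, axis=1) + 0.1 * rng.normal(size=(50, 6))
eigvals, scores = kaiser_pca(x)
print(scores.shape)  # (50, 2)
```

Because the data are autoscaled, the eigenvalues sum to the number of variables, so "eigenvalue > 1" keeps only components that explain more variance than a single original variable.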

  6. Design of natural food antioxidant ingredients through a chemometric approach.

    Science.gov (United States)

    Mendiola, Jose A; Martín-Alvarez, Pedro J; Señoráns, F Javier; Reglero, Guillermo; Capodicasa, Alessandro; Nazzaro, Filomena; Sada, Alfonso; Cifuentes, Alejandro; Ibáñez, Elena

    2010-01-27

    In the present work, an environmentally friendly extraction process using subcritical conditions has been tested to obtain potential natural food ingredients from natural sources such as plants, fruits, spirulina, propolis, and tuber, with the aim of substituting synthetic antioxidants, which are subject to regulatory restrictions and might be harmful to human health. A full characterization has been undertaken from the chemical and biochemical points of view to understand their mechanism of action. Thus, an analytical method for profiling the compounds responsible for the antioxidant activity has been used, allowing the simultaneous determination of water-soluble vitamins, fat-soluble vitamins, phenolic compounds, carotenoids, and chlorophylls in a single run. This information has been integrated and analyzed using a chemometric approach to correlate the bioactive compound profile with the antioxidant activity and thus to predict the antioxidant activities of complex formulations. As a further step, a simplex centroid mixture design has been tested to find the optimal formulation and to calculate the effect of interactions among individual extracts in the mixture.
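    A simplex centroid mixture design contains every blend in which a non-empty subset of the q components is present in equal proportions, giving 2^q - 1 runs. The Python sketch below enumerates these blends with exact fractions; the three extract names in the usage lines are hypothetical placeholders, not the study's formulation.

```python
from fractions import Fraction
from itertools import combinations


def simplex_centroid(q):
    """All 2**q - 1 blends of a simplex centroid design for q components."""
    design = []
    for size in range(1, q + 1):
        for subset in combinations(range(q), size):
            blend = [Fraction(0)] * q
            for i in subset:
                blend[i] = Fraction(1, size)  # equal shares of the chosen components
            design.append(blend)
    return design


extracts = ["extract_A", "extract_B", "extract_C"]  # hypothetical names
for blend in simplex_centroid(len(extracts)):
    print({name: str(share) for name, share in zip(extracts, blend)})
```

Responses measured at these blends are typically fitted with a Scheffé-type polynomial, whose cross-product terms estimate the interaction effects between extracts.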

  7. Application of Chemometrics in the Determination of Spirits Authenticity

    Directory of Open Access Journals (Sweden)

    Estrella Patricia Zayas Ruiz

    2013-02-01

    Full Text Available The authenticity of food and food ingredients is a major problem for the industry today, and many technologies have been applied to detect adulteration and contamination of food. This paper presents the results of a study conducted at University College Dublin, Ireland, on vodka, whiskey and mixtures thereof, and of another study in Cuba, at the Faculty of Chemical Engineering of the Higher Polytechnic Institute «Jose Antonio Echeverria», in collaboration with the Cuban Research Institute of Sugarcane Derivatives (ICIDCA), using historical data on Cuban rums. In the first study, three techniques were used to determine whether pure drinks could be separated from mixtures: mid-infrared spectroscopy with a Fourier transform and Attenuated Total Reflectance cell, near-infrared spectroscopy and ultraviolet-visible spectroscopy. In the second, historical determinations of acidity, acetaldehyde, ethyl acetate, methanol, isoamyl alcohol, isobutanol, propanol and ethanol content in different types of Cuban aged rums were used to establish whether the aged rum Vigía could be differentiated from other Cuban aged rums by means of this analytical information. The Unscrambler software was used to apply chemometrics. Principal Component Analysis and various pretreatments were applied to the experimentally acquired data to reduce their dispersion. Near-infrared, ultraviolet-visible and mid-infrared spectroscopy have potential for separating pure whiskey and vodka from their mixtures. Cuban aged rums were successfully differentiated from Vigía aged rum by Principal Component Analysis applied to the chemical data.
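    The abstract mentions "various pretreatments" without naming them. Standard normal variate (SNV) is one widely used pretreatment for reducing scatter-related dispersion in NIR, MIR and UV-Vis spectra, so the sketch below assumes SNV followed by PCA computed through an SVD; it is not necessarily what the study applied, and the function names and data are hypothetical.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def pca_scores(X, n_components=2):
    """PCA scores and explained-variance ratios via SVD of the column-centred data."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return U[:, :n_components] * s[:n_components], explained[:n_components]

# Usage: three spectra that differ only by offset and scale collapse onto one profile.
x = np.linspace(0.0, 3.0, 50)
base = np.sin(x)
spectra = np.vstack([base, 2.0 * base + 5.0, 0.5 * base - 1.0])
corrected = snv(spectra)
```

    SNV removes additive offsets and multiplicative scaling per sample, so after it PCA responds to differences in spectral shape (composition) rather than to baseline or path-length effects.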

  8. Chemometric approach for prediction of uranium pathways in the soil

    Energy Technology Data Exchange (ETDEWEB)

    Stojanovic, Mirjana; Nihajlovic, Marija; Petrovic, Jelena; Petrovic, Marija; Sostaric, Tanja; Milojkovic, Jelena [Inst. for Technology of Nuclear and Other Mineral Raw Materials, Belgrad (Serbia)]; Pezo, Lato [Univ. Belgrad (Serbia). Inst. of General and Physical Chemistry]

    2014-10-01

    Understanding the effect of soil parameters (pH, Eh and the availability of organic and inorganic ligands) on uranium mobility under different geochemical conditions is fundamental for reliably predicting its behaviour and fate in the environment. In this study, the impact of total and available phosphorus content, humus and acidity of Serbian agricultural soils on the content of total and available uranium was evaluated by Response Surface Methodology (RSM), second-order polynomial regression models (SOPs) and artificial neural networks (ANNs). The performance of the ANNs was compared with that of the SOPs and with the experimental results. The SOPs showed high coefficients of determination (0.785-0.956), while the ANN models achieved high prediction accuracy (0.8893-0.904). According to the results, total and available uranium content in the soil were mostly affected by pH, which was statistically significant at the p < 0.05 level. For the same responses, total phosphorus was also found to be very influential, statistically significant at the p < 0.05 and p < 0.10 levels. Available phosphorus and humus had a much stronger influence on total and available uranium content than total phosphorus content. The proposed chemometric approach could be very helpful in preserving natural resources and in practical risk-assessment modeling of uranium environmental pathways.
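    A second-order polynomial (SOP) response-surface model in two factors has six terms: an intercept, two linear, two quadratic and one interaction term. The NumPy sketch below fits such a model by ordinary least squares and reports R², the statistic quoted for the SOPs above. The choice of two factors, the variable names and the synthetic data are assumptions for illustration, since the study's soil data and exact model terms are not given here.

```python
import numpy as np

def quadratic_terms(x1, x2):
    """Design matrix of a full second-order polynomial in two factors."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

def fit_second_order(x1, x2, y):
    """Least-squares fit; returns coefficients and the coefficient of determination."""
    A = quadratic_terms(x1, x2)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    r2 = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Usage with hypothetical soil data: x1 = pH, x2 = total phosphorus, y = uranium content.
rng = np.random.default_rng(42)
ph = rng.uniform(4.5, 8.0, 40)
p_total = rng.uniform(200.0, 1200.0, 40)
uranium = 1.2 + 0.3 * ph - 0.02 * ph**2 + 0.001 * p_total + rng.normal(0, 0.05, 40)
coef, r2 = fit_second_order(ph, p_total, uranium)
```

    In an RSM study the significance of each coefficient (for example at p < 0.05) would then be judged from its standard error, which this sketch does not compute.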

  9. Chemometric evaluation of trace elements in Brazilian medicinal plants

    Energy Technology Data Exchange (ETDEWEB)

    Silva, Paulo S.C. da; Francisconi, Lucilaine S.; Goncalves, Rodolfo D.M.R., E-mail: pscsilva@ipen.br [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil). Centro do Reator de Pesquisas]

    2013-07-01

    The growing interest in herbal medicines has required standardization in order to ensure their safe use, therapeutic efficacy and product quality. Despite the vast flora and the extensive use of medicinal plants by the Brazilian population, scientific studies on the subject are still insufficient. In this study, 59 medicinal plants were analyzed for As, Ba, Br, Ca, Cl, Cs, Co, Cr, Fe, Hf, K, Mg, Mn, Na, Rb, Sb, Sc, Se, Ta, Th, U, Zn and Zr by neutron activation analysis and for Cu, Ni, Pb, Cd and Hg by atomic absorption. The results were analyzed by chemometric methods (correlation analysis, principal component analysis and cluster analysis) to verify whether the plants are similar with respect to their mineral and trace metal contents. The results made it possible to classify distinct groups among the analyzed plants and extracts, and these data may be useful in future studies of the therapeutic action that the elements determined here may exert. (author)
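    The abstract does not specify the linkage rule or distance used in the cluster analysis. The sketch below assumes autoscaled concentrations (so that major elements such as Ca or K do not dominate trace elements), Euclidean distance and average linkage, written naively in NumPy for readability; in practice a library routine such as SciPy's hierarchical clustering would normally be used. Function names and data are hypothetical.

```python
import numpy as np

def autoscale(X):
    """Centre each element (column) and scale it to unit variance.
    A column with zero variance would cause a division by zero."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def average_linkage(X, n_clusters):
    """Naive agglomerative clustering (average linkage, Euclidean distance)."""
    Z = autoscale(X)
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    clusters = [[i] for i in range(len(Z))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]                           # b > a, so index a stays valid
    labels = np.empty(len(Z), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Usage: 10 hypothetical plants x 4 elements forming two concentration groups.
rng = np.random.default_rng(3)
low = rng.normal(1.0, 0.1, size=(5, 4))
high = rng.normal(10.0, 0.1, size=(5, 4))
labels = average_linkage(np.vstack([low, high]), n_clusters=2)
```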

  10. Inter-annual variability and potential for selectivity in the diets of deep-water Antarctic echinoderms

    Science.gov (United States)

    Wigham, B. D.; Galley, E. A.; Smith, C. R.; Tyler, P. A.

    2008-11-01

    The continental shelf of the West Antarctic Peninsula (WAP) is a highly productive region but also unusually deep as a result of isostatic depression by the polar ice cap. The close coupling of surface processes with those of the benthos would be expected in such a seasonally variable environment; however, the cold, deep conditions of the WAP shelf may allow for the persistence of organic material in the sediments as a "food bank". Chlorophyll and carotenoid pigments were determined from the gut contents of seven species of echinoderm and from the surficial sediment on the bathyal continental shelf. Samples were collected as part of the FOODBANCS programme during successive cruises in austral spring (October 2000) and austral autumn (March 2001). Pigments were identified and quantified using reverse-phase high-performance liquid chromatography (HPLC). A lack of qualitative selectivity was observed among species, compared to that observed for deep-water assemblages at temperate latitudes, supporting the theory of a persistent "food bank". However, significant quantitative differences were observed among species and between years and sampling location on the shelf. Species differences were marked between those we classified as "true" deposit feeders and those species whose diet also may be supplemented by scavenging and/or grazing.

  11. Selecting variables in non-parametric regression models for binary response. An application to the computerized detection of breast cancer.

    Science.gov (United States)

    Roca-Pardiñas, Javier; Cadarso-Suárez, Carmen; Tahoces, Pablo G; Lado, María J

    2009-01-30

    In many biomedical applications, the interest lies in distinguishing between two possible states of a given response variable depending on the values of certain continuous predictors. If the number of predictors, p, is high, or if there is redundancy among them, it becomes important to select the subset of predictors that yields the models with the greatest discrimination capacity. With this aim, logistic generalized additive models were considered, and receiver operating characteristic (ROC) curves were applied to determine and compare the discriminatory capacity of such models. This study sought to develop bootstrap-based tests that determine (a) the optimal number q ≤ p of predictors; and (b) the model or models including q predictors that display the largest AUC (area under the ROC curve). A simulation study was conducted to verify the behaviour of these tests. Finally, the proposed method was applied to a computer-aided diagnostic system dedicated to the early detection of breast cancer. Copyright (c) 2008 John Wiley & Sons, Ltd.
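    The AUC of a set of model scores equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney form). The sketch below computes it and a percentile bootstrap interval by resampling cases within each class. It assumes the scores have already been produced by some fitted model; it does not reproduce the paper's bootstrap tests, which refit logistic GAMs over predictor subsets, and the function names and data are hypothetical.

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Empirical AUC = P(pos > neg) + 0.5 * P(pos == neg) over all positive/negative pairs."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return np.mean((pos > neg) + 0.5 * (pos == neg))

def bootstrap_auc_ci(scores_pos, scores_neg, n_boot=2000, seed=0):
    """Percentile bootstrap 95% interval, resampling within each class."""
    rng = np.random.default_rng(seed)
    pos = np.asarray(scores_pos, dtype=float)
    neg = np.asarray(scores_neg, dtype=float)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        stats[b] = auc(rng.choice(pos, pos.size), rng.choice(neg, neg.size))
    low, high = np.percentile(stats, [2.5, 97.5])
    return low, high

# Usage with hypothetical model outputs for diseased (pos) and healthy (neg) cases.
rng = np.random.default_rng(1)
pos_scores = rng.normal(1.0, 1.0, 40)
neg_scores = rng.normal(0.0, 1.0, 60)
point = auc(pos_scores, neg_scores)
low, high = bootstrap_auc_ci(pos_scores, neg_scores)
```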

  12. Relationship of Speed, Agility, Neuromuscular Power, and Selected Anthropometrical Variables and Performance Results of Male and Female Junior Tennis Players.

    Science.gov (United States)

    Munivrana, Goran; Filipčić, Aleš; Filipčić, Tjaša

    2015-07-01

    The aim of the study was to analyse the relationship between selected speed, agility, and neuromuscular power test items. The sample consisted of 154 male and 152 female young tennis players. Using six motor and three anthropometrical tests, we investigated differences between males and females and between two age categories. Finally, we analyzed the relationship between the motor and anthropometrical tests and a player's tennis performance. The correlation between the two agility test items and the 5-m sprint is very large in male players, while it is only moderate with the 20-m sprint in the female category. Male tennis players have higher correlations between speed test items and neuromuscular test items. The speed test item (5-m sprint) has a large correlation with a player's tennis performance. One-way analysis of variance indicated that young male tennis players performed significantly better than females in all motor test items. Body mass index was the only measure in which no significant difference between genders was found. Differences between males aged 18 & under and 16 & under were significant in all test items except the vertical jump, while differences between females were significant in three anthropometrical tests, the quarter jump, and the fan-drill test. Regression analyses showed that the system of predictor variables explains a relatively small part of the variance (46% for males and 40% for females). In both genders, test items measuring speed significantly influence a player's tennis performance.

  13. Efficient affinity maturation of antibody variable domains requires co-selection of compensatory mutations to maintain thermodynamic stability

    Science.gov (United States)

    Julian, Mark C.; Li, Lijuan; Garde, Shekhar; Wilen, Rebecca; Tessier, Peter M.

    2017-01-01

    The ability of antibodies to accumulate affinity-enhancing mutations in their complementarity-determining regions (CDRs) without compromising thermodynamic stability is critical to their natural function. However, it is unclear if affinity mutations in the hypervariable CDRs generally impact antibody stability and to what extent additional compensatory mutations are required to maintain stability during affinity maturation. Here we have experimentally and computationally evaluated the functional contributions of mutations acquired by a human variable (VH) domain that was evolved using strong selections for enhanced stability and affinity for the Alzheimer’s Aβ42 peptide. Interestingly, half of the key affinity mutations in the CDRs were destabilizing. Moreover, the destabilizing effects of these mutations were compensated for by a subset of the affinity mutations that were also stabilizing. Our findings demonstrate that the accumulation of both affinity and stability mutations is necessary to maintain thermodynamic stability during extensive mutagenesis and affinity maturation in vitro, which is similar to findings for natural antibodies that are subjected to somatic hypermutation in vivo. These findings for diverse antibodies and antibody fragments specific for unrelated antigens suggest that the formation of the antigen-binding site is generally a destabilizing process and that co-enrichment for compensatory mutations is critical for maintaining thermodynamic stability. PMID:28349921

  14. Genetic variability in residual feed intake in rainbow trout clones and testing of indirect selection criteria (Open Access publication)

    Directory of Open Access Journals (Sweden)

    Chatain Béatrice

    2008-11-01

    Full Text Available Abstract Little is known about the genetic basis of residual feed intake (RFI) variation in fish, since this trait is highly sensitive to environmental influences and the feed intake of individuals is difficult to measure accurately. The purpose of this work was (i) to assess the genetic variability of RFI estimated by an X-ray technique and (ii) to develop predictive criteria for RFI. Two predictive criteria were tested: loss of body weight during feed deprivation and compensatory growth during re-feeding. Ten heterozygous rainbow trout clones were used. Individual intake and body weight were measured three times at three-week intervals. Then, individual body weight was recorded after two cycles of a three-week feed deprivation followed by a three-week re-feeding. The ratio of the genetic variance to the phenotypic variance was moderate to high for growth, feed intake, and RFI (VG/VP = 0.63 ± 0.11, 0.29 ± 0.11, 0.29 ± 0.09, respectively). The index that integrates performances achieved during the deprivation and re-feeding periods explained 59% of RFI variation. These results provide a basis for further studies on the origin of RFI differences and show that indirect criteria are good candidates for future selective breeding programs.
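    Two quantities in this abstract can be sketched in NumPy under stated assumptions. RFI is commonly defined as the residual of a regression of feed intake on production traits; here weight gain and mean body weight are assumed as regressors, which the abstract does not specify. VG/VP for clones is approximated from a balanced one-way ANOVA in which the between-clone variance component estimates VG and the within-clone variance the environmental part. The study's actual estimation model and its standard errors are not reproduced, and the function names and data are hypothetical.

```python
import numpy as np

def residual_feed_intake(intake, gain, mean_weight):
    """RFI as residuals of an OLS regression of intake on weight gain and mean body weight."""
    A = np.column_stack([np.ones_like(gain), gain, mean_weight])
    coef, *_ = np.linalg.lstsq(A, intake, rcond=None)
    return intake - A @ coef

def clonal_vg_vp(values_by_clone):
    """Broad-sense VG/VP from a balanced one-way ANOVA across clones.

    values_by_clone has shape (n_clones, n_fish_per_clone).
    """
    data = np.asarray(values_by_clone, dtype=float)
    k, n = data.shape
    grand = data.mean()
    ms_between = n * np.sum((data.mean(axis=1) - grand) ** 2) / (k - 1)
    ms_within = np.sum((data - data.mean(axis=1, keepdims=True)) ** 2) / (k * (n - 1))
    vg = max((ms_between - ms_within) / n, 0.0)   # negative estimates truncated at zero
    return vg / (vg + ms_within)

# Usage with hypothetical fish data.
rng = np.random.default_rng(2)
gain = rng.uniform(50, 150, 30)
mean_weight = rng.uniform(200, 400, 30)
intake = 0.8 * gain + 0.05 * mean_weight + rng.normal(0, 3, 30)
rfi = residual_feed_intake(intake, gain, mean_weight)
```

    By construction RFI sums to zero across fish: positive values mark fish that ate more than predicted from their growth and size, negative values more efficient ones.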

  15. Spatiotemporal Variability of Surface Meteorological Variables During Fog and No-Fog Events in the Heber Valley, UT; Selected Case Studies From MATERHORN-Fog

    Science.gov (United States)

    van den Bossche, Michael; De Wekker, Stephan F. J.

    2016-09-01

    We investigated the spatiotemporal variability of surface meteorological variables in the nocturnal boundary layer using six automatic weather stations deployed in the Heber Valley, UT, during the MATERHORN-Fog experiment. The stations were installed on the valley floor within a 1.5 km × 0.8 km area and collected 1-Hz wind and pressure data and 0.2-Hz temperature and humidity data. We describe the weather stations and analyze the spatiotemporal variability of the measured variables during three nights with radiative cooling. Two nights were characterized by dense ice fog, one with a persistent ('heavy') fog and one with a short-lived ('moderate') fog, while the third night had no fog. Frost-point depressions were larger before the night without fog and continued to decrease during the no-fog night. On both fog nights, the frost-point depression reached values close to zero early in the night, but ~5 h earlier on the heavy-fog night than on the moderate-fog night. Spatial variability of temperature and humidity was smallest during the heavy-fog night and increased temporarily during short periods when wind speeds increased and the fog lifted. During all three nights, wind speeds did not exceed 2 m/s. The temporal variability of wind speed and direction was larger during the fog nights than during the no-fog night, and was particularly large during the heavy-fog night. The large variability corresponded with short-lived (5-10 min) pressure variations with amplitudes on the order of 0.5 hPa, indicating gravity wave activity. These pressure fluctuations occurred at all stations and were correlated in particular with variability in wind direction. Although they could not provide a complete picture of the nocturnal boundary layer, our low-cost weather stations were able to continuously collect data comparable to those of nearby research-grade instruments. From these data, we distinguished between fog and no-fog events
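    Frost-point depression is the air temperature minus the frost point, the temperature at which the air would become saturated with respect to ice. The abstract does not say how it was computed; the sketch below assumes the commonly used Magnus-type coefficients for saturation over water (17.62, 243.12 °C) and over ice (22.46, 272.62 °C), with relative humidity reported with respect to liquid water, as is usual for meteorological sensors. Function names are hypothetical.

```python
import math

def vapour_pressure_hpa(temp_c, rh_percent):
    """Actual vapour pressure (hPa) from temperature and RH taken relative to liquid water."""
    e_sat_water = 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))
    return rh_percent / 100.0 * e_sat_water

def frost_point_c(e_hpa):
    """Invert the Magnus formula over ice, e = 6.112 * exp(22.46 T / (272.62 + T)), for T."""
    x = math.log(e_hpa / 6.112)
    return 272.62 * x / (22.46 - x)

def frost_point_depression(temp_c, rh_percent):
    """Air temperature minus frost point (°C); values near zero indicate ice saturation."""
    return temp_c - frost_point_c(vapour_pressure_hpa(temp_c, rh_percent))

print(frost_point_depression(-5.0, 80.0))
```

    Because the saturation vapour pressure over ice is lower than over water below 0 °C, air saturated with respect to water is supersaturated with respect to ice, and the depression computed this way becomes slightly negative.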

  16. Detecting correlation between allele frequencies and environmental variables as a signature of selection. A fast computational approach for genome-wide studies

    DEFF Research Database (Denmark)

    Guillot, Gilles; Vitalis, Renaud; Rouzic, Arnaud le;

    2014-01-01

    Genomic regions (or loci) displaying outstanding correlation with some environmental variables are likely to be under selection and this is the rationale of recent methods of identifying selected loci and retrieving functional information about them. To be efficient, such methods need to be able...... to disentangle the potential effect of environmental variables from the confounding effect of population history. For the routine analysis of genome-wide datasets, one also needs fast inference and model selection algorithms. We propose a method based on an explicit spatial model which is an instance of spatial...... generalized linear mixed model (SGLMM). For inference, we make use of the INLA–SPDE theoretical and computational framework developed by Rue et al. (2009) and Lindgren et al. (2011). The method we propose allows one to quantify the correlation between genotypes and environmental variables. It works...

  17. Authenticity study of Phyllanthus species by NMR and FT-IR techniques coupled with chemometric methods

    Energy Technology Data Exchange (ETDEWEB)

    Santos, Maiara S.; Pereira-Filho, Edenir R.; Ferreira, Antonio G. [Universidade Federal de Sao Carlos (UFSCAR), SP (Brazil). Dept. de Quimica]; Boffo, Elisangela F. [Universidade Federal da Bahia (UFBA), Salvador, BA (Brazil). Inst. de Quimica]; Figueira, Glyn M., E-mail: maiarassantos@yahoo.com.br [Universidade Estadual de Campinas (UNICAMP), Campinas, SP (Brazil). Centro Pluridisciplinar de Pesquisas Quimicas, Biologicas e Agricolas]

    2012-07-01

    The importance of medicinal plants and their use in industrial applications is increasing worldwide, especially in Brazil. Phyllanthus species, popularly known as 'quebra-pedras' in Brazil, are used in folk medicine for treating urinary infections and renal calculus. This paper reports an authenticity study, based on herbal drugs from Phyllanthus species, involving commercial and authentic samples using spectroscopic techniques: FT-IR, ¹H HR-MAS NMR and ¹H NMR in solution, combined with chemometric analysis. The spectroscopic techniques evaluated, coupled with chemometric methods, have great potential in the investigation of complex matrices. Furthermore, several metabolites were identified by the NMR techniques. (author)

  18. Authenticity study of Phyllanthus species by NMR and FT-IR Techniques coupled with chemometric methods

    Directory of Open Access Journals (Sweden)

    Maiara S. Santos

    2012-01-01

    Full Text Available The importance of medicinal plants and their use in industrial applications is increasing worldwide, especially in Brazil. Phyllanthus species, popularly known as "quebra-pedras" in Brazil, are used in folk medicine for treating urinary infections and renal calculus. This paper reports an authenticity study, based on herbal drugs from Phyllanthus species, involving commercial and authentic samples using spectroscopic techniques: FT-IR, ¹H HR-MAS NMR and ¹H NMR in solution, combined with chemometric analysis. The spectroscopic techniques evaluated, coupled with chemometric methods, have great potential in the investigation of complex matrices. Furthermore, several metabolites were identified by the NMR techniques.

  19. Grape juice quality control by means of ¹H NMR spectroscopy and chemometric analyses

    Directory of Open Access Journals (Sweden)

    Caroline Werner Pereira da Silva Grandizoli

    2014-01-01

    Full Text Available This work shows the application of ¹H NMR spectroscopy and chemometrics for quality control of grape juice. A wide range of quality assurance parameters were assessed by single ¹H NMR experiments acquired directly from juice. The investigation revealed that storage conditions and storage time should be revised and indicated on all labels. The sterilization process of homemade grape juices was efficient, making it possible to store them for long periods without additives. Furthermore, chemometric analysis classified the best commercial grape juices as similar to homemade grape juices, indicating that this approach can be used to assess authenticity and detect adulteration.

  20. Attempt to separate the fluorescence spectra of adrenaline and noradrenaline using chemometrics

    DEFF Research Database (Denmark)

    Nikolajsen, Rikke P; Hansen, Åse Marie; Bro, R

    2000-01-01

    An investigation was conducted on whether the fluorescence spectra of the very similar catecholamines adrenaline and noradrenaline could be separated using chemometric methods. The fluorescence landscapes (several excitation and emission spectra were measured) of two data sets with respectively 16...... regression (Unfold-PLSR) on the larger data set and parallel factor analysis (PARAFAC) of the six samples of the smaller set showed that there was no difference between the fluorescence landscapes of adrenaline and noradrenaline. It can be concluded that chemometric separation of adrenaline and noradrenaline...

  1. Experimental variability and data pre-processing as factors affecting the discrimination power of some chemometric approaches (PCA, CA and a new algorithm based on linear regression) applied to (+/-)ESI/MS and RPLC/UV data: Application on green tea extracts.

    Science.gov (United States)

    Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A

    2016-08-01

    The influence of experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis (PCA) and Cluster Analysis (CA)), as well as of a new algorithm based on linear regression, was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed-phase liquid chromatography with UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extraction in ethanol and heated-water infusion were used as sample preparation procedures. The multivariate methods were applied directly to mass spectra and chromatograms, involving a strictly holistic comparison of shapes without assigning any structural identity to compounds. An alternative data interpretation based on linear regression analysis applied pairwise to data series is also discussed. Slopes, intercepts and correlation coefficients produced by linear regression analysis applied to pairs of very large experimental data series successfully retain information resulting from high-frequency instrumental acquisition rates, clearly better defining the profiles being compared. Consequently, each type of sample or comparison between samples produces in Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrate (dis)similarities between the compared data. Instrumental intermediate precision had the largest effect on the discrimination power of the multivariate data analysis methods. Mass spectra produced through ionization from the liquid state under atmospheric pressure conditions of bulk complex mixtures resulting from extracted materials of natural origin provided an excellent data
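    The regression-based comparison treats two profiles (mass spectra or chromatograms sampled on the same axis) as paired data series and summarizes their agreement by slope, intercept and correlation coefficient. The minimal NumPy sketch below assumes the two series are already aligned point by point; how the study derived the normal variation intervals that bound each ellipsoidal volume is not reproduced, and the names and data are hypothetical.

```python
import numpy as np

def regression_signature(reference, sample):
    """Slope, intercept and Pearson r from regressing one profile on another, point by point."""
    x = np.asarray(reference, dtype=float)
    y = np.asarray(sample, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Usage: compare replicate injections of a hypothetical 2000-point chromatogram.
rng = np.random.default_rng(7)
profile = np.abs(np.sin(np.linspace(0.0, 20.0, 2000)))
replicates = [profile * rng.normal(1.0, 0.02) + rng.normal(0.0, 0.01, profile.size)
              for _ in range(5)]
triples = np.array([regression_signature(replicates[0], r) for r in replicates[1:]])
```

    Each row of `triples` is one point in (slope, intercept, r) space; identical profiles give (1, 0, 1), and the scatter of replicate comparisons around that point is what the variation intervals describe.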

  2. Improved Variable Selection Algorithm Using a LASSO-Type Penalty, with an Application to Assessing Hepatitis B Infection Relevant Factors in Community Residents.

    Directory of Open Access Journals (Sweden)

    Pi Guo

    Full Text Available In epidemiological studies, it is important to identify independent associations between collective exposures and a health outcome. The current stepwise selection technique ignores stochastic errors and suffers from a lack of stability. The alternative LASSO-penalized regression model can be applied to detect significant predictors from a pool of candidate variables. However, this technique is prone to false positives and tends to create excessive biases. It remains challenging to develop robust variable selection methods and enhance predictability. Two improved algorithms, denoted the two-stage hybrid and bootstrap ranking procedures, both using a LASSO-type penalty, were developed for epidemiological association analysis. The performance of the proposed procedures and of other methods, including conventional LASSO, Bolasso, stepwise and stability selection models, was evaluated using intensive simulation. In addition, the methods were compared in an empirical analysis based on large-scale survey data on hepatitis B infection-relevant factors among Guangdong residents. The proposed procedures produced comparable or less biased selection results compared with conventional variable selection models. Overall, the two newly proposed procedures were stable across various simulation scenarios, demonstrating higher power and a lower false positive rate during variable selection than the compared methods. In the empirical analysis, the proposed procedures yielded a sparse set of hepatitis B infection-relevant factors, gave the best predictive performance and were able to select a more stringent set of factors. According to the proposed procedures, the individual history of hepatitis B vaccination and the family and individual history of hepatitis B infection were associated with hepatitis B infection in the studied residents. The newly proposed procedures improve the identification of significant variables and enable us to
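    The two procedures in this abstract are not specified in enough detail to reproduce. As a hedged illustration of the general idea behind bootstrap-based ranking of LASSO selections (closer in spirit to Bolasso and stability selection, both mentioned above), the sketch fits a squared-error LASSO by cyclic coordinate descent on bootstrap resamples and ranks predictors by how often they are selected. Squared-error loss is a simplification: for a binary outcome such as infection status a logistic LASSO would be the natural choice. The penalty value, function names and data are hypothetical.

```python
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes standardized columns of X and a centred y (so no intercept is fitted).
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = np.array(y, dtype=float)             # residual for beta = 0
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]           # drop predictor j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]           # put the updated predictor back
    return beta

def bootstrap_selection_frequency(X, y, lam, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each predictor gets a non-zero coefficient."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        Xb = X[idx]
        Xb = (Xb - Xb.mean(axis=0)) / Xb.std(axis=0)
        yb = y[idx] - y[idx].mean()
        counts += lasso_cd(Xb, yb, lam, n_iter=100) != 0
    return counts / n_boot

# Usage: 6 hypothetical exposures, only the first two related to the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 100)
freq = bootstrap_selection_frequency(X, y, lam=0.2)
ranking = np.argsort(freq)[::-1]
```

    Ranking by selection frequency rather than by a single LASSO fit is what makes such procedures less sensitive to the sampling noise that drives the false positives mentioned above.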

  3. Impact of strong selection for the PrP major gene on genetic variability of four French sheep breeds (Open Access publication)

    Directory of Open Access Journals (Sweden)

    Pantano Thais

    2008-11-01

    Full Text Available Abstract Effective selection on the PrP gene has been implemented in all French sheep breeds since October 2001. After four years, the frequency of the ARR "resistant" allele had increased by about 35% in young males. The aim of this study was to evaluate the impact of this strong selection on genetic variability. The study focused on four French sheep breeds and was based on the comparison of two groups of 94 animals within each breed: the first group was born before selection began, and the second 3–4 years later. Genetic variability was assessed using genealogical and molecular data (29 microsatellite markers). The expected loss of genetic variability at the PrP gene was confirmed. Moreover, among the five markers located in the PrP region, only the three closest ones were affected. The evolution of the number of alleles, heterozygote deficiency within populations, expected heterozygosity and the Reynolds distances agreed with the pedigree-based criteria and indicated that neutral genetic variability was not much affected. This trend depended on the breed, i.e. on its initial state (population size, PrP frequencies) and on the selection strategies used to improve scrapie resistance while carrying out selection for production traits.
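    Two of the molecular indicators above can be computed directly from microsatellite genotypes: expected heterozygosity He = 1 - Σ p_i² (Nei's gene diversity, here without the small-sample correction) and a heterozygote-deficiency index F = 1 - Ho/He. A minimal pure-Python sketch, assuming diploid genotypes given as allele pairs for one locus; Reynolds distances and the pedigree-based criteria are not covered, and the function name is hypothetical.

```python
from collections import Counter

def locus_summary(genotypes):
    """Allele count, He, Ho and F for one locus; genotypes are (allele_a, allele_b) pairs."""
    alleles = Counter(a for g in genotypes for a in g)
    n_copies = sum(alleles.values())
    he = 1.0 - sum((c / n_copies) ** 2 for c in alleles.values())
    ho = sum(a != b for a, b in genotypes) / len(genotypes)
    return {"n_alleles": len(alleles), "He": he, "Ho": ho, "F": 1.0 - ho / he}

# Usage with hypothetical microsatellite genotypes (allele sizes in base pairs).
sample = [(142, 146), (142, 142), (146, 150), (142, 150), (150, 150)]
print(locus_summary(sample))
```

    A positive F indicates fewer heterozygotes than expected under random mating; comparing He and F between the animals born before and after selection began is one way to see whether variability at a locus has been eroded.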

  4. Spatial distribution of heavy metals in Hong Kong's marine sediments and their human impacts: A GIS-based chemometric approach

    Energy Technology Data Exchange (ETDEWEB)

    Zhou Feng [College of Environmental Sciences, Peking University, Beijing 100871 (China)], E-mail: jardon.zhou@gmail.com; Guo Huaicheng [College of Environmental Sciences, Peking University, Beijing 100871 (China)], E-mail: hcguo@pku.edu.cn; Hao Zejia [College of Environmental Sciences, Peking University, Beijing 100871 (China)]

    2007-09-15

    A geographic information system (GIS)-based chemometric approach was applied to investigate the spatial distribution patterns of heavy metals in marine sediments and to identify spatial human impacts on global and local scales. Twelve metals (Zn, V, Ni, Mn, Pb, Cu, Cd, Ba, Hg, Fe, Cr and Al) were surveyed twice annually at 59 sites in Hong Kong from 1998 to 2004. Cluster analysis classified the entire coastal area into three areas on a global scale, representing different pollution levels. Backward discriminant analysis, with 84.5% correct assignments, identified Zn, Pb, Cu, Cd, V, and Fe as significant variables affecting spatial variation on a local scale. Enrichment factors indicated that Cu, Cr, and Zn were derived from human impacts, while Al, Ba, Mn, V and Fe originated from rock weathering. Principal component analysis further subdivided the human impacts and their affected areas within each area, explaining 87%, 84% and 87% of the total variance, respectively. The primary anthropogenic sources in the three areas were (i) anti-fouling paint and domestic sewage; (ii) surface runoff, wastewater, vehicle emissions and marine transportation; and (iii) ship repainting, dental clinics, electronic/chemical industries and leaded fuel, respectively. Moreover, GIS-based spatial analysis facilitated the application of the chemometric methods.
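    An enrichment factor normalizes a metal's concentration to a conservative reference element and compares it with a background (for example crustal or pristine-sediment) ratio; values well above 1 point to inputs beyond natural weathering. The abstract does not state the reference element or background used, so the sketch assumes Al as the reference (consistent with Al being attributed to rock weathering above) and takes background concentrations as user-supplied inputs. The function name and example numbers are hypothetical, and no interpretation threshold is implied.

```python
def enrichment_factors(sample, background, reference="Al"):
    """EF_M = (M / ref)_sample / (M / ref)_background for every metal except the reference."""
    ref_s, ref_b = sample[reference], background[reference]
    return {
        metal: (conc / ref_s) / (background[metal] / ref_b)
        for metal, conc in sample.items()
        if metal != reference
    }

# Usage with hypothetical concentrations (same units in sample and background).
sample = {"Al": 60000.0, "Cu": 90.0, "Zn": 250.0, "Fe": 35000.0}
background = {"Al": 80000.0, "Cu": 20.0, "Zn": 80.0, "Fe": 45000.0}
efs = enrichment_factors(sample, background)
```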

  5. The Influence Of Variability Of Water Resources In Lowland Forests On Selected Parameters Describing The Condition Of Trees

    Science.gov (United States)

    Tyszka, Jan; Stolarek, Andrzej; Fronczak, Ewa

    2014-01-01

    The influence of water conditions on the condition and growth of tree stands has been analysed in the context of the climatic and hydrological functions that forests perform. Long observational series of precipitation, outflow and water-table depth have been combined with measured increments in the breast-height diameters of Scots pines and the severity of crown defoliation in selected tree species growing on the Polish Lowland, in order to determine the extent to which stand condition responds to ongoing variability of water conditions within forests. An overall improvement in the condition of stands over the last 20 years does not mask a several-year cyclicity in the factors shaping it, including departures from long-term mean values of precipitation totals and groundwater levels. The condition of stands is seen to worsen in both dry and wet years. Analysis of the degree of defoliation in pine, spruce and broadleaved stands shows that spruce stands respond most to extreme hydro-climatic conditions. In coniferous stands, extreme water-resource situations were seen to elicit a response over two-year time intervals. Unsurprisingly, optimal growing-season (June-September) precipitation totals correspond to long-term average figures, being slightly higher for spruce (384 mm) than for Scots pine or broadleaved species (375 mm). The relationships reported are confirmed by analysis of periodic change in breast-height diameter increments of Scots pines, whose growth depends closely not only on precipitation but, above all, on the depth of the water table in the summer half-year. Optimal water-table depths differed, being around 20 cm below ground in marshy coniferous forest, 80 cm in wet habitats, and 135 cm in fresh habitats.
Depending on the possibilities for water to soak

  6. Chemometrics quality assessment of wastewater treatment plant effluents using physicochemical parameters and UV absorption measurements.

    Science.gov (United States)

    Platikanov, S; Rodriguez-Mozaz, S; Huerta, B; Barceló, D; Cros, J; Batle, M; Poch, G; Tauler, R

    2014-07-01

    Chemometric techniques like Principal Component Analysis (PCA) and Partial Least Squares Regression (PLS) are used to explore, analyze and model relationships among different water quality parameters in wastewater treatment plants (WWTP). Different data sets, generated by laboratory analysis and by an automatic multi-parametric monitoring system with a newly designed optical device, have been investigated for temporal variations in water quality parameters measured in the influent and effluent of a WWTP over different time scales. The results revealed the most important relationships among the monitored parameters and their cyclic dependence on time (daily, monthly and annual cycles) and on different plant management procedures. This study also aimed at modeling and predicting the concentrations of several water components and parameters especially relevant for water quality assessment, such as Dissolved Organic Matter (DOM), Total Organic Carbon (TOC), nitrate, detergent, and phenol concentrations. PLS models were built to correlate target concentrations of these constituents with UV spectra measured in samples collected (1) under laboratory conditions (in synthetic water mixtures); and (2) under WWTP conditions (in real water samples from the plant). Using synthetic water mixtures, specific wavelengths were selected to establish simple and reliable prediction models, which gave good relative predictions, with errors of around 3-4% for nitrate, detergent and phenol concentrations and of around 15% for DOM in external validation. For nitrate and TOC concentrations modeled in real water samples from the WWTP effluent using the reduced spectral data set, results were also promising, with low prediction errors (less than 20%).
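    The abstract does not say which PLS algorithm was used. The sketch below is the classic NIPALS form of PLS1 (one response, such as a single analyte concentration, regressed on a UV spectrum), returning regression coefficients over wavelengths and an intercept. The number of components would normally be chosen by cross-validation, and the data and function name are hypothetical. Wavelength selection, as done for the synthetic mixtures, would simply restrict the columns of X before fitting.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS; returns coefficients B and intercept b0 (y ≈ X @ B + b0)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean                # centred X and y residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)                   # weight vector
        t = E @ w                                # scores
        tt = t @ t
        p = E.T @ t / tt                         # X loadings
        qa = f @ t / tt                          # y loading
        E = E - np.outer(t, p)                   # deflate X
        f = f - qa * t                           # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)          # coefficients in the original X space
    return B, y_mean - x_mean @ B

# Usage: hypothetical absorbances at 120 wavelengths; fit on 30 samples, predict 10.
rng = np.random.default_rng(5)
spectra = rng.normal(size=(40, 120))
conc = 2.0 * spectra[:, 30] + spectra[:, 31] + rng.normal(0.0, 0.1, 40)
B, b0 = pls1_fit(spectra[:30], conc[:30], n_components=3)
predicted = spectra[30:] @ B + b0
```

    With as many components as the rank of the centred X, PLS1 reproduces ordinary least squares; using fewer components is what stabilizes the model when wavelengths are many and highly collinear.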

  7. Chemometric Evaluation of Urinary Steroid Hormone Levels as Potential Biomarkers of Neuroendocrine Tumors

    Directory of Open Access Journals (Sweden)

    Barbara Seroczyńska

    2013-10-01

    Neuroendocrine tumors (NETs) are uncommon tumors which can secrete specific hormone products such as peptides, biogenic amines and hormones. So far, the diagnosis of NETs has been difficult because most NET markers are not specific for a given tumor, and no single NET marker fulfils the criteria of high specificity and high sensitivity required for a screening procedure. However, combining measurements of different NET markers can yield highly sensitive and specific diagnostic tests. The aim of the work was to determine whether urinary steroid hormones could serve as potential new biomarkers of NETs, usable as prognostic and clinical course monitoring factors. Thus, a rapid and sensitive reversed-phase high-performance liquid chromatographic (RP-HPLC) method with UV detection was developed for the determination of cortisol, cortisone, corticosterone, testosterone, epitestosterone and progesterone in human urine. The method was validated for accuracy, precision, selectivity, linearity, recovery and stability. The limits of detection and quantification were 0.5 and 1 ng mL−1, respectively, for each steroid hormone. Linearity was confirmed within a range of 1–300 ng mL−1, with a correlation coefficient greater than 0.9995 for all analytes. The method was successfully applied to quantify six endogenous steroids in human urine. Studies were performed on 20 healthy volunteers and 19 patients with NETs. Next, to better understand tumor biology in NETs and to check whether steroid hormones can be used as potential biomarkers of NETs, a chemometric analysis of urinary steroid hormone levels in both data sets was performed.

  8. Chemometric evaluation of urinary steroid hormone levels as potential biomarkers of neuroendocrine tumors.

    Science.gov (United States)

    Plenis, Alina; Miękus, Natalia; Olędzka, Ilona; Bączek, Tomasz; Lewczuk, Anna; Woźniak, Zofia; Koszałka, Patrycja; Seroczyńska, Barbara; Skokowski, Jarosław

    2013-10-16

    Neuroendocrine tumors (NETs) are uncommon tumors which can secrete specific hormone products such as peptides, biogenic amines and hormones. So far, the diagnosis of NETs has been difficult because most NET markers are not specific for a given tumor, and no single NET marker fulfils the criteria of high specificity and high sensitivity required for a screening procedure. However, combining measurements of different NET markers can yield highly sensitive and specific diagnostic tests. The aim of the work was to determine whether urinary steroid hormones could serve as potential new biomarkers of NETs, usable as prognostic and clinical course monitoring factors. Thus, a rapid and sensitive reversed-phase high-performance liquid chromatographic (RP-HPLC) method with UV detection was developed for the determination of cortisol, cortisone, corticosterone, testosterone, epitestosterone and progesterone in human urine. The method was validated for accuracy, precision, selectivity, linearity, recovery and stability. The limits of detection and quantification were 0.5 and 1 ng mL-1, respectively, for each steroid hormone. Linearity was confirmed within a range of 1-300 ng mL-1, with a correlation coefficient greater than 0.9995 for all analytes. The method was successfully applied to quantify six endogenous steroids in human urine. Studies were performed on 20 healthy volunteers and 19 patients with NETs. Next, to better understand tumor biology in NETs and to check whether steroid hormones can be used as potential biomarkers of NETs, a chemometric analysis of urinary steroid hormone levels in both data sets was performed.
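
    The linearity claim (r > 0.9995 over 1-300 ng mL-1) rests on an ordinary least-squares calibration line and its correlation coefficient. The sketch below computes slope, intercept and Pearson r for a set of standards; the peak areas are hypothetical, and the paper's actual weighting scheme (if any) is not stated in the abstract.

```python
import math

def linear_calibration(conc, response):
    """Ordinary least-squares line response = slope * conc + intercept,
    plus the Pearson correlation coefficient r."""
    n = len(conc)
    if n < 2 or n != len(response):
        raise ValueError("need at least two paired points")
    mean_x = sum(conc) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    syy = sum((y - mean_y) ** 2 for y in response)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, response))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical calibration standards spanning the reported 1-300 ng/mL range.
levels = [1, 10, 50, 100, 300]
peak_areas = [2.1, 20.3, 99.5, 201.0, 598.7]
slope, intercept, r = linear_calibration(levels, peak_areas)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, r={r:.5f}")
print("meets r > 0.9995:", r > 0.9995)
```

    Note that a high r alone does not prove linearity; residual plots or lack-of-fit tests are usually checked as well.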

  9. [Discrimination of donkey meat by NIR and chemometrics].

    Science.gov (United States)

    Niu, Xiao-Ying; Shao, Li-Min; Dong, Fang; Zhao, Zhi-Lei; Zhu, Yan

    2014-10-01

    Donkey meat samples (n = 167) from different parts of the donkey body (neck, costalia, rump, and tendon), together with beef (n = 47), pork (n = 51) and mutton (n = 32) samples, were used to establish near-infrared reflectance spectroscopy (NIR) classification models in the spectral range of 4,000-12,500 cm-1. The accuracies of classification models constructed by Mahalanobis distance analysis, soft independent modeling of class analogy (SIMCA) and least squares-support vector machine (LS-SVM), each combined with pretreatment by Savitzky-Golay smoothing (5, 15 and 25 points) and derivatives (first and second), multiplicative scatter correction and standard normal variate, were compared. The optimal models for intact samples were obtained by Mahalanobis distance analysis with the first 11 principal components (PCs) from the original spectra as inputs and by LS-SVM with the first 6 PCs as inputs; these correctly classified 100% of the calibration set and 98.96% of the prediction set. For minced samples of 7 mm diameter, the optimal result was attained by LS-SVM with the first 5 PCs from the original spectra as inputs, which gave an accuracy of 100% for calibration and 97.53% for prediction. For a mince diameter of 5 mm, the SIMCA model with the first 8 PCs from the original spectra as inputs correctly classified 100% of both calibration and prediction samples. For a mince diameter of 3 mm, Mahalanobis distance analysis and SIMCA models both achieved 100% accuracy for calibration and prediction, with the first 7 and 9 PCs from the original spectra as inputs, respectively. In these models, donkey meat samples were all correctly classified (100%) in both calibration and prediction. The results show that NIR combined with chemometric methods is a feasible way to discriminate donkey meat from other meats.
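
    Standard normal variate (SNV), one of the pretreatments compared above, centers each spectrum on its own mean and divides by its own standard deviation, which removes additive baseline offsets and multiplicative scatter effects. A minimal sketch on hypothetical absorbance values (whether the authors used the n or n - 1 standard deviation is not stated; n - 1 is assumed here):

```python
import statistics

def snv(spectrum):
    """Standard normal variate: center a single spectrum on its own mean and
    divide by its own standard deviation (sample SD, n - 1, assumed here)."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    if sd == 0:
        raise ValueError("flat spectrum cannot be SNV-transformed")
    return [(v - mean) / sd for v in spectrum]

# Hypothetical absorbance readings at a few wavenumbers.
raw = [0.42, 0.47, 0.55, 0.51, 0.44]
# The same spectrum with a multiplicative scatter effect and a baseline offset.
shifted = [1.8 * v + 0.2 for v in raw]
print(all(abs(a - b) < 1e-9 for a, b in zip(snv(raw), snv(shifted))))  # → True
```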

  10. Signature-Discovery Approach for Sample Matching of a Nerve-Agent Precursor using Liquid Chromatography–Mass Spectrometry, XCMS, and Chemometrics

    Energy Technology Data Exchange (ETDEWEB)

    Fraga, Carlos G.; Clowers, Brian H.; Moore, Ronald J.; Zink, Erika M.

    2010-05-15

    This report demonstrates the use of bioinformatic and chemometric tools on liquid chromatography-mass spectrometry (LC-MS) data for the discovery of ultra-trace forensic signatures for sample matching of various stocks of the nerve-agent precursor methylphosphonic dichloride (dichlor). The bioinformatic tool XCMS was used to comprehensively search for candidate LC-MS peaks in a known set of dichlor samples. These candidate peaks were down-selected to a group of 34 impurity peaks. Hierarchical cluster analysis and factor analysis demonstrated the potential of these 34 impurity peaks for matching samples to their stock source. Only one pair of dichlor stocks could not be differentiated. An acceptable chemometric approach for sample matching was determined to be variance scaling and signal averaging of normalized duplicate impurity profiles prior to classification by k-nearest neighbors. Using this approach, all samples in a test set of dichlor samples were correctly matched to their source stock. The sample preparation and LC-MS method permitted the detection of dichlor impurities presumably at parts-per-trillion levels (w/w). The detection of a common impurity in all dichlor stocks, which were synthesized over a 14-year period by different manufacturers, was an unexpected discovery. Our signature-discovery approach should be useful in developing a forensic capability to help criminal investigations following chemical attacks.
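
    The matching pipeline described above (normalize duplicate impurity profiles, average them, variance-scale, then classify by k-nearest neighbors) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: normalization to unit total peak area, Euclidean distance, and the three-peak profiles are all hypothetical choices.

```python
import math
import statistics
from collections import Counter

def normalize(profile):
    """Scale an impurity profile to unit total peak area (assumed normalization)."""
    total = sum(profile)
    if total == 0:
        raise ValueError("profile has zero total area")
    return [v / total for v in profile]

def average_duplicates(a, b):
    """Signal-average two duplicate profiles of the same sample."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def fit_variance_scaling(train):
    """Per-peak sample standard deviations from the training profiles.
    A zero SD (constant peak) falls back to 1.0 to avoid division by zero."""
    return [statistics.stdev(col) or 1.0 for col in zip(*train)]

def scale(profile, sds):
    return [v / s for v, s in zip(profile, sds)]

def knn_predict(train, labels, query, k=1):
    """Majority vote among the k nearest training profiles (Euclidean distance)."""
    nearest = sorted(zip((math.dist(x, query) for x in train), labels))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]

# Hypothetical three-peak profiles from two stocks, each measured in duplicate.
stock_a = average_duplicates(normalize([10, 2, 1]), normalize([11, 2, 1]))
stock_b = average_duplicates(normalize([2, 10, 3]), normalize([2, 9, 3]))
train = [stock_a, stock_b]
labels = ["stock A", "stock B"]
sds = fit_variance_scaling(train)
train_scaled = [scale(p, sds) for p in train]
query = scale(normalize([10, 2, 1]), sds)
print(knn_predict(train_scaled, labels, query, k=1))  # → stock A
```

    Scaling each peak by its standard deviation keeps a few large impurity peaks from dominating the distance; mean-centering is omitted because it does not change Euclidean distances.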

  11. Chemometric study of the sinter mixtures used in sinter plants in Poland

    Directory of Open Access Journals (Sweden)

    A. Smoliński

    2015-01-01

    The main goal of the study was the analysis of chemical parameters of sinter mixtures used in sinter plants in Poland. For this purpose, a chemometric method, hierarchical clustering analysis, was used. This method made it possible to examine the similarities and differences between the studied sinter mixtures.
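
    Hierarchical clustering analysis, as used above, repeatedly merges the two most similar objects or groups until a chosen number of clusters remains. The abstract does not say which distance or linkage was used, so the sketch below assumes Euclidean distance with average (or single) linkage on hypothetical, already-autoscaled composition data:

```python
import math

def hierarchical_clusters(points, n_clusters, linkage="average"):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    until n_clusters remain. Returns clusters as sorted lists of indices."""
    if linkage not in ("single", "average"):
        raise ValueError("linkage must be 'single' or 'average'")
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(c1, c2):
        d = [math.dist(points[i], points[j]) for i in c1 for j in c2]
        return min(d) if linkage == "single" else sum(d) / len(d)

    while len(clusters) > max(n_clusters, 1):
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # b > a, so deleting b keeps index a valid
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical autoscaled values of two chemical parameters for four mixtures.
mixtures = [(0.1, 0.2), (0.0, 0.3), (2.1, 1.9), (2.0, 2.2)]
print(hierarchical_clusters(mixtures, n_clusters=2))  # → [[0, 1], [2, 3]]
```

    Production analyses usually record the full merge sequence to draw a dendrogram; this sketch only returns the final partition.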

  12. Provenance of pottery determined by soil physicochemical and chemometric methods: A case study from Frederiksgave, Ghana

    DEFF Research Database (Denmark)

    Rasmussen, Lars Holm; Bredwa-Mensah, Y.; Borggaard, Ole K.;

    2009-01-01

    The suitability of using traditional soil chemical and mineralogical methods combined with chemometrics to trace the provenance of archaeological samples was tested on potsherds from Frederiksgave, a former Danish plantation in southern Ghana, in use from 1830 to 1850. Soil and six potsherds from Frederi...

  13. Experimental Design, Near-Infrared Spectroscopy, and Multivariate Calibration: An Advanced Project in a Chemometrics Course

    Science.gov (United States)

    de Oliveira, Rodrigo R.; das Neves, Luiz S.; de Lima, Kassio M. G.

    2012-01-01

    A chemometrics course is offered to students in their fifth semester of the chemistry undergraduate program that includes an in-depth project. Students carry out the project over five weeks (three 8-h sessions per week) and conduct it in parallel to other courses or other practical work. The students conduct a literature search, carry out…

  14. [Application of chemometrics in composition-activity relationship research of traditional Chinese medicine].

    Science.gov (United States)

    Han, Sheng-Nan

    2014-07-01

    Chemometrics is a new branch of chemistry that is widely applied in various fields of analytical chemistry. It uses theories and methods from mathematics, statistics, computer science and other related disciplines to optimize the chemical measurement process and, by analyzing chemical measurement data, to extract as much chemical and other information about material systems as possible. In recent years, traditional Chinese medicine has attracted widespread attention. In traditional Chinese medicine research, a key problem has been how to interpret the relationship between the various chemical components and their efficacy, which seriously restricts the modernization of Chinese medicine. Because chemometrics brings multivariate analysis methods into chemical research, it has been applied as an effective research tool in composition-activity relationship research on Chinese medicine. This article reviews the applications of chemometric methods in composition-activity relationship research in recent years. The applications of multivariate statistical analysis methods (such as regression analysis, correlation analysis, principal component analysis, etc.) and artificial neural networks (such as back propagation artificial neural network, radial basis function neural network, support vector machine, etc.) are summarized, including their basic principles, research content, and advantages and disadvantages. Finally, the main existing problems and prospects for future research are discussed.

  16. Chemometric source identification of PCDD/Fs and other POPs in sediment cores of North-East Germany

    Energy Technology Data Exchange (ETDEWEB)

    Koch, M. [Ecofys GmbH (Germany)]; Ricking, M. [Freie Univ. Berlin (Germany)]; Rotard, W. [Technische Univ. Berlin (Germany)]

    2004-09-15

    A broad range of persistent organic pollutants (POPs) and selected heavy metals was analysed in sediment cores from North-East Germany. The pollutants analysed include polychlorinated dioxins and furans (PCDD/Fs), polycyclic aromatic hydrocarbons (PAHs), polychlorinated biphenyls (PCBs), several pesticides (DDT, HCH, CBz) and their metabolites, as well as selected heavy metals. The sediment cores were sampled at five locations reflecting a range of anthropogenic influences and background contamination: Arkona Basin (AK), representing a remote marine site; Lake Bugsin (BS), a background location with only atmospheric deposition; Lake Quenz (QS), close to the industrial city of Brandenburg; Teltowkanal (TK), in the suburban-industrial zone of Berlin; and Lake White (WS), in the centre of Berlin. The lower parts of the AK, BS and TK cores were dated at 100-150 years. Results for selected pollutants (PCDD/Fs), focussing on the depth profiles and pollutant patterns, have been presented earlier. Here, a comprehensive overview of the source identification of all pollutants and the related application of chemometric methods is presented.

  17. Transit time distributions and StorAge Selection functions in a sloping soil lysimeter with time-varying flow paths: Direct observation of internal and external transport variability

    Science.gov (United States)

    Kim, Minseok; Pangle, Luke A.; Cardoso, Charléne; Lora, Marco; Volkmann, Till H. M.; Wang, Yadi; Harman, Ciaran J.; Troch, Peter A.

    2016-09-01

    Transit times through hydrologic systems vary in time, but the nature of that variability is not well understood. Transit time variability was investigated in a 1 m3 sloping lysimeter, representing a simplified model of a hillslope, receiving periodic rainfall events for 28 days. Tracer tests were conducted using an experimental protocol that allows time-variable transit time distributions (TTDs) to be calculated from data. Observed TTDs varied with the storage state of the system and with the history of inflows and outflows. We propose that the observed time variability of the TTDs can be decomposed into two parts: "internal" variability, associated with changes in the arrangement of, and partitioning between, flow pathways; and "external" variability, driven by fluctuations in the flow rate along all flow pathways. These concepts can be defined quantitatively in terms of rank StorAge Selection (rSAS) functions, a theory describing lumped transport dynamics. Internal variability is associated with temporal variability in the rSAS function, while external variability is not. The rSAS function variability was characterized by an "inverse storage effect," whereby younger water is released in greater proportion under wetter conditions than under drier ones. We hypothesize that this effect is caused by the rapid mobilization of water in the unsaturated zone by the rising water table. Common approximations used to model transport dynamics that neglect internal variability were unable to reproduce the observed breakthrough curves accurately. This suggests that internal variability can play an important role in hydrologic transport dynamics, with implications for field data interpretation and modeling.
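
    For contrast with the time-variable TTDs studied above, the classic steady-state shortcut estimates a single mean transit time as the first temporal moment of a tracer breakthrough curve, M1/M0. This sketch assumes a pulse injection under steady flow, which is exactly the time-invariant simplification the study shows to be incomplete; the daily concentrations are hypothetical.

```python
def trapezoid(x, y):
    """Trapezoidal integral of y over (possibly uneven) sample times x."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2 for i in range(len(x) - 1))

def mean_transit_time(times, conc):
    """First temporal moment of a tracer breakthrough curve: M1 / M0."""
    m0 = trapezoid(times, conc)
    if m0 == 0:
        raise ValueError("breakthrough curve has zero area")
    m1 = trapezoid(times, [t * c for t, c in zip(times, conc)])
    return m1 / m0

# Hypothetical outflow tracer concentrations sampled once per day.
days = [0, 1, 2, 3, 4, 5, 6]
tracer = [0.0, 0.5, 2.0, 1.5, 0.8, 0.3, 0.0]
print(f"mean transit time ≈ {mean_transit_time(days, tracer):.2f} days")
```

    A single number like this cannot express the storage-dependent shifts (such as the inverse storage effect) that the rSAS framework is designed to capture.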

  18. Use of chemometric and quantum-mechanical methods in the analysis of bioactive terpenoids and phenylpropanoids against the Aedes aegypti

    Directory of Open Access Journals (Sweden)

    Reginaldo Bezerra dos Santos

    2010-01-01

    Dengue fever is one of the main public health problems in the world. Many mosquitoes have developed resistance to the conventional insecticides in use, so the search for plant extracts and natural substances as alternative insecticides has increased. In this study, chemometric methods were employed to classify a group of terpenoid and phenylpropanoid compounds with biological activity against A. aegypti mosquito larvae. The AM1 (Austin Model 1) method was used to calculate a set of molecular descriptors (properties) for the studied compounds. The descriptors were then analyzed using two pattern recognition methods: Principal Component Analysis (PCA) and Hierarchical Clustering Analysis (HCA). PCA and HCA proved very effective in classifying the studied compounds into two groups (active and inactive). The electronic variables EHOMO-1, EHOMO-2, ELUMO, ELUMO+2 and the structural variable LogP were used to classify the compounds as active or inactive. For most of the studied compounds, the variables responsible for separating active from inactive compounds were electronic descriptors. Thus, it can be concluded that electronic effects play a fundamental role in the interaction between the biological receptor and terpenoid and phenylpropanoid compounds active against A. aegypti larvae.
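
    Descriptors on very different scales (orbital energies in eV, LogP) are typically autoscaled before PCA so that each contributes equally. The sketch below autoscales a small descriptor table and extracts the first principal component by power iteration on the correlation matrix. The descriptor values are hypothetical, and this is a generic PCA illustration, not the authors' software.

```python
import math
import statistics

def autoscale(rows):
    """Center each descriptor (column) on its mean and divide by its sample SD.
    Needs at least two rows; a constant column (zero SD) is not handled."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def first_principal_component(rows, iters=1000, tol=1e-12):
    """Loadings and scores of PC1 by power iteration on the covariance matrix
    of already-centered data (the correlation matrix if the data are autoscaled)."""
    n, p = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)] for i in range(p)]
    v = [1.0] * p  # start vector; assumed not orthogonal to PC1
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    scores = [sum(x * l for x, l in zip(r, v)) for r in rows]
    return v, scores

# Hypothetical descriptors per compound: [E_LUMO (eV), LogP].
descriptors = [[-0.9, 1.2], [-0.7, 1.9], [0.4, 3.1], [0.6, 3.6]]
loadings, scores = first_principal_component(autoscale(descriptors))
print("PC1 loadings:", [round(x, 3) for x in loadings])
print("PC1 scores:", [round(s, 3) for s in scores])
```

    The sign of a principal component is arbitrary, so only the relative positions of compounds along PC1 carry meaning.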

  19. Investigation of Arctic and Antarctic spatial and depth patterns of sea water in CTD profiles using chemometric data analysis

    Science.gov (United States)

    Kotwa, Ewelina; Lacorte, Silvia; Duarte, Carlos; Tauler, Roma

    2014-09-01

    In this paper we examine 2- and 3-way chemometric methods for analysis of Arctic and Antarctic water samples. Standard CTD (conductivity-temperature-depth) sensor devices were used during two oceanographic expeditions (July 2007 in the Arctic; February 2009 in the Antarctic) covering a total of 174 locations. The output from these devices can be arranged in a 3-way data structure (according to sea water depth, measured variables, and geographical location). We used and compared 2- and 3-way statistical tools including PCA, PARAFAC, PLS, and N-PLS for exploratory analysis, spatial patterns discovery and calibration. Particular importance was given to the correlation and possible prediction of fluorescence from other physical variables. MATLAB's mapping toolbox was used for geo-referencing and visualization of the results. We conclude that: 1) PCA and PARAFAC models were able to describe data in a satisfactory way, but PARAFAC results were easier to interpret; 2) applying a 2-way model to 3-way data raises the risk of flattening the covariance structure of the data and losing information; 3) the distinction between Arctic and Antarctic seas was revealed mostly by PC1, relating to the physico-chemical properties of the water samples; and 4) we confirm the ability to predict fluorescence values from physical measurements when the 3-way data structure is used in N-way PLS regression.
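
    The abstract notes that applying a 2-way model to 3-way data risks flattening the covariance structure. To make that concrete, the sketch below, an illustration rather than the authors' code, shows two common ways to unfold (matricize) a depth × variable × location array stored as nested Python lists. After unfolding, a 2-way method such as PCA no longer knows which columns share a depth or a variable, whereas PARAFAC keeps the three modes explicit.

```python
def unfold_locations_as_rows(cube):
    """cube[d][v][l] -> matrix with one row per location and one column per
    (depth, variable) pair, depth-major."""
    D, V, L = len(cube), len(cube[0]), len(cube[0][0])
    return [[cube[d][v][l] for d in range(D) for v in range(V)] for l in range(L)]

def unfold_variables_as_columns(cube):
    """cube[d][v][l] -> matrix with one row per (location, depth) pair and one
    column per measured variable."""
    D, V, L = len(cube), len(cube[0]), len(cube[0][0])
    return [[cube[d][v][l] for v in range(V)] for l in range(L) for d in range(D)]

# Hypothetical cube: 2 depths x 3 variables (e.g. temperature, salinity,
# fluorescence) x 4 locations, filled with labels 100*d + 10*v + l.
cube = [[[100 * d + 10 * v + l for l in range(4)] for v in range(3)] for d in range(2)]
a = unfold_locations_as_rows(cube)
b = unfold_variables_as_columns(cube)
print(len(a), "x", len(a[0]))  # → 4 x 6
print(len(b), "x", len(b[0]))  # → 8 x 3
```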

  20. Combining directed acyclic graphs and the change-in-estimate procedure as a novel approach to adjustment-variable selection in epidemiology

    Directory of Open Access Journals (Sweden)

    Evans David

    2012-10-01

    Background: Directed acyclic graphs (DAGs) are an effective means of presenting expert-knowledge assumptions when selecting adjustment variables in epidemiology, whereas the change-in-estimate procedure is a common statistics-based approach. As DAGs imply specific empirical relationships which can be explored by the change-in-estimate procedure, it should be possible to combine the two approaches. This paper proposes such an approach, which aims to produce well-adjusted estimates for a given research question, based on plausible DAGs consistent with the data at hand, combining prior knowledge and standard regression methods. Methods: Based on the relationships laid out in a DAG, researchers can predict how a collapsible estimator (e.g. risk ratio or risk difference) for an effect of interest should change when adjusted on different variable sets. Implied and observed patterns can then be compared to detect inconsistencies and so guide adjustment-variable selection. Results: The proposed approach involves i. drawing up a set of plausible background-knowledge DAGs; ii. starting with one of these DAGs as a working DAG, identifying a minimal variable set, S, sufficient to control for bias on the effect of interest; iii. estimating a collapsible estimator adjusted on S, then adjusted on S plus each variable not in S in turn (“add-one pattern”), and then adjusted on the variables in S minus each of these variables in turn (“minus-one pattern”); iv. checking the observed add-one and minus-one patterns against the pattern implied by the working DAG and the other prior DAGs; v. reviewing the DAGs, if needed; and vi. presenting the initial and all final DAGs with estimates. Conclusion: This approach to adjustment-variable selection combines background-knowledge and statistics-based approaches using methods already common in epidemiology and communicates assumptions and uncertainties in a standardized graphical format. It is probably best suited to
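
    Step iii above defines two families of adjustment sets. The sketch below enumerates them and computes a simple percent change in estimate relative to the estimate adjusted on S. The covariate names are hypothetical, and the estimator itself (for example, a fitted risk ratio) is left out; for ratio measures, changes are often assessed on the log scale instead.

```python
def adjustment_patterns(minimal_set, all_covariates):
    """Return the adjustment sets for the add-one and minus-one patterns.

    add-one:   S plus each covariate not in S, one at a time
    minus-one: S minus each member of S, one at a time
    """
    s = sorted(minimal_set)
    outside = [c for c in sorted(all_covariates) if c not in minimal_set]
    add_one = {c: s + [c] for c in outside}
    minus_one = {c: [x for x in s if x != c] for c in s}
    return add_one, minus_one

def percent_change(reference_estimate, new_estimate):
    """Change in estimate relative to the estimate adjusted on S, in percent."""
    return 100.0 * (new_estimate - reference_estimate) / reference_estimate

# Hypothetical working DAG: S = {age, smoking}; other measured covariates bmi, sex.
add_one, minus_one = adjustment_patterns({"age", "smoking"}, {"age", "smoking", "bmi", "sex"})
print(add_one)    # → {'bmi': ['age', 'smoking', 'bmi'], 'sex': ['age', 'smoking', 'sex']}
print(minus_one)  # → {'age': ['smoking'], 'smoking': ['age']}
```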

  1. Discrimination of American ginseng and Asian ginseng using electronic nose and gas chromatography–mass spectrometry coupled with chemometrics

    Directory of Open Access Journals (Sweden)

    Shaoqing Cui

    2017-01-01

    Conclusion: Combined with advanced chemometrics, the E-nose is capable of discriminating between American and Asian ginseng both qualitatively and quantitatively, providing an accurate, rapid, and nondestructive reference approach.

  2. Parametric study of a variable-magnetic-field-based energy-selection system for generating a spread-out Bragg peak with a laser-accelerated proton beam

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Dae-Hyun; Suh, Tae-Suk [The Catholic University of Korea, Seoul (Korea, Republic of); Kang, Young-Nam [Seoul St. Mary's Hospital, The Catholic University of Korea, Seoul (Korea, Republic of); Yoo, Seung-Hoon [CHA Bundang Medical Center, CHA University, Seongnam (Korea, Republic of); Pae, Ki-Hong [Gwangju Institute of Science and Technology, Gwangju (Korea, Republic of); Shin, Dong-Ho; Lee, Se-Byeong [National Cancer Center, Goyang (Korea, Republic of)

    2013-01-15

    Laser-based proton beam acceleration, which produces broad energy spectra, is unsuitable for direct clinical use; an energy selection system is therefore necessary. The purpose of the present study was to investigate a method whereby a variable magnetic field could be employed with an energy selection system to generate a spread-out Bragg peak (SOBP). The Geant4 toolkit was used for energy selection, particle transport and dosimetric property calculations. The energy spectrum of the laser-accelerated proton beam was acquired using a particle-in-cell simulation. The hole size and position of the energy selection collimator were varied to determine the effects of those parameters on the dosimetric properties. To generate an SOBP, the magnetic field in the energy selection system was changed for each beam weighting factor during beam irradiation. The overall results of this study suggest that an energy selection system with a variable magnetic field can effectively generate an SOBP suitable for proton radiation therapy applications.
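
    The physics that lets a variable magnetic field select energy is magnetic rigidity: a proton with momentum p follows a radius rho = p/(eB) in a uniform field B, so for a collimator at a fixed bending radius, each field setting passes a different energy. The sketch below assumes a uniform dipole and a hypothetical 1 m radius; the paper's actual magnet geometry is not given in the abstract.

```python
import math

PROTON_REST_ENERGY_MEV = 938.272  # m_p c^2
SPEED_OF_LIGHT = 299_792_458.0    # m/s

def proton_rigidity(kinetic_energy_mev):
    """Magnetic rigidity B*rho (T*m) of a proton from its kinetic energy,
    using relativistic momentum: (pc)^2 = T^2 + 2*T*m c^2."""
    pc_mev = math.sqrt(kinetic_energy_mev ** 2
                       + 2 * kinetic_energy_mev * PROTON_REST_ENERGY_MEV)
    return pc_mev * 1e6 / SPEED_OF_LIGHT  # p/e in SI units = (pc in eV) / c

def field_for_bending_radius(kinetic_energy_mev, radius_m):
    """Uniform dipole field (T) that bends protons of this energy on radius_m."""
    return proton_rigidity(kinetic_energy_mev) / radius_m

# Hypothetical fixed 1 m bending radius: stepping B selects successive energies.
for energy in (60, 80, 100):
    print(f"{energy} MeV -> B = {field_for_bending_radius(energy, 1.0):.3f} T")
```

    Stepping B through such values during irradiation, weighted per energy, is the general idea behind building an SOBP from a broad spectrum.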

  3. Differences in pup birth weight, pup variability within litters, and dam weight of mice selected for alternative criteria to increase litter size.

    Science.gov (United States)

    van Engelen, M A; Nielsen, M K; Ribeiro, E L

    1995-07-01

    Selection for litter size had been practiced for 21 generations and relaxed selection for 13 generations in mice. Three replicates were used with four selection criteria: index of components (ovulation rate and ova success), uterine capacity, litter size, and an unselected control. Especially with selection for litter size and the index relative to the control, number of pups born had increased, and differences also occurred in mating weight. Dams of the three replicates and their litters were used to evaluate the effects of accumulated selection on pup birth weight, variability in weight of littermates, and dam's weight at mating and after littering. Total number born, number born alive, number of males, and number of females were also recorded and studied. Mean pup birth weight did not differ among the criteria; however, variability among littermates in pup weight tended to differ among criteria of selection. Regressions for pup weight and within-litter standard deviation of pup weight on number born were small and negative but significant (P litter was normal for 77.2% of the litters, with no differences among the criteria. The difference between weight of male and weight of female pups was significant (P littering weight; however, the maternal weight gain between mating and littering was not different among criteria. Number born differed (P < .003) among the criteria, but there was no significant difference among criteria in numbers of males and females.(ABSTRACT TRUNCATED AT 250 WORDS)

  4. An all fiber apparatus for microparticles selective manipulation based on a variable ratio coupler and a microfiber

    Science.gov (United States)

    Li, Baoli; Luo, Wei; Xu, Fei; Lu, Yanqing

    2016-09-01

    We propose an all-fiber apparatus based on a variable ratio coupler which can transport microparticles controllably and trap particles one by one along a microfiber. By connecting the two output ports of a variable ratio coupler to the two end pigtails of a microfiber and launching a 980 nm laser into the variable ratio coupler, particles in suspension were trapped at the waist of the microfiber by the gradient force and then transported along the microfiber by the total scattering force generated by the two counter-propagating beams. The direction of transport was controlled by altering the coupling ratio of the variable ratio coupler. When the intensities of the two output ports were equal, trapped particles stayed at fixed positions. Over time, another particle near the microfiber was trapped onto it; three particles in total were trapped in our experiment. This technique combines the functions of conventional optical tweezers and an optical conveyor.

  5. ESTIMATED AND ANALYSIS OF THE RELATIONSHIP BETWEEN THE ENDOGENOUS AND EXOGENOUS VARIABLES USING FUZZY SEMI-PARAMETRIC SAMPLE SELECTION MODEL

    National Research Council Canada - National Science Library

    L. MuhamadSafiih; A. A. Kamil; M. T. Abu Osman

    2014-01-01

    ... this problem is through the use of semi-parametric methods. However, uncertainties and ambiguities exist in the models, particularly in the relationship between the endogenous and exogenous variables...

  6. Variable Selection for the Linear EV Model with Longitudinal Data

    Institute of Scientific and Technical Information of China (English)

    田瑞琴; 薛留根

    2013-01-01

    In this paper, we focus on variable selection for the linear EV (errors-in-variables) model with longitudinal data when some covariates are measured with errors. A new bias-corrected variable selection procedure is proposed based on a combination of quadratic inference functions and shrinkage estimation. With an appropriate selection of the tuning parameters, we establish the consistency and asymptotic normality of the resulting estimators. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedure.

  7. Chemometric data analysis application to Sparus aurata samples from two offshore farming plants along the Apulian (Italy) coastline.

    Science.gov (United States)

    Miniero, Roberto; Brambilla, Gianfranco; Chiaravalle, Eugenio; Mangiacotti, Michele; Brizzi, Giulio; Ingelido, Anna Maria; Abate, Vittorio; Cascone, Valeria; Ferri, Fabiola; Iacovella, Nicola; di Domenico, Alessandro

    2011-10-01

    The levels of polychlorodibenzo-p-dioxins (PCDDs), polychlorodibenzofurans (PCDFs), dioxin-like polychlorobiphenyls (DL-PCBs), non-dioxin-like polychlorobiphenyls (NDL-PCBs), and polybromodiphenyl ethers (PBDEs) in fish collected from two marine offshore farming plants were determined. Each sample consisted of specimens of the same size collected at the same time, in four different seasons over the farming year. The feeds given were of industrial origin, and the plants were located at two sites exposed to different environmental conditions. A chemometric approach was applied to interpret the subtle differences observed in fish body burdens across the three chemical groups considered. The approach consisted of a stepwise multivariate process including hierarchical cluster analysis (CA) and linear discriminant analysis (DA). The two main clusters determined by CA were subjected to canonical DA, with backward and forward selection procedures used to select the best discriminant functions. A clear temporal and spatial discrimination was found among the samples. Across the three chemical groups, the monthly separation seemed to depend on the growth process, and the main exposure was through the feed. In addition, the two plants differed significantly from the environmental point of view, and the most important discriminating group of chemicals was the NDL-PCBs. The approach proved very effective in discriminating the subtle differences and in suggesting ways to improve the quality of culturing conditions.

  8. Program Directors' Responses to a Survey on Variables Used To Select Residents in a Time of Change.

    Science.gov (United States)

    Wagoner, Norma E.; Suriano, J. Robert

    1999-01-01

    A survey of 794 program directors in 14 specialties assessed actual and projected changes in the selection process for medical residents and determined the relative weights the directors assigned to personal and academic criteria. Results indicate significant changes in the selection process, including a continuing decrease in residency positions…

  9. Data-driven approach to Type Ia supernovae: variable selection on the peak luminosity and clustering in visual analytics

    Science.gov (United States)

    Uemura, Makoto; Kawabata, Koji S.; Ikeda, Shiro; Maeda, Keiichi; Wu, Hsiang-Yun; Watanabe, Kazuho; Takahashi, Shigeo; Fujishiro, Issei

    2016-03-01

    Type Ia supernovae (SNIa) have an almost uniform peak luminosity, so they are used as “standard candles” to estimate distances to galaxies in cosmology. In this article, we introduce our two recent works on SNIa based on a data-driven approach. The diversity in the peak luminosity of SNIa can be reduced by corrections based on several variables. Color and decay rate have been used as the explanatory variables of the peak luminosity in past studies; however, it has been proposed that spectral data could give a better model of the peak luminosity. We use cross-validation to control the generalization error and a LASSO-type estimator to choose the set of variables. Using 78 samples and 276 candidate variables, we confirm that the peak luminosity depends on the color and decay rate. Our analysis does not support adding any other variables to obtain a better generalization error. On the other hand, this analysis assumes that SNIa originate from a single population, which is not self-evident. Indeed, several sub-types possibly having a different nature have been proposed. We used a visual analytics tool for the asymmetric biclustering method to find both a good set of variables and samples at the same time. Using 14 variables and 132 samples, we found that SNIa can be divided into two categories by the expansion velocity of the ejecta. These examples demonstrate that the data-driven approach is useful for the high-dimensional, large-volume data that are becoming common in modern astronomy.
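
    A LASSO-type estimator selects variables by shrinking small regression coefficients exactly to zero. The sketch below is a plain LASSO fitted by cyclic coordinate descent with soft-thresholding; it is a generic illustration, not the authors' specific estimator, and it assumes centered data (no intercept) and a fixed penalty lam, which in practice would be chosen by cross-validation (not shown).

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_coordinate_descent(X, y, lam, max_iter=1000, tol=1e-10):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.
    Assumes X columns and y are already centered (no intercept is fitted)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_ms = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X beta for beta = 0
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_ms[j] == 0.0:
                continue  # an all-zero column never enters the model
            # Correlation of column j with the partial residual that excludes j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ms[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_ms[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

# Hypothetical centered data: the second candidate variable carries little signal.
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [2.0, -2.0, 0.1, -0.1]
print(lasso_coordinate_descent(X, y, lam=0.1))  # second coefficient is shrunk to 0.0
```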

  10. Combination of Analytical and Chemometric Methods as a Useful Tool for the Characterization of Extra Virgin Argan Oil and Other Edible Virgin Oils. Role of Polyphenols and Tocopherols.

    Science.gov (United States)

    Rueda, Ascensión; Samaniego-Sánchez, Cristina; Olalla, Manuel; Giménez, Rafael; Cabrera-Vique, Carmen; Seiquer, Isabel; Lara, Luis

    2016-01-01

    Phenolic profiles and tocopherol fractions were analysed, in conjunction with chemometric techniques, for the accurate characterization of extra virgin argan oil and eight other edible virgin vegetable oils (olive, soybean, wheat germ, walnut, almond, sesame, avocado, and linseed) and to establish similarities among them. Phenolic profiles and tocopherols were determined by HPLC coupled with diode-array and fluorescence detectors, respectively. Multivariate factor analysis (MFA) and linear correlations were applied. Significant negative correlations were found between tocopherols and some of the polyphenols identified, but more intensely (P […] pinoresinol, and luteolin. MFA revealed that tocopherols, especially the γ-fraction, most strongly influenced the characterization of the oils. Among the phenolic compounds, syringic acid, dihydroxybenzoic acid, oleuropein, pinoresinol, and luteolin also contributed to the discrimination of the oils. According to the variables analysed in the present study, argan oil showed the greatest similarity to walnut oil, followed by sesame and linseed oils. Olive, avocado, and almond oils showed close similarities.
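    The record reports linear correlations between tocopherols and individual polyphenols. As a minimal sketch, assuming "linear correlation" means the Pearson coefficient, the computation looks like this; the five values below are hypothetical and are not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical concentrations in five oils (illustrative only).
gamma_tocopherol = [10.0, 20.0, 30.0, 40.0, 50.0]
polyphenol = [5.0, 4.0, 3.0, 2.0, 1.0]
print(round(pearson_r(gamma_tocopherol, polyphenol), 3))  # -1.0
```

    A negative r, as in this contrived example, is the kind of relationship the abstract describes between tocopherols and some polyphenols; real data would of course give a value between -1 and 0.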

  11. High-performance liquid chromatography based chemical fingerprint analysis and chemometric approaches for the identification and distinction of three endangered Panax plants in Southeast Asia.

    Science.gov (United States)

    Xia, Pengguo; Bai, Zhenqing; Liang, Tongyao; Yang, Dongfeng; Liang, Zongsuo; Yan, Xijun; Liu, Yan

    2016-10-01

    Within the genus Panax, only three endangered species with similar morphology, Panax notoginseng, P. vietnamensis, and P. stipuleanatus, are distributed mainly in Southeast Asia. These three plants are usually misidentified or adulterated. To identify them reliably, their chemical chromatographic fingerprints were established using an effective high-performance liquid chromatography method. By comparing the chromatograms, the three Panax species could be distinguished easily using 22 characteristic peaks. In addition, the chromatographic fingerprint data, combined with chemometric approaches, were used to identify the samples and to investigate the relationships among samples and species. Similarity analysis of the chemical components revealed a higher similarity between P. vietnamensis and P. stipuleanatus. Hierarchical clustering analysis showed that samples belonging to the same species clustered together. Principal component analysis gave results similar to those of hierarchical clustering analysis, and three principal components accounted for >80.5% of the total variability.
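    The record does not say which similarity measure was used. One common choice for chromatographic fingerprints is the cosine (congruence) coefficient between peak-area vectors; the sketch below assumes that measure and uses hypothetical peak areas for four peaks rather than the 22 characteristic peaks of the study.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two fingerprint vectors (e.g. peak areas)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def similarity_matrix(fingerprints):
    """Pairwise cosine similarities for a dict of name -> peak-area vector."""
    names = list(fingerprints)
    return {(p, q): cosine_similarity(fingerprints[p], fingerprints[q])
            for p in names for q in names}

# Hypothetical peak areas for four characteristic peaks (illustrative only).
fps = {
    "sample_A": [1.0, 0.0, 2.0, 1.0],
    "sample_B": [1.0, 0.0, 2.0, 1.0],
    "sample_C": [0.0, 3.0, 0.0, 0.0],
}
sims = similarity_matrix(fps)
print(round(sims[("sample_A", "sample_B")], 3))  # 1.0
print(round(sims[("sample_A", "sample_C")], 3))  # 0.0
```

    Because the cosine ignores overall scale, two samples with proportional peak areas score 1.0 even if their total concentrations differ, which is usually what fingerprint comparison is after.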

  12. New variable selection method using interval segmentation purity with application to blockwise kernel transform support vector machine classification of high-dimensional microarray data.

    Science.gov (United States)

    Tang, Li-Juan; Du, Wen; Fu, Hai-Yan; Jiang, Jian-Hui; Wu, Hai-Long; Shen, Guo-Li; Yu, Ru-Qin

    2009-08-01

    One problem with discriminant analysis of microarray data is that each sample is represented by a large number of genes, many of which may be irrelevant, insignificant, or redundant. Variable selection methods are therefore of great significance in microarray data analysis. A new method for key gene selection is proposed on the basis of interval segmentation purity, defined as the purity of samples belonging to a certain class in intervals segmented by a mode search algorithm. This method identifies the key variables most discriminative for each class, which offers the possibility of unraveling the biological implications of the selected genes. A salient advantage of the new strategy over existing methods is its ability to select genes that are the most discriminative for the classes of interest even though they may exhibit a multimodal distribution, given that the expression levels of some genes may reflect systematic differences among within-class samples derived from different pathogenic mechanisms. On the basis of the key genes selected for individual classes, a support vector machine with a block-wise kernel transform is developed to classify the different classes. The combination of the proposed gene mining approach with a support vector machine is demonstrated in cancer classification using two public data sets. The results reveal that significant genes have been identified for each class, and the classification model shows satisfactory performance in training and prediction for both data sets.
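    The purity idea can be illustrated with a small sketch. Two simplifications are assumptions, not the paper's method: equal-width intervals stand in for the mode search segmentation, and a gene's score for a class is taken as the highest purity in any interval holding at least `min_count` samples. Both function names are hypothetical. The example shows why purity suits multimodal genes: class "A" sits at both ends of the range, yet each of its intervals is pure.

```python
def interval_purities(values, labels, target, n_bins=4):
    """Purity of `target` in each equal-width interval of one gene's values.
    Equal-width bins stand in for the paper's mode-search segmentation."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant gene
    counts = [[0, 0] for _ in range(n_bins)]  # [target count, total count]
    for v, lab in zip(values, labels):
        k = min(int((v - lo) / width), n_bins - 1)
        counts[k][1] += 1
        if lab == target:
            counts[k][0] += 1
    return [(t / n if n else 0.0, n) for t, n in counts]

def gene_score(values, labels, target, n_bins=4, min_count=2):
    """Hypothetical aggregation: best purity among well-populated intervals."""
    return max((p for p, n in interval_purities(values, labels, target, n_bins)
                if n >= min_count), default=0.0)

labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
bimodal_gene = [0.0, 0.5, 9.5, 10.0, 4.0, 5.0, 6.0, 5.5]  # A at both extremes
noisy_gene = [1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5]     # classes interleaved
print(gene_score(bimodal_gene, labels, "A"))  # 1.0
print(gene_score(noisy_gene, labels, "A"))    # 0.5
```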

  13. Morphobiochemical Variability and Selection Strategies for the Germplasm of Dactylorhiza hatagirea (D. Don) Soo: An Endangered Medicinal Orchid

    Directory of Open Access Journals (Sweden)

    R. S. Chauhan

    2014-01-01

    Dactylorhiza hatagirea (D. Don) Soo (Orchidaceae) is an important endangered medicinal herb distributed in subalpine to alpine regions of the Himalayas. Its tubers are important constituents of many medicines and health tonics. Overexploitation for medicinal use has reduced its availability in natural habitats, and the species has been listed as endangered, making conservation and cultivation studies necessary. Variability studies may serve as an important tool for effective conservation and for crop improvement programs. Therefore, natural populations of D. hatagirea were analyzed for variability on the basis of morphological, biochemical, and isoenzyme patterns. The studied populations were grouped into two clusters. The existing variability among populations opens up new areas for conservation and prospects for a genetic improvement program for D. hatagirea.

  14. Boosting Variable Selection Algorithm for Linear Regression Models

    Institute of Scientific and Technical Information of China (English)

    李毓; 张春霞; 王冠伟

    2015-01-01

    With respect to variable selection for linear regression models, this paper proposes a novel Boosting learning method based on a genetic algorithm. In the novel algorithm, all training examples are first assigned equal weights, and a traditional genetic algorithm is adopted as the base learning algorithm of Boosting. The training set, together with its weight distribution, is then taken as the input of the genetic algorithm for variable selection. Subsequently, the weight distribution is updated according to the quality of the previous variable selection results. After these steps are repeated multiple times, the results are fused via a weighted combination rule. The performance of the proposed Boosting method is investigated on simulated and real-world data. The experimental results show that the method can significantly improve the variable selection performance of the traditional genetic algorithm and accurately identify the relevant variables. Thus, the novel Boosting method can be regarded as an effective technique for handling variable selection problems in linear regression models.
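    The abstract ends each run by fusing the per-round selections "via a weighted combination rule" but does not give the rule. The sketch below assumes a simple weighted vote: each round contributes a 0/1 mask over candidate variables, each round has a non-negative weight (hypothetically, larger for rounds whose selection scored better), and a variable is kept when its weighted inclusion score reaches a threshold of 0.5. The genetic algorithm and the sample-weight update are not shown.

```python
def fuse_selections(selections, round_weights, threshold=0.5):
    """Weighted vote over per-round variable-selection masks.

    selections: list of 0/1 lists, one per boosting round (1 = variable kept).
    round_weights: non-negative weight per round (hypothetical values).
    Returns the fused inclusion score per variable and the indices whose
    score reaches `threshold`.
    """
    total = sum(round_weights)
    if total <= 0:
        raise ValueError("round weights must sum to a positive value")
    p = len(selections[0])
    scores = [sum(w * s[j] for s, w in zip(selections, round_weights)) / total
              for j in range(p)]
    chosen = [j for j, sc in enumerate(scores) if sc >= threshold]
    return scores, chosen

# Three hypothetical rounds over five candidate variables.
rounds = [
    [1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
]
weights = [0.4, 0.4, 0.2]
scores, chosen = fuse_selections(rounds, weights)
print(chosen)  # [0, 1]
```

    Variables 2 and 4 are each picked in one round only, so their fused scores (0.4) stay below the threshold; this is the stabilizing effect that combining many rounds is meant to provide.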

  15. Rapid quantification of methamphetamine: using attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR) and chemometrics.

    Science.gov (United States)

    Hughes, Juanita; Ayoko, Godwin; Collett, Simon; Golding, Gary

    2013-01-01

    In Australia, and increasingly worldwide, methamphetamine is one of the most commonly seized drugs analysed by forensic chemists. The current well-established GC/MS methods used to identify and quantify methamphetamine are lengthy and expensive, but undercover police often request rapid analysis, which has led to interest in developing this new analytical technique. Ninety-six illicit drug seizures containing methamphetamine (0.1%–78.6%) were analysed using Fourier Transform Infrared Spectroscopy with an Attenuated Total Reflectance attachment and chemometrics. Two Partial Least Squares models were developed, one using the principal Infrared Spectroscopy peaks of methamphetamine and the other a Hierarchical Partial Least Squares model. Both models were refined to choose the variables most closely associated with the methamphetamine % vector. Both models were excellent: the principal-peaks Partial Least Squares model had a Root Mean Square Error of Prediction of 3.8, R² of 0.9779, and a lower limit of quantification of 7% methamphetamine. The Hierarchical Partial Least Squares model had a lower limit of quantification of 0.3% methamphetamine, a Root Mean Square Error of Prediction of 5.2, and R² of 0.9637. Such models offer rapid and effective methods for screening illicit drug samples to determine the percentage of methamphetamine they contain.
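    The two figures of merit quoted above can be computed from reference and predicted percentages on a test set. This is a minimal sketch assuming the usual definitions, RMSEP as the root of the mean squared prediction error and R² as 1 − SS_res/SS_tot; some chemometrics packages report R² as a squared correlation instead, and the values below are hypothetical, not the study's data.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction on an independent test set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical methamphetamine % values: reference vs model prediction.
ref = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 32.0, 38.0]
print(rmsep(ref, pred))                # 2.0
print(round(r_squared(ref, pred), 3))  # 0.968
```

    RMSEP is in the same units as the response (here, % methamphetamine), which is why it is read alongside the lower limit of quantification, while R² is unitless.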

  16. Rapid Quantification of Methamphetamine: Using Attenuated Total Reflectance Fourier Transform Infrared Spectroscopy (ATR-FTIR) and Chemometrics

    Science.gov (United States)

    Hughes, Juanita; Ayoko, Godwin; Collett, Simon; Golding, Gary

    2013-01-01

    In Australia, and increasingly worldwide, methamphetamine is one of the most commonly seized drugs analysed by forensic chemists. The current well-established GC/MS methods used to identify and quantify methamphetamine are lengthy and expensive, but undercover police often request rapid analysis, which has led to interest in developing this new analytical technique. Ninety-six illicit drug seizures containing methamphetamine (0.1%–78.6%) were analysed using Fourier Transform Infrared Spectroscopy with an Attenuated Total Reflectance attachment and chemometrics. Two Partial Least Squares models were developed, one using the principal Infrared Spectroscopy peaks of methamphetamine and the other a Hierarchical Partial Least Squares model. Both models were refined to choose the variables most closely associated with the methamphetamine % vector. Both models were excellent: the principal-peaks Partial Least Squares model had a Root Mean Square Error of Prediction of 3.8, R² of 0.9779, and a lower limit of quantification of 7% methamphetamine. The Hierarchical Partial Least Squares model had a lower limit of quantification of 0.3% methamphetamine, a Root Mean Square Error of Prediction of 5.2, and R² of 0.9637. Such models offer rapid and effective methods for screening illicit drug samples to determine the percentage of methamphetamine they contain. PMID:23936058

  17. Chemometric exploration of the abundance of trace metals and ions in desalinated and bottled drinking water in Kuwait.

    Science.gov (United States)

    Al-Mudhaf, Humood F; Astel, Aleksander M; Al-Hayan, Mohammad N; Abu-Shady, Abdel-Sattar I

    2014-01-01

    Chemometric methods were used to explore desalinated and bottled water in Kuwait and to interpret the spatial variation in its physicochemical parameters. The data set consisted of the concentrations of principal macronutrient elements, ions, and trace elements, together with temperature, pH, electrolytic conductivity, and total dissolved solids, measured in indoor, outdoor, and bottled water samples. Quantitative assessment of the Cd, Hg, and Sb contents revealed rare cases of elevated concentrations; however, these concentrations were always below international health agency standards. Two general clusters of similar parameters were found in the variable mode and were associated with "natural" water characteristics or "conditions" of the pipeline system. We found that an increase in temperature facilitates the leaching of metals from the metallic equipment in the system. Spatial variation in water quality was also found: residential areas fed from the Az-Zoor plant are supplied with water that contains lower concentrations of Ca, Cr, Mg, Mo, Ni, Na, TDS, and SO₄²⁻ than the desalinated water produced at and fed from the Doha plant. However, on the basis of the aluminum concentration in the water, cement mortar lining is assumed to be prevalent in the pipeline systems of the Mubarak Al-Kabeer, Ahmadi, Umm Al-Haiman, and Sorra areas.

  18. Rapid quantification of methamphetamine: using attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR) and chemometrics.

    Directory of Open Access Journals (Sweden)

    Juanita Hughes

    In Australia, and increasingly worldwide, methamphetamine is one of the most commonly seized drugs analysed by forensic chemists. The current well-established GC/MS methods used to identify and quantify methamphetamine are lengthy and expensive, but undercover police often request rapid analysis, which has led to interest in developing this new analytical technique. Ninety-six illicit drug seizures containing methamphetamine (0.1%–78.6%) were analysed using Fourier Transform Infrared Spectroscopy with an Attenuated Total Reflectance attachment and chemometrics. Two Partial Least Squares models were developed, one using the principal Infrared Spectroscopy peaks of methamphetamine and the other a Hierarchical Partial Least Squares model. Both models were refined to choose the variables most closely associated with the methamphetamine % vector. Both models were excellent: the principal-peaks Partial Least Squares model had a Root Mean Square Error of Prediction of 3.8, R² of 0.9779, and a lower limit of quantification of 7% methamphetamine. The Hierarchical Partial Least Squares model had a lower limit of quantification of 0.3% methamphetamine, a Root Mean Square Error of Prediction of 5.2, and R² of 0.9637. Such models offer rapid and effective methods for screening illicit drug samples to determine the percentage of methamphetamine they contain.

  19. Area- and Depth-Weighted Averages of Selected SSURGO Variables for the Conterminous United States and District of Columbia

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This digital data release consists of seven national data files of area- and depth-weighted averages of select soil attributes for every available county in the...
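    The record's description is truncated and does not state the aggregation rules. As a rough illustration of what "area- and depth-weighted averages" usually means, the sketch below assumes a thickness-weighted mean of a soil attribute over a 0–100 cm depth window within each soil component, followed by an area-weighted mean across components. The depth window, horizon values, and areas are all hypothetical.

```python
def depth_weighted(horizons, top=0.0, bottom=100.0):
    """Thickness-weighted mean of a soil attribute over [top, bottom] cm.

    horizons: list of (hz_top_cm, hz_bottom_cm, value); the parts of each
    horizon outside the depth window are ignored.
    """
    num = den = 0.0
    for h_top, h_bot, value in horizons:
        thick = min(h_bot, bottom) - max(h_top, top)
        if thick > 0:
            num += value * thick
            den += thick
    return num / den if den else None

def area_weighted(parts):
    """Area-weighted mean of (area, value) pairs, skipping missing values."""
    pairs = [(a, v) for a, v in parts if v is not None]
    total = sum(a for a, _ in pairs)
    return sum(a * v for a, v in pairs) / total if total else None

# Hypothetical components with horizons (e.g. clay %), not SSURGO data.
comp1 = [(0, 20, 10.0), (20, 50, 20.0), (50, 150, 30.0)]
comp2 = [(0, 100, 15.0)]
d1 = depth_weighted(comp1)  # (10*20 + 20*30 + 30*50) / 100 = 23.0
d2 = depth_weighted(comp2)  # 15.0
print(area_weighted([(60.0, d1), (40.0, d2)]))  # 19.8
```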

  20. Selected physiologic variables are weakly to moderately associated with 29 biomarkers of diet and nutrition, NHANES 2003-2006.

    Science.gov (United States)

    Haynes, Bridgette M H; Pfeiffer, Christine M; Sternberg, Maya R; Schleicher, Rosemary L

    2013-06-01

    The physiologic status of an individual may influence biomarkers of nutritional status. To help researchers plan studies and interpret data, we assessed the associations between common physiologic variables (fasting, inflammation, renal function, and pregnancy) and 29 biomarkers of diet and nutrition measured in blood or urine in a representative sample of the adult U.S. population (aged ≥ 20 y; pregnancy variable and iron indicators limited to women aged 20–49 y) participating in NHANES 2003–2006. We compared simple linear regression (model 1) with multiple linear regression [model 2, controlling for age, sex, race-ethnicity, smoking, supplement use, and the physiologic factors (and urine creatinine for urine biomarkers)] and report significant findings from model 2. Non-fasting status was positively associated with most water-soluble vitamins (WSVs) and related metabolites (RMs). Some WSV, fat-soluble vitamin (FSV), micronutrient (MN), and phytoestrogen concentrations were lower in the presence of inflammation (C-reactive protein ≥ 5 mg/L), whereas fatty acids and most iron indicators were higher. Most WSVs and RMs were higher when renal function was impaired [estimated glomerular filtration rate …]. … function, however, showed several large differences for WSV and RM concentrations. This descriptive analysis of associations between physiologic variables and a large number of nutritional biomarkers showed that controlling for demographic variables, smoking, and supplement use generally did not change the interpretation of bivariate results. The analysis serves as a useful basis for more complex future research.
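    The contrast between model 1 (simple regression) and model 2 (multiple regression with covariates) can be seen on a toy example. In this sketch the data are hypothetical: a biomarker depends only on age, and a binary "inflammation" flag happens to be more common in older people. Ordinary least squares is solved via the normal equations with Gauss-Jordan elimination; the function name `ols` is illustrative, and real survey analyses such as NHANES would also need sample weights, which are omitted here.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X includes an
    intercept column), solved by Gauss-Jordan elimination with pivoting."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] +
         [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data: biomarker = 1 + 0.5 * age; the flag has no direct effect.
age = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
flag = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
biomarker = [1.0 + 0.5 * a for a in age]

simple = ols([[1.0, f] for f in flag], biomarker)                    # model 1
adjusted = ols([[1.0, f, a] for f, a in zip(flag, age)], biomarker)  # model 2
print(round(simple[1], 6))    # flag coefficient, approximately 15
print(round(adjusted[1], 6))  # flag coefficient, approximately 0 after adjusting for age
```

    In this contrived case adjustment changes the conclusion completely; the abstract reports that, for the NHANES biomarkers, controlling for such covariates generally did not change the interpretation of the bivariate results.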