WorldWideScience

Sample records for regression multivariate analysis

  1. Multivariate Regression Analysis and Slaughter Livestock,

    Science.gov (United States)

    AGRICULTURE, *ECONOMICS), (*MEAT, PRODUCTION), MULTIVARIATE ANALYSIS, REGRESSION ANALYSIS , ANIMALS, WEIGHT, COSTS, PREDICTIONS, STABILITY, MATHEMATICAL MODELS, STORAGE, BEEF, PORK, FOOD, STATISTICAL DATA, ACCURACY

  2. Retro-regression--another important multivariate regression improvement.

    Science.gov (United States)

    Randić, M

    2001-01-01

    We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.

  3. Regression Analysis for Multivariate Dependent Count Data Using Convolved Gaussian Processes

    OpenAIRE

    Sofro, A'yunin; Shi, Jian Qing; Cao, Chunzheng

    2017-01-01

    Research on Poisson regression analysis for dependent data has been developed rapidly in the last decade. One of difficult problems in a multivariate case is how to construct a cross-correlation structure and at the meantime make sure that the covariance matrix is positive definite. To address the issue, we propose to use convolved Gaussian process (CGP) in this paper. The approach provides a semi-parametric model and offers a natural framework for modeling common mean structure and covarianc...

  4. Sparse multivariate factor analysis regression models and its applications to integrative genomics analysis.

    Science.gov (United States)

    Zhou, Yan; Wang, Pei; Wang, Xianlong; Zhu, Ji; Song, Peter X-K

    2017-01-01

    The multivariate regression model is a useful tool to explore complex associations between two kinds of molecular markers, which enables the understanding of the biological pathways underlying disease etiology. For a set of correlated response variables, accounting for such dependency can increase statistical power. Motivated by integrative genomic data analyses, we propose a new methodology-sparse multivariate factor analysis regression model (smFARM), in which correlations of response variables are assumed to follow a factor analysis model with latent factors. This proposed method not only allows us to address the challenge that the number of association parameters is larger than the sample size, but also to adjust for unobserved genetic and/or nongenetic factors that potentially conceal the underlying response-predictor associations. The proposed smFARM is implemented by the EM algorithm and the blockwise coordinate descent algorithm. The proposed methodology is evaluated and compared to the existing methods through extensive simulation studies. Our results show that accounting for latent factors through the proposed smFARM can improve sensitivity of signal detection and accuracy of sparse association map estimation. We illustrate smFARM by two integrative genomics analysis examples, a breast cancer dataset, and an ovarian cancer dataset, to assess the relationship between DNA copy numbers and gene expression arrays to understand genetic regulatory patterns relevant to the disease. We identify two trans-hub regions: one in cytoband 17q12 whose amplification influences the RNA expression levels of important breast cancer genes, and the other in cytoband 9q21.32-33, which is associated with chemoresistance in ovarian cancer. © 2016 WILEY PERIODICALS, INC.

  5. Multivariate and semiparametric kernel regression

    OpenAIRE

    Härdle, Wolfgang; Müller, Marlene

    1997-01-01

    The paper gives an introduction to theory and application of multivariate and semiparametric kernel smoothing. Multivariate nonparametric density estimation is an often used pilot tool for examining the structure of data. Regression smoothing helps in investigating the association between covariates and responses. We concentrate on kernel smoothing using local polynomial fitting which includes the Nadaraya-Watson estimator. Some theory on the asymptotic behavior and bandwidth selection is pro...

  6. Multivariate regression analysis for determining short-term values of radon and its decay products from filter measurements

    International Nuclear Information System (INIS)

    Kraut, W.; Schwarz, W.; Wilhelm, A.

    1994-01-01

    A multivariate regression analysis is applied to decay measurements of α-resp. β-filter activcity. Activity concentrations for Po-218, Pb-214 and Bi-214, resp. for the Rn-222 equilibrium equivalent concentration are obtained explicitly. The regression analysis takes into account properly the variances of the measured count rates and their influence on the resulting activity concentrations. (orig.) [de

  7. AN APPLICATION OF FUNCTIONAL MULTIVARIATE REGRESSION MODEL TO MULTICLASS CLASSIFICATION

    OpenAIRE

    Krzyśko, Mirosław; Smaga, Łukasz

    2017-01-01

    In this paper, the scale response functional multivariate regression model is considered. By using the basis functions representation of functional predictors and regression coefficients, this model is rewritten as a multivariate regression model. This representation of the functional multivariate regression model is used for multiclass classification for multivariate functional data. Computational experiments performed on real labelled data sets demonstrate the effectiveness of the proposed ...

  8. Multivariate Regression of Liver on Intestine of Mice: A ...

    African Journals Online (AJOL)

    Multivariate Regression of Liver on Intestine of Mice: A Chemotherapeutic Evaluation of Plant ... Using an analysis of covariance model, the effects ... The findings revealed, with the aid of likelihood-ratio statistic, a marked improvement in

  9. Robust multivariate analysis

    CERN Document Server

    J Olive, David

    2017-01-01

    This text presents methods that are robust to the assumption of a multivariate normal distribution or methods that are robust to certain types of outliers. Instead of using exact theory based on the multivariate normal distribution, the simpler and more applicable large sample theory is given.  The text develops among the first practical robust regression and robust multivariate location and dispersion estimators backed by theory.   The robust techniques  are illustrated for methods such as principal component analysis, canonical correlation analysis, and factor analysis.  A simple way to bootstrap confidence regions is also provided. Much of the research on robust multivariate analysis in this book is being published for the first time. The text is suitable for a first course in Multivariate Statistical Analysis or a first course in Robust Statistics. This graduate text is also useful for people who are familiar with the traditional multivariate topics, but want to know more about handling data sets with...

  10. Multivariate Linear Regression and CART Regression Analysis of TBM Performance at Abu Hamour Phase-I Tunnel

    Science.gov (United States)

    Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.

    2017-12-01

    The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.

  11. Multivariate analysis with LISREL

    CERN Document Server

    Jöreskog, Karl G; Y Wallentin, Fan

    2016-01-01

    This book traces the theory and methodology of multivariate statistical analysis and shows how it can be conducted in practice using the LISREL computer program. It presents not only the typical uses of LISREL, such as confirmatory factor analysis and structural equation models, but also several other multivariate analysis topics, including regression (univariate, multivariate, censored, logistic, and probit), generalized linear models, multilevel analysis, and principal component analysis. It provides numerous examples from several disciplines and discusses and interprets the results, illustrated with sections of output from the LISREL program, in the context of the example. The book is intended for masters and PhD students and researchers in the social, behavioral, economic and many other sciences who require a basic understanding of multivariate statistical theory and methods for their analysis of multivariate data. It can also be used as a textbook on various topics of multivariate statistical analysis.

  12. High-throughput quantitative biochemical characterization of algal biomass by NIR spectroscopy; multiple linear regression and multivariate linear regression analysis.

    Science.gov (United States)

    Laurens, L M L; Wolfrum, E J

    2013-12-18

    One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.

  13. Regression Models For Multivariate Count Data.

    Science.gov (United States)

    Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei

    2017-01-01

    Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data.

  14. Higher-order Multivariable Polynomial Regression to Estimate Human Affective States

    Science.gov (United States)

    Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin

    2016-03-01

    From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states.

  15. Bayesian Inference of a Multivariate Regression Model

    Directory of Open Access Journals (Sweden)

    Marick S. Sinay

    2014-01-01

    Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.

  16. Structural brain connectivity and cognitive ability differences: A multivariate distance matrix regression analysis.

    Science.gov (United States)

    Ponsoda, Vicente; Martínez, Kenia; Pineda-Pardo, José A; Abad, Francisco J; Olea, Julio; Román, Francisco J; Barbey, Aron K; Colom, Roberto

    2017-02-01

    Neuroimaging research involves analyses of huge amounts of biological data that might or might not be related with cognition. This relationship is usually approached using univariate methods, and, therefore, correction methods are mandatory for reducing false positives. Nevertheless, the probability of false negatives is also increased. Multivariate frameworks have been proposed for helping to alleviate this balance. Here we apply multivariate distance matrix regression for the simultaneous analysis of biological and cognitive data, namely, structural connections among 82 brain regions and several latent factors estimating cognitive performance. We tested whether cognitive differences predict distances among individuals regarding their connectivity pattern. Beginning with 3,321 connections among regions, the 36 edges better predicted by the individuals' cognitive scores were selected. Cognitive scores were related to connectivity distances in both the full (3,321) and reduced (36) connectivity patterns. The selected edges connect regions distributed across the entire brain and the network defined by these edges supports high-order cognitive processes such as (a) (fluid) executive control, (b) (crystallized) recognition, learning, and language processing, and (c) visuospatial processing. This multivariate study suggests that one widespread, but limited number, of regions in the human brain, supports high-level cognitive ability differences. Hum Brain Mapp 38:803-816, 2017. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  17. Fourier transform infrared spectroscopic imaging and multivariate regression for prediction of proteoglycan content of articular cartilage.

    Directory of Open Access Journals (Sweden)

    Lassi Rieppo

    Full Text Available Fourier Transform Infrared (FT-IR spectroscopic imaging has been earlier applied for the spatial estimation of the collagen and the proteoglycan (PG contents of articular cartilage (AC. However, earlier studies have been limited to the use of univariate analysis techniques. Current analysis methods lack the needed specificity for collagen and PGs. The aim of the present study was to evaluate the suitability of partial least squares regression (PLSR and principal component regression (PCR methods for the analysis of the PG content of AC. Multivariate regression models were compared with earlier used univariate methods and tested with a sample material consisting of healthy and enzymatically degraded steer AC. Chondroitinase ABC enzyme was used to increase the variation in PG content levels as compared to intact AC. Digital densitometric measurements of Safranin O-stained sections provided the reference for PG content. The results showed that multivariate regression models predict PG content of AC significantly better than earlier used absorbance spectrum (i.e. the area of carbohydrate region with or without amide I normalization or second derivative spectrum univariate parameters. Increased molecular specificity favours the use of multivariate regression models, but they require more knowledge of chemometric analysis and extended laboratory resources for gathering reference data for establishing the models. When true molecular specificity is required, the multivariate models should be used.

  18. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu; Pourahmadi, Mohsen; Maadooliat, Mehdi

    2014-01-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both

  19. Depth-weighted robust multivariate regression with application to sparse data

    KAUST Repository

    Dutta, Subhajit; Genton, Marc G.

    2017-01-01

    A robust method for multivariate regression is developed based on robust estimators of the joint location and scatter matrix of the explanatory and response variables using the notion of data depth. The multivariate regression estimator possesses desirable affine equivariance properties, achieves the best breakdown point of any affine equivariant estimator, and has an influence function which is bounded in both the response as well as the predictor variable. To increase the efficiency of this estimator, a re-weighted estimator based on robust Mahalanobis distances of the residual vectors is proposed. In practice, the method is more stable than existing methods that are constructed using subsamples of the data. The resulting multivariate regression technique is computationally feasible, and turns out to perform better than several popular robust multivariate regression methods when applied to various simulated data as well as a real benchmark data set. When the data dimension is quite high compared to the sample size it is still possible to use meaningful notions of data depth along with the corresponding depth values to construct a robust estimator in a sparse setting.

  20. Depth-weighted robust multivariate regression with application to sparse data

    KAUST Repository

    Dutta, Subhajit

    2017-04-05

    A robust method for multivariate regression is developed based on robust estimators of the joint location and scatter matrix of the explanatory and response variables using the notion of data depth. The multivariate regression estimator possesses desirable affine equivariance properties, achieves the best breakdown point of any affine equivariant estimator, and has an influence function which is bounded in both the response as well as the predictor variable. To increase the efficiency of this estimator, a re-weighted estimator based on robust Mahalanobis distances of the residual vectors is proposed. In practice, the method is more stable than existing methods that are constructed using subsamples of the data. The resulting multivariate regression technique is computationally feasible, and turns out to perform better than several popular robust multivariate regression methods when applied to various simulated data as well as a real benchmark data set. When the data dimension is quite high compared to the sample size it is still possible to use meaningful notions of data depth along with the corresponding depth values to construct a robust estimator in a sparse setting.

  1. An Efficient Local Algorithm for Distributed Multivariate Regression

    Data.gov (United States)

    National Aeronautics and Space Administration — This paper offers a local distributed algorithm for multivariate regression in large peer-to-peer environments. The algorithm is designed for distributed...

  2. A Scalable Local Algorithm for Distributed Multivariate Regression

    Data.gov (United States)

    National Aeronautics and Space Administration — This paper offers a local distributed algorithm for multivariate regression in large peer-to-peer environments. The algorithm can be used for distributed...

  3. Prognostic factorsin inoperable adenocarcinoma of the lung: A multivariate regression analysis of 259 patiens

    DEFF Research Database (Denmark)

    Sørensen, Jens Benn; Badsberg, Jens Henrik; Olsen, Jens

    1989-01-01

    The prognostic factors for survival in advanced adenocarcinoma of the lung were investigated in a consecutive series of 259 patients treated with chemotherapy. Twenty-eight pretreatment variables were investigated by use of Cox's multivariate regression model, including histological subtypes and ...

  4. Determination of boiling point of petrochemicals by gas chromatography-mass spectrometry and multivariate regression analysis of structural activity relationship.

    Science.gov (United States)

    Fakayode, Sayo O; Mitchell, Breanna S; Pollard, David A

    2014-08-01

    Accurate understanding of analyte boiling points (BP) is of critical importance in gas chromatographic (GC) separation and crude oil refinery operation in petrochemical industries. This study reported the first combined use of GC separation and partial-least-square (PLS1) multivariate regression analysis of petrochemical structural activity relationship (SAR) for accurate BP determination of two commercially available (D3710 and MA VHP) calibration gas mix samples. The results of the BP determination using PLS1 multivariate regression were further compared with the results of traditional simulated distillation method of BP determination. The developed PLS1 regression was able to correctly predict analytes BP in D3710 and MA VHP calibration gas mix samples, with a root-mean-square-%-relative-error (RMS%RE) of 6.4%, and 10.8% respectively. In contrast, the overall RMS%RE of 32.9% and 40.4%, respectively obtained for BP determination in D3710 and MA VHP using a traditional simulated distillation method were approximately four times larger than the corresponding RMS%RE of BP prediction using MRA, demonstrating the better predictive ability of MRA. The reported method is rapid, robust, and promising, and can be potentially used routinely for fast analysis, pattern recognition, and analyte BP determination in petrochemical industries. Copyright © 2014 Elsevier B.V. All rights reserved.

  5. A MULTIVARIATE ANALYSIS OF CROATIAN COUNTIES ENTREPRENEURSHIP

    Directory of Open Access Journals (Sweden)

    Elza Jurun

    2012-12-01

    Full Text Available In the focus of this paper is a multivariate analysis of Croatian Counties entrepreneurship. Complete data base available by official statistic institutions at national and regional level is used. Modern econometric methodology starting from a comparative analysis via multiple regression to multivariate cluster analysis is carried out as well as the analysis of successful or inefficacious entrepreneurship measured by indicators of efficiency, profitability and productivity. Time horizons of the comparative analysis are in 2004 and 2010. Accelerators of socio-economic development - number of entrepreneur investors, investment in fixed assets and current assets ratio in multiple regression model are analytically filtered between twenty-six independent variables as variables of the dominant influence on GDP per capita in 2010 as dependent variable. Results of multivariate cluster analysis of twentyone Croatian Counties are interpreted also in the sense of three Croatian NUTS 2 regions according to European nomenclature of regional territorial division of Croatia.

  6. Sunspot Cycle Prediction Using Multivariate Regression and Binary ...

    Indian Academy of Sciences (India)

    49

    Multivariate regression model has been derived based on the available cycles 1 .... The flare index correlates well with various parameters of the solar activity. ...... 32) Sabarinath A and Anilkumar A K 2011 A stochastic prediction model for the.

  7. Remote-sensing data processing with the multivariate regression analysis method for iron mineral resource potential mapping: a case study in the Sarvian area, central Iran

    Science.gov (United States)

    Mansouri, Edris; Feizi, Faranak; Jafari Rad, Alireza; Arian, Mehran

    2018-03-01

    This paper uses multivariate regression to create a mathematical model for iron skarn exploration in the Sarvian area, central Iran, using multivariate regression for mineral prospectivity mapping (MPM). The main target of this paper is to apply multivariate regression analysis (as an MPM method) to map iron outcrops in the northeastern part of the study area in order to discover new iron deposits in other parts of the study area. Two types of multivariate regression models using two linear equations were employed to discover new mineral deposits. This method is one of the reliable methods for processing satellite images. ASTER satellite images (14 bands) were used as unique independent variables (UIVs), and iron outcrops were mapped as dependent variables for MPM. According to the results of the probability value (p value), coefficient of determination value (R2) and adjusted determination coefficient (Radj2), the second regression model (which consistent of multiple UIVs) fitted better than other models. The accuracy of the model was confirmed by iron outcrops map and geological observation. Based on field observation, iron mineralization occurs at the contact of limestone and intrusive rocks (skarn type).

  8. Polynomial regression analysis and significance test of the regression function

    International Nuclear Information System (INIS)

    Gao Zhengming; Zhao Juan; He Shengping

    2012-01-01

    In order to analyze the decay heating power of a certain radioactive isotope per kilogram with polynomial regression method, the paper firstly demonstrated the broad usage of polynomial function and deduced its parameters with ordinary least squares estimate. Then significance test method of polynomial regression function is derived considering the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and significance test of the polynomial function are done to the decay heating power of the iso tope per kilogram in accord with the authors' real work. (authors)

  9. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    Science.gov (United States)

    Ulbrich, Norbert Manfred

    2013-01-01

    A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.

  10. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu

    2014-06-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.

  11. EXPLORATORY DATA ANALYSIS AND MULTIVARIATE STRATEGIES FOR REVEALING MULTIVARIATE STRUCTURES IN CLIMATE DATA

    Directory of Open Access Journals (Sweden)

    2016-12-01

    Full Text Available This paper is on data analysis strategy in a complex, multidimensional, and dynamic domain. The focus is on the use of data mining techniques to explore the importance of multivariate structures; using climate variables which influences climate change. Techniques involved in data mining exercise vary according to the data structures. The multivariate analysis strategy considered here involved choosing an appropriate tool to analyze a process. Factor analysis is introduced into data mining technique in order to reveal the influencing impacts of factors involved as well as solving for multicolinearity effect among the variables. The temporal nature and multidimensionality of the target variables is revealed in the model using multidimensional regression estimates. The strategy of integrating the method of several statistical techniques, using climate variables in Nigeria was employed. R2 of 0.518 was obtained from the ordinary least square regression analysis carried out and the test was not significant at 5% level of significance. However, factor analysis regression strategy gave a good fit with R2 of 0.811 and the test was significant at 5% level of significance. Based on this study, model building should go beyond the usual confirmatory data analysis (CDA, rather it should be complemented with exploratory data analysis (EDA in order to achieve a desired result.

  12. Collision prediction models using multivariate Poisson-lognormal regression.

    Science.gov (United States)

    El-Basyouny, Karim; Sayed, Tarek

    2009-07-01

    This paper advocates the use of multivariate Poisson-lognormal (MVPLN) regression to develop models for collision count data. The MVPLN approach presents an opportunity to incorporate the correlations across collision severity levels and their influence on safety analyses. The paper introduces a new multivariate hazardous location identification technique, which generalizes the univariate posterior probability of excess that has been commonly proposed and applied in the literature. In addition, the paper presents an alternative approach for quantifying the effect of the multivariate structure on the precision of expected collision frequency. The MVPLN approach is compared with the independent (separate) univariate Poisson-lognormal (PLN) models with respect to model inference, goodness-of-fit, identification of hot spots and precision of expected collision frequency. The MVPLN is modeled using the WinBUGS platform which facilitates computation of posterior distributions as well as providing a goodness-of-fit measure for model comparisons. The results indicate that the estimates of the extra Poisson variation parameters were considerably smaller under MVPLN leading to higher precision. The improvement in precision is due mainly to the fact that MVPLN accounts for the correlation between the latent variables representing property damage only (PDO) and injuries plus fatalities (I+F). This correlation was estimated at 0.758, which is highly significant, suggesting that higher PDO rates are associated with higher I+F rates, as the collision likelihood for both types is likely to rise due to similar deficiencies in roadway design and/or other unobserved factors. In terms of goodness-of-fit, the MVPLN model provided a superior fit than the independent univariate models. The multivariate hazardous location identification results demonstrated that some hazardous locations could be overlooked if the analysis was restricted to the univariate models.

  13. [Multivariate ordinal logistic regression analysis on the association between consumption of fried food and both esophageal cancer and precancerous lesions].

    Science.gov (United States)

    Guo, L W; Liu, S Z; Zhang, M; Chen, Q; Zhang, S K; Sun, X B

    2017-12-10

    Objective: To investigate the effect of fried food intake on the pathogenesis of esophageal cancer and precancerous lesions. Methods: From 2005 to 2013, all the residents aged 40-69 years from 11 counties (cities) where cancer screening of upper gastrointestinal cancer had been conducted in rural areas of Henan province, were recruited as the subjects of study. Information on demography and lifestyle was collected. The residents under study were screened with iodine staining endoscopic examination and biopsy samples were diagnosed pathologically, under standardized criteria. Subjects with high risk were divided into the groups based on their different pathological degrees. Multivariate ordinal logistic regression analysis was used to analyze the relationship between the frequency of fried food intake and esophageal cancer and precancerous lesions. Results: A total number of 8 792 cases with normal esophagus, 3 680 with mild hyperplasia, 972 with moderate hyperplasia, 413 with severe hyperplasia carcinoma in situ, and 336 cases of esophageal cancer were recruited. Results from multivariate logistic regression analysis showed that, when compared with those who did not eat fried food, the intake of fried food (food appeared a risk factor for both esophageal cancer and precancerous lesions.

  14. Asymptotics of Multivariate Regression with Consecutively Added Dependent Varibles

    NARCIS (Netherlands)

    Raats, V.M.; van der Genugten, B.B.; Moors, J.J.A.

    2004-01-01

    We consider multivariate regression where new dependent variables are consecutively added during the experiment (or in time).So, viewed at the end of the experiment, the number of observations decreases with each added variable. The explanatory variables are observed throughout.In a previous paper

  15. Supremum Norm Posterior Contraction and Credible Sets for Nonparametric Multivariate Regression

    NARCIS (Netherlands)

    Yoo, W.W.; Ghosal, S

    2016-01-01

    In the setting of nonparametric multivariate regression with unknown error variance, we study asymptotic properties of a Bayesian method for estimating a regression function f and its mixed partial derivatives. We use a random series of tensor product of B-splines with normal basis coefficients as a

  16. Risk factors for pedicled flap necrosis in hand soft tissue reconstruction: a multivariate logistic regression analysis.

    Science.gov (United States)

    Gong, Xu; Cui, Jianli; Jiang, Ziping; Lu, Laijin; Li, Xiucun

    2018-03-01

    Few clinical retrospective studies have reported the risk factors of pedicled flap necrosis in hand soft tissue reconstruction. The aim of this study was to identify non-technical risk factors associated with pedicled flap perioperative necrosis in hand soft tissue reconstruction via a multivariate logistic regression analysis. For patients with hand soft tissue reconstruction, we carefully reviewed hospital records and identified 163 patients who met the inclusion criteria. The characteristics of these patients, flap transfer procedures and postoperative complications were recorded. Eleven predictors were identified. The correlations between pedicled flap necrosis and risk factors were analysed using a logistic regression model. Of 163 skin flaps, 125 flaps survived completely without any complications. The pedicled flap necrosis rate in hands was 11.04%, which included partial flap necrosis (7.36%) and total flap necrosis (3.68%). Soft tissue defects in fingers were noted in 68.10% of all cases. The logistic regression analysis indicated that the soft tissue defect site (P = 0.046, odds ratio (OR) = 0.079, confidence interval (CI) (0.006, 0.959)), flap size (P = 0.020, OR = 1.024, CI (1.004, 1.045)) and postoperative wound infection (P < 0.001, OR = 17.407, CI (3.821, 79.303)) were statistically significant risk factors for pedicled flap necrosis of the hand. Soft tissue defect site, flap size and postoperative wound infection were risk factors associated with pedicled flap necrosis in hand soft tissue defect reconstruction. © 2017 Royal Australasian College of Surgeons.

  17. Multivariate sparse group lasso for the multivariate multiple linear regression with an arbitrary group structure.

    Science.gov (United States)

    Li, Yanming; Nan, Bin; Zhu, Ji

    2015-06-01

    We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functional groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study. © 2015, The International Biometric Society.

  18. Multivariate Local Polynomial Regression with Application to Shenzhen Component Index

    Directory of Open Access Journals (Sweden)

    Liyun Su

    2011-01-01

    Full Text Available This study attempts to characterize and predict stock index series in Shenzhen stock market using the concepts of multivariate local polynomial regression. Based on nonlinearity and chaos of the stock index time series, multivariate local polynomial prediction methods and univariate local polynomial prediction method, all of which use the concept of phase space reconstruction according to Takens' Theorem, are considered. To fit the stock index series, the single series changes into bivariate series. To evaluate the results, the multivariate predictor for bivariate time series based on multivariate local polynomial model is compared with univariate predictor with the same Shenzhen stock index data. The numerical results obtained by Shenzhen component index show that the prediction mean squared error of the multivariate predictor is much smaller than the univariate one and is much better than the existed three methods. Even if the last half of the training data are used in the multivariate predictor, the prediction mean squared error is smaller than the univariate predictor. Multivariate local polynomial prediction model for nonsingle time series is a useful tool for stock market price prediction.

  19. Modelling lecturer performance index of private university in Tulungagung by using survival analysis with multivariate adaptive regression spline

    Science.gov (United States)

    Hasyim, M.; Prastyo, D. D.

    2018-03-01

    Survival analysis performs relationship between independent variables and survival time as dependent variable. In fact, not all survival data can be recorded completely by any reasons. In such situation, the data is called censored data. Moreover, several model for survival analysis requires assumptions. One of the approaches in survival analysis is nonparametric that gives more relax assumption. In this research, the nonparametric approach that is employed is Multivariate Regression Adaptive Spline (MARS). This study is aimed to measure the performance of private university’s lecturer. The survival time in this study is duration needed by lecturer to obtain their professional certificate. The results show that research activities is a significant factor along with developing courses material, good publication in international or national journal, and activities in research collaboration.

  20. Multivariate research in areas of phosphorus cast-iron brake shoes manufacturing using the statistical analysis and the multiple regression equations

    Science.gov (United States)

    Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.

    2017-05-01

    The braking system is one of the most important and complex subsystems of railway vehicles, especially when it comes for safety. Therefore, installing efficient safe brakes on the modern railway vehicles is essential. Nowadays is devoted attention to solving problems connected with using high performance brake materials and its impact on thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, due to the interaction between the brake block and the wheel produce complex thermos-mechanical phenomena. In this work, the investigated subjects are the cast-iron brake shoes, which are still widely used on freight wagons. Therefore, the cast-iron brake shoes - with lamellar graphite and with a high content of phosphorus (0.8-1.1%) - need a special investigation. In order to establish the optimal condition for the cast-iron brake shoes we proposed a mathematical modelling study by using the statistical analysis and multiple regression equations. Multivariate research is important in areas of cast-iron brake shoes manufacturing, because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulties in comprehending many dimensions at one time. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. In order to settle the multiple correlation between the hardness of the cast-iron brake shoes, and the chemical compositions elements several model of regression equation types has been proposed. Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representation of the regression equations in order to explain interaction of the variables and locate the optimal level of each variable for

  1. Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia

    Science.gov (United States)

    Pradhan, Biswajeet

    2010-05-01

    This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (ndvi), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficient in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), where as Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficient in other two areas, the case of Selangor based on logistic coefficient of Cameron showed highest (90%) prediction accuracy where as the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross

  2. REGSTEP - stepwise multivariate polynomial regression with singular extensions

    International Nuclear Information System (INIS)

    Davierwalla, D.M.

    1977-09-01

    The program REGSTEP determines a polynomial approximation, in the least squares sense, to tabulated data. The polynomial may be univariate or multivariate. The computational method is that of stepwise regression. A variable is inserted into the regression basis if it is significant with respect to an appropriate F-test at a preselected risk level. In addition, should a variable already in the basis, become nonsignificant (again with respect to an appropriate F-test) after the entry of a new variable, it is expelled from the model. Thus only significant variables are retained in the model. Although written expressly to be incorporated into CORCOD, a code for predicting nuclear cross sections for given values of power, temperature, void fractions, Boron content etc. there is nothing to limit the use of REGSTEP to nuclear applications, as the examples demonstrate. A separate version has been incorporated into RSYST for the general user. (Auth.)

  3. Multivariate nonparametric regression and visualization with R and applications to finance

    CERN Document Server

    Klemelä, Jussi

    2014-01-01

    A modern approach to statistical learning and its applications through visualization methods With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data generatingmechanisms, the book begins with an overview of classification and regression. The book then introduces and examines various tested and proven visualization techniques for learning samples and functio

  4. Multivariate regression models for the simultaneous quantitative analysis of calcium and magnesium carbonates and magnesium oxide through drifts data

    Directory of Open Access Journals (Sweden)

    Marder Luciano

    2006-01-01

    Full Text Available In the present work multivariate regression models were developed for the quantitative analysis of ternary systems using Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS to determine the concentration in weight of calcium carbonate, magnesium carbonate and magnesium oxide. Nineteen spectra of standard samples previously defined in ternary diagram by mixture design were prepared and mid-infrared diffuse reflectance spectra were recorded. The partial least squares (PLS regression method was applied to the model. The spectra set was preprocessed by either mean-centered and variance-scaled (model 2 or mean-centered only (model 1. The results based on the prediction performance of the external validation set expressed by RMSEP (root mean square error of prediction demonstrated that it is possible to develop good models to simultaneously determine calcium carbonate, magnesium carbonate and magnesium oxide content in powdered samples that can be used in the study of the thermal decomposition of dolomite rocks.

  5. application of multilinear regression analysis in modeling of soil

    African Journals Online (AJOL)

    Windows User

    Accordingly [1, 3] in their work, they applied linear regression ... (MLRA) is a statistical technique that uses several explanatory ... order to check this, they adopted bivariate correlation analysis .... groups, namely A-1 through A-7, based on their relative expected ..... Multivariate Regression in Gorgan Province North of Iran” ...

  6. Correcting for multivariate measurement error by regression calibration in meta-analyses of epidemiological studies

    DEFF Research Database (Denmark)

    Tybjærg-Hansen, Anne

    2009-01-01

    Within-person variability in measured values of multiple risk factors can bias their associations with disease. The multivariate regression calibration (RC) approach can correct for such measurement error and has been applied to studies in which true values or independent repeat measurements...... of the risk factors are observed on a subsample. We extend the multivariate RC techniques to a meta-analysis framework where multiple studies provide independent repeat measurements and information on disease outcome. We consider the cases where some or all studies have repeat measurements, and compare study......-specific, averaged and empirical Bayes estimates of RC parameters. Additionally, we allow for binary covariates (e.g. smoking status) and for uncertainty and time trends in the measurement error corrections. Our methods are illustrated using a subset of individual participant data from prospective long-term studies...

  7. TMVA(Toolkit for Multivariate Analysis) new architectures design and implementation.

    CERN Document Server

    Zapata Mesa, Omar Andres

    2016-01-01

    Toolkit for Multivariate Analysis(TMVA) is a package in ROOT for machine learning algorithms for classification and regression of the events in the detectors. In TMVA, we are developing new high level algorithms to perform multivariate analysis as cross validation, hyper parameter optimization, variable importance etc... Almost all the algorithms are expensive and designed to process a huge amount of data. It is very important to implement the new technologies on parallel computing to reduce the processing times.

  8. Preference learning with evolutionary Multivariate Adaptive Regression Spline model

    DEFF Research Database (Denmark)

    Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll

    2015-01-01

    This paper introduces a novel approach for pairwise preference learning through combining an evolutionary method with Multivariate Adaptive Regression Spline (MARS). Collecting users' feedback through pairwise preferences is recommended over other ranking approaches as this method is more appealing...... for function approximation as well as being relatively easy to interpret. MARS models are evolved based on their efficiency in learning pairwise data. The method is tested on two datasets that collectively provide pairwise preference data of five cognitive states expressed by users. The method is analysed...

  9. Robust methods for multivariate data analysis A1

    DEFF Research Database (Denmark)

    Frosch, Stina; Von Frese, J.; Bro, Rasmus

    2005-01-01

    Outliers may hamper proper classical multivariate analysis, and lead to incorrect conclusions. To remedy the problem of outliers, robust methods are developed in statistics and chemometrics. Robust methods reduce or remove the effect of outlying data points and allow the ?good? data to primarily...... determine the result. This article reviews the most commonly used robust multivariate regression and exploratory methods that have appeared since 1996 in the field of chemometrics. Special emphasis is put on the robust versions of chemometric standard tools like PCA and PLS and the corresponding robust...

  10. Evaluation of Logistic Regression and Multivariate Adaptive Regression Spline Models for Groundwater Potential Mapping Using R and GIS

    Directory of Open Access Journals (Sweden)

    Soyoung Park

    2017-07-01

    Full Text Available This study mapped and analyzed groundwater potential using two different models, logistic regression (LR and multivariate adaptive regression splines (MARS, and compared the results. A spatial database was constructed for groundwater well data and groundwater influence factors. Groundwater well data with a high potential yield of ≥70 m3/d were extracted, and 859 locations (70% were used for model training, whereas the other 365 locations (30% were used for model validation. We analyzed 16 groundwater influence factors including altitude, slope degree, slope aspect, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport index, distance from drainage, drainage density, lithology, distance from fault, fault density, distance from lineament, lineament density, and land cover. Groundwater potential maps (GPMs were constructed using LR and MARS models and tested using a receiver operating characteristics curve. Based on this analysis, the area under the curve (AUC for the success rate curve of GPMs created using the MARS and LR models was 0.867 and 0.838, and the AUC for the prediction rate curve was 0.836 and 0.801, respectively. This implies that the MARS model is useful and effective for groundwater potential analysis in the study area.

  11. Boosted regression trees, multivariate adaptive regression splines and their two-step combinations with multiple linear regression or partial least squares to predict blood-brain barrier passage: a case study.

    Science.gov (United States)

    Deconinck, E; Zhang, M H; Petitet, F; Dubus, E; Ijjaali, I; Coomans, D; Vander Heyden, Y

    2008-02-18

    The use of some unconventional non-linear modeling techniques, i.e. classification and regression trees and multivariate adaptive regression splines-based methods, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structural and pharmacological diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It could be concluded that the use of combinations of a linear with a non-linear modeling technique results in some improved properties compared to the individual linear and non-linear models and that, when the use of such a combination is appropriate, combinations using MARS as non-linear technique should be preferred over those with BRT, due to some serious drawbacks of the BRT approaches.

  12. Simultaneous determination of estrogens (ethinylestradiol and norgestimate) concentrations in human and bovine serum albumin by use of fluorescence spectroscopy and multivariate regression analysis.

    Science.gov (United States)

    Hordge, LaQuana N; McDaniel, Kiara L; Jones, Derick D; Fakayode, Sayo O

    2016-05-15

    The endocrine disruption property of estrogens necessitates the immediate need for effective monitoring and development of analytical protocols for their analyses in biological and human specimens. This study explores the first combined utility of a steady-state fluorescence spectroscopy and multivariate partial-least-square (PLS) regression analysis for the simultaneous determination of two estrogens (17α-ethinylestradiol (EE) and norgestimate (NOR)) concentrations in bovine serum albumin (BSA) and human serum albumin (HSA) samples. The influence of EE and NOR concentrations and temperature on the emission spectra of EE-HSA EE-BSA, NOR-HSA, and NOR-BSA complexes was also investigated. The binding of EE with HSA and BSA resulted in increase in emission characteristics of HSA and BSA and a significant blue spectra shift. In contrast, the interaction of NOR with HSA and BSA quenched the emission characteristics of HSA and BSA. The observed emission spectral shifts preclude the effective use of traditional univariate regression analysis of fluorescent data for the determination of EE and NOR concentrations in HSA and BSA samples. Multivariate partial-least-squares (PLS) regression analysis was utilized to correlate the changes in emission spectra with EE and NOR concentrations in HSA and BSA samples. The figures-of-merit of the developed PLS regression models were excellent, with limits of detection as low as 1.6×10(-8) M for EE and 2.4×10(-7) M for NOR and good linearity (R(2)>0.994985). The PLS models correctly predicted EE and NOR concentrations in independent validation HSA and BSA samples with a root-mean-square-percent-relative-error (RMS%RE) of less than 6.0% at physiological condition. On the contrary, the use of univariate regression resulted in poor predictions of EE and NOR in HSA and BSA samples, with RMS%RE larger than 40% at physiological conditions. High accuracy, low sensitivity, simplicity, low-cost with no prior analyte extraction or separation

  13. Trochanteric entry femoral nails yield better femoral version and lower revision rates-A large cohort multivariate regression analysis.

    Science.gov (United States)

    Yoon, Richard S; Gage, Mark J; Galos, David K; Donegan, Derek J; Liporace, Frank A

    2017-06-01

    Intramedullary nailing (IMN) has become the standard of care for the treatment of most femoral shaft fractures. Different IMN options include trochanteric and piriformis entry as well as retrograde nails, which may result in varying degrees of femoral rotation. The objective of this study was to analyze postoperative femoral version between three types of nails and to delineate any significant differences in femoral version (DFV) and revision rates. Over a 10-year period, 417 patients underwent IMN of a diaphyseal femur fracture (AO/OTA 32A-C). Of these patients, 316 met inclusion criteria and obtained postoperative computed tomography (CT) scanograms to calculate femoral version and were thus included in the study. In this study, our main outcome measure was the difference in femoral version (DFV) between the uninjured limb and the injured limb. The effect of the following variables on DFV and revision rates were determined via univariate, multivariate, and ordinal regression analyses: gender, age, BMI, ethnicity, mechanism of injury, operative side, open fracture, and table type/position. Statistical significance was set at pregression analysis revealed that a lower BMI was significantly associated with a lower DFV (p=0.006). Controlling for possible covariables, multivariate analysis yielded a significantly lower DFV for trochanteric entry nails than piriformis or retrograde nails (7.9±6.10 vs. 9.5±7.4 vs. 9.4±7.8°, pregression analysis. However, this is not to state that the other nail types exhibited abnormal DFV. Translation to the clinical impact of a few degrees of DFV is also unknown. Future studies to more in-depth study the intricacies of femoral version may lead to improved technology in addition to potentially improved clinical outcomes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Multivariate Frequency-Severity Regression Models in Insurance

    Directory of Open Access Journals (Sweden)

    Edward W. Frees

    2016-02-01

    Full Text Available In insurance and related industries including healthcare, it is common to have several outcome measures that the analyst wishes to understand using explanatory variables. For example, in automobile insurance, an accident may result in payments for damage to one’s own vehicle, damage to another party’s vehicle, or personal injury. It is also common to be interested in the frequency of accidents in addition to the severity of the claim amounts. This paper synthesizes and extends the literature on multivariate frequency-severity regression modeling with a focus on insurance industry applications. Regression models for understanding the distribution of each outcome continue to be developed yet there now exists a solid body of literature for the marginal outcomes. This paper contributes to this body of literature by focusing on the use of a copula for modeling the dependence among these outcomes; a major advantage of this tool is that it preserves the body of work established for marginal models. We illustrate this approach using data from the Wisconsin Local Government Property Insurance Fund. This fund offers insurance protection for (i property; (ii motor vehicle; and (iii contractors’ equipment claims. In addition to several claim types and frequency-severity components, outcomes can be further categorized by time and space, requiring complex dependency modeling. We find significant dependencies for these data; specifically, we find that dependencies among lines are stronger than the dependencies between the frequency and average severity within each line.

  15. A comparison between univariate probabilistic and multivariate (logistic regression) methods for landslide susceptibility analysis: the example of the Febbraro valley (Northern Alps, Italy)

    Science.gov (United States)

    Rossi, M.; Apuani, T.; Felletti, F.

    2009-04-01

    The aim of this paper is to compare the results of two statistical methods for landslide susceptibility analysis: 1) univariate probabilistic method based on landslide susceptibility index, 2) multivariate method (logistic regression). The study area is the Febbraro valley, located in the central Italian Alps, where different types of metamorphic rocks croup out. On the eastern part of the studied basin a quaternary cover represented by colluvial and secondarily, by glacial deposits, is dominant. In this study 110 earth flows, mainly located toward NE portion of the catchment, were analyzed. They involve only the colluvial deposits and their extension mainly ranges from 36 to 3173 m2. Both statistical methods require to establish a spatial database, in which each landslide is described by several parameters that can be assigned using a main scarp central point of landslide. The spatial database is constructed using a Geographical Information System (GIS). Each landslide is described by several parameters corresponding to the value of main scarp central point of the landslide. Based on bibliographic review a total of 15 predisposing factors were utilized. The width of the intervals, in which the maps of the predisposing factors have to be reclassified, has been defined assuming constant intervals to: elevation (100 m), slope (5 °), solar radiation (0.1 MJ/cm2/year), profile curvature (1.2 1/m), tangential curvature (2.2 1/m), drainage density (0.5), lineament density (0.00126). For the other parameters have been used the results of the probability-probability plots analysis and the statistical indexes of landslides site. In particular slope length (0 ÷ 2, 2 ÷ 5, 5 ÷ 10, 10 ÷ 20, 20 ÷ 35, 35 ÷ 260), accumulation flow (0 ÷ 1, 1 ÷ 2, 2 ÷ 5, 5 ÷ 12, 12 ÷ 60, 60 ÷27265), Topographic Wetness Index 0 ÷ 0.74, 0.74 ÷ 1.94, 1.94 ÷ 2.62, 2.62 ÷ 3.48, 3.48 ÷ 6,00, 6.00 ÷ 9.44), Stream Power Index (0 ÷ 0.64, 0.64 ÷ 1.28, 1.28 ÷ 1.81, 1.81 ÷ 4.20, 4.20 ÷ 9

  16. A graphical method to evaluate spectral preprocessing in multivariate regression calibrations: example with Savitzky-Golay filters and partial least squares regression.

    Science.gov (United States)

    Delwiche, Stephen R; Reeves, James B

    2010-01-01

    In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly smoothing operations or derivatives. While such operations are often useful in reducing the number of latent variables of the actual decomposition and lowering residual error, they also run the risk of misleading the practitioner into accepting calibration equations that are poorly adapted to samples outside of the calibration. The current study developed a graphical method to examine this effect on partial least squares (PLS) regression calibrations of near-infrared (NIR) reflection spectra of ground wheat meal with two analytes, protein content and sodium dodecyl sulfate sedimentation (SDS) volume (an indicator of the quantity of the gluten proteins that contribute to strong doughs). These two properties were chosen because of their differing abilities to be modeled by NIR spectroscopy: excellent for protein content, fair for SDS sedimentation volume. To further demonstrate the potential pitfalls of preprocessing, an artificial component, a randomly generated value, was included in PLS regression trials. Savitzky-Golay (digital filter) smoothing, first-derivative, and second-derivative preprocess functions (5 to 25 centrally symmetric convolution points, derived from quadratic polynomials) were applied to PLS calibrations of 1 to 15 factors. The results demonstrated the danger of an over reliance on preprocessing when (1) the number of samples used in a multivariate calibration is low (<50), (2) the spectral response of the analyte is weak, and (3) the goodness of the calibration is based on the coefficient of determination (R(2)) rather than a term based on residual error. The graphical method has application to the evaluation of other preprocess functions and various

  17. Selecting minimum dataset soil variables using PLSR as a regressive multivariate method

    Science.gov (United States)

    Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.

    2017-04-01

    Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information deriving from multivariate datasets. Principal component analysis (PCA) has been mainly used, however supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least square regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean centered and variance scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP

  18. Regression analysis for LED color detection of visual-MIMO system

    Science.gov (United States)

    Banik, Partha Pratim; Saha, Rappy; Kim, Ki-Doo

    2018-04-01

    Color detection from a light emitting diode (LED) array using a smartphone camera is very difficult in a visual multiple-input multiple-output (visual-MIMO) system. In this paper, we propose a method to determine the LED color using a smartphone camera by applying regression analysis. We employ a multivariate regression model to identify the LED color. After taking a picture of an LED array, we select the LED array region, and detect the LED using an image processing algorithm. We then apply the k-means clustering algorithm to determine the number of potential colors for feature extraction of each LED. Finally, we apply the multivariate regression model to predict the color of the transmitted LEDs. In this paper, we show our results for three types of environmental light condition: room environmental light, low environmental light (560 lux), and strong environmental light (2450 lux). We compare the results of our proposed algorithm from the analysis of training and test R-Square (%) values, percentage of closeness of transmitted and predicted colors, and we also mention about the number of distorted test data points from the analysis of distortion bar graph in CIE1931 color space.

  19. Research on refugees and immigrants social integration in Yunnan Border Area: An empirical analysis on the multivariable linear regression model

    Directory of Open Access Journals (Sweden)

    Peng Nai

    2016-03-01

    Full Text Available A great number of immigration populations resident permanently in Yunnan Border Area of China. To some extent, these people belong to refugees or immigrants in accordance with International Rules, which significantly features the social diversity of this area. However, this kind of social diversity always impairs the social order. Therefore, there will be a positive influence to the local society governance by a research on local immigration integration. This essay hereby attempts to acquire the data of the living situation of these border area immigration and refugees. The analysis of the social integration of refugees and immigration in Yunnan border area in China will be deployed through the modeling of multivariable linear regression based on these data in order to propose some more achievable resolutions.

  20. Use of multivariate extensions of generalized linear models in the analysis of data from clinical trials

    OpenAIRE

    ALONSO ABAD, Ariel; Rodriguez, O.; TIBALDI, Fabian; CORTINAS ABRAHANTES, Jose

    2002-01-01

    In medical studies the categorical endpoints are quite often. Even though nowadays some models for handling this multicategorical variables have been developed their use is not common. This work shows an application of the Multivariate Generalized Linear Models to the analysis of Clinical Trials data. After a theoretical introduction models for ordinal and nominal responses are applied and the main results are discussed. multivariate analysis; multivariate logistic regression; multicategor...

  1. Methods of Multivariate Analysis

    CERN Document Server

    Rencher, Alvin C

    2012-01-01

    Praise for the Second Edition "This book is a systematic, well-written, well-organized text on multivariate analysis packed with intuition and insight . . . There is much practical wisdom in this book that is hard to find elsewhere."-IIE Transactions Filled with new and timely content, Methods of Multivariate Analysis, Third Edition provides examples and exercises based on more than sixty real data sets from a wide variety of scientific fields. It takes a "methods" approach to the subject, placing an emphasis on how students and practitioners can employ multivariate analysis in real-life sit

  2. Multivariate analysis of nystatin and metronidazole in a semi-solid matrix by means of diffuse reflectance NIR spectroscopy and PLS regression.

    Science.gov (United States)

    Baratieri, Sabrina C; Barbosa, Juliana M; Freitas, Matheus P; Martins, José A

    2006-01-23

    A multivariate method of analysis of nystatin and metronidazole in a semi-solid matrix, based on diffuse reflectance NIR measurements and partial least squares regression, is reported. The product, a vaginal cream used in the antifungal and antibacterial treatment, is usually, quantitatively analyzed through microbiological tests (nystatin) and HPLC technique (metronidazole), according to pharmacopeial procedures. However, near infrared spectroscopy has demonstrated to be a valuable tool for content determination, given the rapidity and scope of the method. In the present study, it was successfully applied in the prediction of nystatin (even in low concentrations, ca. 0.3-0.4%, w/w, which is around 100,000 IU/5g) and metronidazole contents, as demonstrated by some figures of merit, namely linearity, precision (mean and repeatability) and accuracy.

  3. Correcting for multivariate measurement error by regression calibration in meta-analyses of epidemiological studies.

    NARCIS (Netherlands)

    Kromhout, D.

    2009-01-01

    Within-person variability in measured values of multiple risk factors can bias their associations with disease. The multivariate regression calibration (RC) approach can correct for such measurement error and has been applied to studies in which true values or independent repeat measurements of the

  4. Regression analysis of radiological parameters in nuclear power plants

    International Nuclear Information System (INIS)

    Bhargava, Pradeep; Verma, R.K.; Joshi, M.L.

    2003-01-01

    Indian Pressurized Heavy Water Reactors (PHWRs) have now attained maturity in their operations. Indian PHWR operation started in the year 1972. At present there are 12 operating PHWRs collectively producing nearly 2400 MWe. Sufficient radiological data are available for analysis to draw inferences which may be utilised for better understanding of radiological parameters influencing the collective internal dose. Tritium is the main contributor to the occupational internal dose originating in PHWRs. An attempt has been made to establish the relationship between radiological parameters, which may be useful to draw inferences about the internal dose. Regression analysis have been done to find out the relationship, if it exist, among the following variables: A. Specific tritium activity of heavy water (Moderator and PHT) and tritium concentration in air at various work locations. B. Internal collective occupational dose and tritium release to environment through air route. C. Specific tritium activity of heavy water (Moderator and PHT) and collective internal occupational dose. For this purpose multivariate regression analysis has been carried out. D. Tritium concentration in air at various work location and tritium release to environment through air route. For this purpose multivariate regression analysis has been carried out. This analysis reveals that collective internal dose has got very good correlation with the tritium activity release to the environment through air route. Whereas no correlation has been found between specific tritium activity in the heavy water systems and collective internal occupational dose. The good correlation has been found in case D and F test reveals that it is not by chance. (author)

  5. Parameter estimation of multivariate multiple regression model using bayesian with non-informative Jeffreys’ prior distribution

    Science.gov (United States)

    Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

    2018-05-01

    Bayesian method is a method that can be used to estimate the parameters of multivariate multiple regression model. Bayesian method has two distributions, there are prior and posterior distributions. Posterior distribution is influenced by the selection of prior distribution. Jeffreys’ prior distribution is a kind of Non-informative prior distribution. This prior is used when the information about parameter not available. Non-informative Jeffreys’ prior distribution is combined with the sample information resulting the posterior distribution. Posterior distribution is used to estimate the parameter. The purposes of this research is to estimate the parameters of multivariate regression model using Bayesian method with Non-informative Jeffreys’ prior distribution. Based on the results and discussion, parameter estimation of β and Σ which were obtained from expected value of random variable of marginal posterior distribution function. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart. However, in calculation of the expected value involving integral of a function which difficult to determine the value. Therefore, approach is needed by generating of random samples according to the posterior distribution characteristics of each parameter using Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.

  6. Multivariate calibration applied to the quantitative analysis of infrared spectra

    Energy Technology Data Exchange (ETDEWEB)

    Haaland, D.M.

    1991-01-01

    Multivariate calibration methods are very useful for improving the precision, accuracy, and reliability of quantitative spectral analyses. Spectroscopists can more effectively use these sophisticated statistical tools if they have a qualitative understanding of the techniques involved. A qualitative picture of the factor analysis multivariate calibration methods of partial least squares (PLS) and principal component regression (PCR) is presented using infrared calibrations based upon spectra of phosphosilicate glass thin films on silicon wafers. Comparisons of the relative prediction abilities of four different multivariate calibration methods are given based on Monte Carlo simulations of spectral calibration and prediction data. The success of multivariate spectral calibrations is demonstrated for several quantitative infrared studies. The infrared absorption and emission spectra of thin-film dielectrics used in the manufacture of microelectronic devices demonstrate rapid, nondestructive at-line and in-situ analyses using PLS calibrations. Finally, the application of multivariate spectral calibrations to reagentless analysis of blood is presented. We have found that the determination of glucose in whole blood taken from diabetics can be precisely monitored from the PLS calibration of either mind- or near-infrared spectra of the blood. Progress toward the non-invasive determination of glucose levels in diabetics is an ultimate goal of this research. 13 refs., 4 figs.

  7. The PIT-trap-A "model-free" bootstrap procedure for inference about regression models with discrete, multivariate responses.

    Science.gov (United States)

    Warton, David I; Thibaut, Loïc; Wang, Yi Alice

    2017-01-01

    Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)-common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods.

  8. Comparing treatment effects after adjustment with multivariable Cox proportional hazards regression and propensity score methods

    NARCIS (Netherlands)

    Martens, Edwin P; de Boer, Anthonius; Pestman, Wiebe R; Belitser, Svetlana V; Stricker, Bruno H Ch; Klungel, Olaf H

    PURPOSE: To compare adjusted effects of drug treatment for hypertension on the risk of stroke from propensity score (PS) methods with a multivariable Cox proportional hazards (Cox PH) regression in an observational study with censored data. METHODS: From two prospective population-based cohort

  9. Multivariate linear regression of high-dimensional fMRI data with multiple target variables.

    Science.gov (United States)

    Valente, Giancarlo; Castellanos, Agustin Lage; Vanacore, Gianluca; Formisano, Elia

    2014-05-01

    Multivariate regression is increasingly used to study the relation between fMRI spatial activation patterns and experimental stimuli or behavioral ratings. With linear models, informative brain locations are identified by mapping the model coefficients. This is a central aspect in neuroimaging, as it provides the sought-after link between the activity of neuronal populations and subject's perception, cognition or behavior. Here, we show that mapping of informative brain locations using multivariate linear regression (MLR) may lead to incorrect conclusions and interpretations. MLR algorithms for high dimensional data are designed to deal with targets (stimuli or behavioral ratings, in fMRI) separately, and the predictive map of a model integrates information deriving from both neural activity patterns and experimental design. Not accounting explicitly for the presence of other targets whose associated activity spatially overlaps with the one of interest may lead to predictive maps of troublesome interpretation. We propose a new model that can correctly identify the spatial patterns associated with a target while achieving good generalization. For each target, the training is based on an augmented dataset, which includes all remaining targets. The estimation on such datasets produces both maps and interaction coefficients, which are then used to generalize. The proposed formulation is independent of the regression algorithm employed. We validate this model on simulated fMRI data and on a publicly available dataset. Results indicate that our method achieves high spatial sensitivity and good generalization and that it helps disentangle specific neural effects from interaction with predictive maps associated with other targets. Copyright © 2013 Wiley Periodicals, Inc.

  10. Non-proportional odds multivariate logistic regression of ordinal family data.

    Science.gov (United States)

    Zaloumis, Sophie G; Scurrah, Katrina J; Harrap, Stephen B; Ellis, Justine A; Gurrin, Lyle C

    2015-03-01

    Methods to examine whether genetic and/or environmental sources can account for the residual variation in ordinal family data usually assume proportional odds. However, standard software to fit the non-proportional odds model to ordinal family data is limited because the correlation structure of family data is more complex than for other types of clustered data. To perform these analyses we propose the non-proportional odds multivariate logistic regression model and take a simulation-based approach to model fitting using Markov chain Monte Carlo methods, such as partially collapsed Gibbs sampling and the Metropolis algorithm. We applied the proposed methodology to male pattern baldness data from the Victorian Family Heart Study. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

    Science.gov (United States)

    MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

    2005-01-01

    Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.

  12. On logistic regression analysis of dichotomized responses.

    Science.gov (United States)

    Lu, Kaifeng

    2017-01-01

    We study the properties of treatment effect estimate in terms of odds ratio at the study end point from logistic regression model adjusting for the baseline value when the underlying continuous repeated measurements follow a multivariate normal distribution. Compared with the analysis that does not adjust for the baseline value, the adjusted analysis produces a larger treatment effect as well as a larger standard error. However, the increase in standard error is more than offset by the increase in treatment effect so that the adjusted analysis is more powerful than the unadjusted analysis for detecting the treatment effect. On the other hand, the true adjusted odds ratio implied by the normal distribution of the underlying continuous variable is a function of the baseline value and hence is unlikely to be able to be adequately represented by a single value of adjusted odds ratio from the logistic regression model. In contrast, the risk difference function derived from the logistic regression model provides a reasonable approximation to the true risk difference function implied by the normal distribution of the underlying continuous variable over the range of the baseline distribution. We show that different metrics of treatment effect have similar statistical power when evaluated at the baseline mean. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  13. Multivariate Analysis and Prediction of Dioxin-Furan ...

    Science.gov (United States)

    Peer Review Draft of Regional Methods Initiative Final Report Dioxins, which are bioaccumulative and environmentally persistent, pose an ongoing risk to human and ecosystem health. Fish constitute a significant source of dioxin exposure for humans and fish-eating wildlife. Current dioxin analytical methods are costly, time-consuming, and produce hazardous by-products. A Danish team developed a novel, multivariate statistical methodology based on the covariance of dioxin-furan congener Toxic Equivalences (TEQs) and fatty acid methyl esters (FAMEs) and applied it to North Atlantic Ocean fishmeal samples. The goal of the current study was to attempt to extend this Danish methodology to 77 whole and composite fish samples from three trophic groups: predator (whole largemouth bass), benthic (whole flathead and channel catfish) and forage fish (composite bluegill, pumpkinseed and green sunfish) from two dioxin contaminated rivers (Pocatalico R. and Kanawha R.) in West Virginia, USA. Multivariate statistical analyses, including, Principal Components Analysis (PCA), Hierarchical Clustering, and Partial Least Squares Regression (PLS), were used to assess the relationship between the FAMEs and TEQs in these dioxin contaminated freshwater fish from the Kanawha and Pocatalico Rivers. These three multivariate statistical methods all confirm that the pattern of Fatty Acid Methyl Esters (FAMEs) in these freshwater fish covaries with and is predictive of the WHO TE

  14. Aortic and Hepatic Contrast Enhancement During Hepatic-Arterial and Portal Venous Phase Computed Tomography Scanning: Multivariate Linear Regression Analysis Using Age, Sex, Total Body Weight, Height, and Cardiac Output.

    Science.gov (United States)

    Masuda, Takanori; Nakaura, Takeshi; Funama, Yoshinori; Higaki, Toru; Kiguchi, Masao; Imada, Naoyuki; Sato, Tomoyasu; Awai, Kazuo

    We evaluated the effect of the age, sex, total body weight (TBW), height (HT) and cardiac output (CO) of patients on aortic and hepatic contrast enhancement during hepatic-arterial phase (HAP) and portal venous phase (PVP) computed tomography (CT) scanning. This prospective study received institutional review board approval; prior informed consent to participate was obtained from all 168 patients. All were examined using our routine protocol; the contrast material was 600 mg/kg iodine. Cardiac output was measured with a portable electrical velocimeter within 5 minutes of starting the CT scan. We calculated contrast enhancement (per gram of iodine: [INCREMENT]HU/gI) of the abdominal aorta during the HAP and of the liver parenchyma during the PVP. We performed univariate and multivariate linear regression analysis between all patient characteristics and the [INCREMENT]HU/gI of aortic- and liver parenchymal enhancement. Univariate linear regression analysis demonstrated statistically significant correlations between the [INCREMENT]HU/gI and the age, sex, TBW, HT, and CO (all P linear regression analysis showed that only the TBW and CO were of independent predictive value (P linear regression analysis only the TBW and CO were significantly correlated with aortic and liver parenchymal enhancement; the age, sex, and HT were not. The CO was the only independent factor affecting aortic and liver parenchymal enhancement at hepatic CT when the protocol was adjusted for the TBW.

  15. Multivariate methods and forecasting with IBM SPSS statistics

    CERN Document Server

    Aljandali, Abdulkader

    2017-01-01

    This is the second of a two-part guide to quantitative analysis using the IBM SPSS Statistics software package; this volume focuses on multivariate statistical methods and advanced forecasting techniques. More often than not, regression models involve more than one independent variable. For example, forecasting methods are commonly applied to aggregates such as inflation rates, unemployment, exchange rates, etc., that have complex relationships with determining variables. This book introduces multivariate regression models and provides examples to help understand theory underpinning the model. The book presents the fundamentals of multivariate regression and then moves on to examine several related techniques that have application in business-orientated fields such as logistic and multinomial regression. Forecasting tools such as the Box-Jenkins approach to time series modeling are introduced, as well as exponential smoothing and naïve techniques. This part also covers hot topics such as Factor Analysis, Dis...

  16. PM10 modeling in the Oviedo urban area (Northern Spain) by using multivariate adaptive regression splines

    Science.gov (United States)

    Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza

    2014-10-01

    The aim of this research work is to build a regression model of the particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at local scale. This research work explores the use of a nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) which has the ability to approximate the relationship between the inputs and outputs, and express the relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and they are used to create a highly nonlinear model of the PM10 in the Oviedo urban nucleus (Northern Spain) based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 pollutant in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of

  17. A generalized multivariate regression model for modelling ocean wave heights

    Science.gov (United States)

    Wang, X. L.; Feng, Y.; Swail, V. R.

    2012-04-01

    In this study, a generalized multivariate linear regression model is developed to represent the relationship between 6-hourly ocean significant wave heights (Hs) and the corresponding 6-hourly mean sea level pressure (MSLP) fields. The model is calibrated using the ERA-Interim reanalysis of Hs and MSLP fields for 1981-2000, and is validated using the ERA-Interim reanalysis for 2001-2010 and ERA40 reanalysis of Hs and MSLP for 1958-2001. The performance of the fitted model is evaluated in terms of Pierce skill score, frequency bias index, and correlation skill score. Being not normally distributed, wave heights are subjected to a data adaptive Box-Cox transformation before being used in the model fitting. Also, since 6-hourly data are being modelled, lag-1 autocorrelation must be and is accounted for. The models with and without Box-Cox transformation, and with and without accounting for autocorrelation, are inter-compared in terms of their prediction skills. The fitted MSLP-Hs relationship is then used to reconstruct historical wave height climate from the 6-hourly MSLP fields taken from the Twentieth Century Reanalysis (20CR, Compo et al. 2011), and to project possible future wave height climates using CMIP5 model simulations of MSLP fields. The reconstructed and projected wave heights, both seasonal means and maxima, are subject to a trend analysis that allows for non-linear (polynomial) trends.

  18. Multivariate analysis: models and method

    International Nuclear Information System (INIS)

    Sanz Perucha, J.

    1990-01-01

    Data treatment techniques are increasingly used since computer methods result of wider access. Multivariate analysis consists of a group of statistic methods that are applied to study objects or samples characterized by multiple values. A final goal is decision making. The paper describes the models and methods of multivariate analysis

  19. Linear regression analysis: part 14 of a series on evaluation of scientific publications.

    Science.gov (United States)

    Schneider, Astrid; Hommel, Gerhard; Blettner, Maria

    2010-11-01

    Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.

  20. Handbook of univariate and multivariate data analysis with IBM SPSS

    CERN Document Server

    Ho, Robert

    2013-01-01

    Using the same accessible, hands-on approach as its best-selling predecessor, the Handbook of Univariate and Multivariate Data Analysis with IBM SPSS, Second Edition explains how to apply statistical tests to experimental findings, identify the assumptions underlying the tests, and interpret the findings. This second edition now covers more topics and has been updated with the SPSS statistical package for Windows.New to the Second EditionThree new chapters on multiple discriminant analysis, logistic regression, and canonical correlationNew section on how to deal with missing dataCoverage of te

  1. Multivariable Regression Analysis in Schistosoma mansoni-Infected Individuals in the Sudan Reveals Unique Immunoepidemiological Profiles in Uninfected, egg+ and Non-egg+ Infected Individuals.

    Science.gov (United States)

    Elfaki, Tayseer Elamin Mohamed; Arndts, Kathrin; Wiszniewsky, Anna; Ritter, Manuel; Goreish, Ibtisam A; Atti El Mekki, Misk El Yemen A; Arriens, Sandra; Pfarr, Kenneth; Fimmers, Rolf; Doenhoff, Mike; Hoerauf, Achim; Layland, Laura E

    2016-05-01

    In the Sudan, Schistosoma mansoni infections are a major cause of morbidity in school-aged children and infection rates are associated with available clean water sources. During infection, immune responses pass through a Th1 followed by Th2 and Treg phases and patterns can relate to different stages of infection or immunity. This retrospective study evaluated immunoepidemiological aspects in 234 individuals (range 4-85 years old) from Kassala and Khartoum states in 2011. Systemic immune profiles (cytokines and immunoglobulins) and epidemiological parameters were surveyed in n = 110 persons presenting patent S. mansoni infections (egg+), n = 63 individuals positive for S. mansoni via PCR in sera but egg negative (SmPCR+) and n = 61 people who were infection-free (Sm uninf). Immunoepidemiological findings were further investigated using two binary multivariable regression analysis. Nearly all egg+ individuals had no access to latrines and over 90% obtained water via the canal stemming from the Atbara River. With regards to age, infection and an egg+ status was linked to young and adolescent groups. In terms of immunology, S. mansoni infection per se was strongly associated with increased SEA-specific IgG4 but not IgE levels. IL-6, IL-13 and IL-10 were significantly elevated in patently-infected individuals and positively correlated with egg load. In contrast, IL-2 and IL-1β were significantly lower in SmPCR+ individuals when compared to Sm uninf and egg+ groups which was further confirmed during multivariate regression analysis. Schistosomiasis remains an important public health problem in the Sudan with a high number of patent individuals. In addition, SmPCR diagnostics revealed another cohort of infected individuals with a unique immunological profile and provides an avenue for future studies on non-patent infection states. Future studies should investigate the downstream signalling pathways/mechanisms of IL-2 and IL-1β as potential diagnostic markers in order to

  2. Multivariable Regression Analysis in Schistosoma mansoni-Infected Individuals in the Sudan Reveals Unique Immunoepidemiological Profiles in Uninfected, egg+ and Non-egg+ Infected Individuals.

    Directory of Open Access Journals (Sweden)

    Tayseer Elamin Mohamed Elfaki

    2016-05-01

    Full Text Available In the Sudan, Schistosoma mansoni infections are a major cause of morbidity in school-aged children and infection rates are associated with available clean water sources. During infection, immune responses pass through a Th1 followed by Th2 and Treg phases and patterns can relate to different stages of infection or immunity.This retrospective study evaluated immunoepidemiological aspects in 234 individuals (range 4-85 years old from Kassala and Khartoum states in 2011. Systemic immune profiles (cytokines and immunoglobulins and epidemiological parameters were surveyed in n = 110 persons presenting patent S. mansoni infections (egg+, n = 63 individuals positive for S. mansoni via PCR in sera but egg negative (SmPCR+ and n = 61 people who were infection-free (Sm uninf. Immunoepidemiological findings were further investigated using two binary multivariable regression analysis.Nearly all egg+ individuals had no access to latrines and over 90% obtained water via the canal stemming from the Atbara River. With regards to age, infection and an egg+ status was linked to young and adolescent groups. In terms of immunology, S. mansoni infection per se was strongly associated with increased SEA-specific IgG4 but not IgE levels. IL-6, IL-13 and IL-10 were significantly elevated in patently-infected individuals and positively correlated with egg load. In contrast, IL-2 and IL-1β were significantly lower in SmPCR+ individuals when compared to Sm uninf and egg+ groups which was further confirmed during multivariate regression analysis.Schistosomiasis remains an important public health problem in the Sudan with a high number of patent individuals. In addition, SmPCR diagnostics revealed another cohort of infected individuals with a unique immunological profile and provides an avenue for future studies on non-patent infection states. Future studies should investigate the downstream signalling pathways/mechanisms of IL-2 and IL-1β as potential diagnostic markers

  3. Applied multivariate statistical analysis

    CERN Document Server

    Härdle, Wolfgang Karl

    2015-01-01

    Focusing on high-dimensional applications, this 4th edition presents the tools and concepts used in multivariate data analysis in a style that is also accessible for non-mathematicians and practitioners.  It surveys the basic principles and emphasizes both exploratory and inferential statistics; a new chapter on Variable Selection (Lasso, SCAD and Elastic Net) has also been added.  All chapters include practical exercises that highlight applications in different multivariate data analysis fields: in quantitative financial studies, where the joint dynamics of assets are observed; in medicine, where recorded observations of subjects in different locations form the basis for reliable diagnoses and medication; and in quantitative marketing, where consumers’ preferences are collected in order to construct models of consumer behavior.  All of these examples involve high to ultra-high dimensions and represent a number of major fields in big data analysis. The fourth edition of this book on Applied Multivariate ...

  4. Changes in cod muscle proteins during frozen storage revealed by proteome analysis and multivariate data analysis

    DEFF Research Database (Denmark)

    Kjærsgård, Inger Vibeke Holst; Nørrelykke, M.R.; Jessen, Flemming

    2006-01-01

    Multivariate data analysis has been combined with proteomics to enhance the recovery of information from 2-DE of cod muscle proteins during different storage conditions. Proteins were extracted according to 11 different storage conditions and samples were resolved by 2-DE. Data generated by 2-DE...... was subjected to principal component analysis (PCA) and discriminant partial least squares regression (DPLSR). Applying PCA to 2-DE data revealed the samples to form groups according to frozen storage time, whereas differences due to different storage temperatures or chilled storage in modified atmosphere...... light chain 1, 2 and 3, triose-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, aldolase A and two ?-actin fragments, and a nuclease diphosphate kinase B fragment to change in concentration, during frozen storage. Application of proteomics, multivariate data analysis and MS/MS to analyse...

  5. Multivariate meta-analysis: Potential and promise

    Science.gov (United States)

    Jackson, Dan; Riley, Richard; White, Ian R

    2011-01-01

    The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day ‘Multivariate meta-analysis’ event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd. PMID:21268052

  6. Multivariate analysis methods in physics

    International Nuclear Information System (INIS)

    Wolter, M.

    2007-01-01

    A review of multivariate methods based on statistical training is given. Several multivariate methods useful in high-energy physics analysis are discussed. Selected examples from current research in particle physics are discussed, both from the on-line trigger selection and from the off-line analysis. Also statistical training methods are presented and some new application are suggested [ru

  7. Linear regression and sensitivity analysis in nuclear reactor design

    International Nuclear Information System (INIS)

    Kumar, Akansha; Tsvetkov, Pavel V.; McClarren, Ryan G.

    2015-01-01

    Highlights: • Presented a benchmark for the applicability of linear regression to complex systems. • Applied linear regression to a nuclear reactor power system. • Performed neutronics, thermal–hydraulics, and energy conversion using Brayton’s cycle for the design of a GCFBR. • Performed detailed sensitivity analysis to a set of parameters in a nuclear reactor power system. • Modeled and developed reactor design using MCNP, regression using R, and thermal–hydraulics in Java. - Abstract: The paper presents a general strategy applicable for sensitivity analysis (SA), and uncertainity quantification analysis (UA) of parameters related to a nuclear reactor design. This work also validates the use of linear regression (LR) for predictive analysis in a nuclear reactor design. The analysis helps to determine the parameters on which a LR model can be fit for predictive analysis. For those parameters, a regression surface is created based on trial data and predictions are made using this surface. A general strategy of SA to determine and identify the influential parameters those affect the operation of the reactor is mentioned. Identification of design parameters and validation of linearity assumption for the application of LR of reactor design based on a set of tests is performed. The testing methods used to determine the behavior of the parameters can be used as a general strategy for UA, and SA of nuclear reactor models, and thermal hydraulics calculations. A design of a gas cooled fast breeder reactor (GCFBR), with thermal–hydraulics, and energy transfer has been used for the demonstration of this method. MCNP6 is used to simulate the GCFBR design, and perform the necessary criticality calculations. Java is used to build and run input samples, and to extract data from the output files of MCNP6, and R is used to perform regression analysis and other multivariate variance, and analysis of the collinearity of data

  8. Econometric analysis of realised covariation: high frequency covariance, regression and correlation in financial economics

    OpenAIRE

    Ole E. Barndorff-Nielsen; Neil Shephard

    2002-01-01

    This paper analyses multivariate high frequency financial data using realised covariation. We provide a new asymptotic distribution theory for standard methods such as regression, correlation analysis and covariance. It will be based on a fixed interval of time (e.g. a day or week), allowing the number of high frequency returns during this period to go to infinity. Our analysis allows us to study how high frequency correlations, regressions and covariances change through time. In particular w...

  9. The Covariance Adjustment Approaches for Combining Incomparable Cox Regressions Caused by Unbalanced Covariates Adjustment: A Multivariate Meta-Analysis Study

    Directory of Open Access Journals (Sweden)

    Tania Dehesh

    2015-01-01

    Full Text Available Background. Univariate meta-analysis (UM procedure, as a technique that provides a single overall result, has become increasingly popular. Neglecting the existence of other concomitant covariates in the models leads to loss of treatment efficiency. Our aim was proposing four new approximation approaches for the covariance matrix of the coefficients, which is not readily available for the multivariate generalized least square (MGLS method as a multivariate meta-analysis approach. Methods. We evaluated the efficiency of four new approaches including zero correlation (ZC, common correlation (CC, estimated correlation (EC, and multivariate multilevel correlation (MMC on the estimation bias, mean square error (MSE, and 95% probability coverage of the confidence interval (CI in the synthesis of Cox proportional hazard models coefficients in a simulation study. Result. Comparing the results of the simulation study on the MSE, bias, and CI of the estimated coefficients indicated that MMC approach was the most accurate procedure compared to EC, CC, and ZC procedures. The precision ranking of the four approaches according to all above settings was MMC ≥ EC ≥ CC ≥ ZC. Conclusion. This study highlights advantages of MGLS meta-analysis on UM approach. The results suggested the use of MMC procedure to overcome the lack of information for having a complete covariance matrix of the coefficients.

  10. The Covariance Adjustment Approaches for Combining Incomparable Cox Regressions Caused by Unbalanced Covariates Adjustment: A Multivariate Meta-Analysis Study.

    Science.gov (United States)

    Dehesh, Tania; Zare, Najaf; Ayatollahi, Seyyed Mohammad Taghi

    2015-01-01

    Univariate meta-analysis (UM) procedure, as a technique that provides a single overall result, has become increasingly popular. Neglecting the existence of other concomitant covariates in the models leads to loss of treatment efficiency. Our aim was proposing four new approximation approaches for the covariance matrix of the coefficients, which is not readily available for the multivariate generalized least square (MGLS) method as a multivariate meta-analysis approach. We evaluated the efficiency of four new approaches including zero correlation (ZC), common correlation (CC), estimated correlation (EC), and multivariate multilevel correlation (MMC) on the estimation bias, mean square error (MSE), and 95% probability coverage of the confidence interval (CI) in the synthesis of Cox proportional hazard models coefficients in a simulation study. Comparing the results of the simulation study on the MSE, bias, and CI of the estimated coefficients indicated that MMC approach was the most accurate procedure compared to EC, CC, and ZC procedures. The precision ranking of the four approaches according to all above settings was MMC ≥ EC ≥ CC ≥ ZC. This study highlights advantages of MGLS meta-analysis on UM approach. The results suggested the use of MMC procedure to overcome the lack of information for having a complete covariance matrix of the coefficients.

  11. Fast Detection of Copper Content in Rice by Laser-Induced Breakdown Spectroscopy with Uni- and Multivariate Analysis

    Science.gov (United States)

    Ye, Lanhan; Song, Kunlin; Shen, Tingting

    2018-01-01

    Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS), coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice). For univariate analysis, three pre-processing methods were applied to reduce fluctuations, including background normalization, the internal standard method, and the standard normal variate (SNV). Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R2 more than 0.97. The limit of detection (LOD) was around 5 ppm, lower than the tolerance limit of copper in foods. For multivariate analysis, partial least squares regression (PLSR) showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR) performed better in both calibration and prediction sets, where Rc2 and Rp2 reached 0.9979 and 0.9879, respectively. This study showed that LIBS could be considered as a constructive tool for the quantification of copper contamination in rice. PMID:29495445

  12. Exploratory multivariate analysis by example using R

    CERN Document Server

    Husson, Francois; Pages, Jerome

    2010-01-01

    Full of real-world case studies and practical advice, Exploratory Multivariate Analysis by Example Using R focuses on four fundamental methods of multivariate exploratory data analysis that are most suitable for applications. It covers principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, and hierarchical cluster analysis.The authors take a geometric point of view that provides a unified vision for exploring multivariate data tables. Within this framework, they present the prin

  13. Multivariate survival analysis and competing risks

    CERN Document Server

    Crowder, Martin J

    2012-01-01

    Multivariate Survival Analysis and Competing Risks introduces univariate survival analysis and extends it to the multivariate case. It covers competing risks and counting processes and provides many real-world examples, exercises, and R code. The text discusses survival data, survival distributions, frailty models, parametric methods, multivariate data and distributions, copulas, continuous failure, parametric likelihood inference, and non- and semi-parametric methods. There are many books covering survival analysis, but very few that cover the multivariate case in any depth. Written for a graduate-level audience in statistics/biostatistics, this book includes practical exercises and R code for the examples. The author is renowned for his clear writing style, and this book continues that trend. It is an excellent reference for graduate students and researchers looking for grounding in this burgeoning field of research.

  14. Multivariate analysis of risk factors for long-term urethroplasty outcome.

    Science.gov (United States)

    Breyer, Benjamin N; McAninch, Jack W; Whitson, Jared M; Eisenberg, Michael L; Mehdizadeh, Jennifer F; Myers, Jeremy B; Voelzke, Bryan B

    2010-02-01

    We studied the patient risk factors that promote urethroplasty failure. Records of patients who underwent urethroplasty at the University of California, San Francisco Medical Center between 1995 and 2004 were reviewed. Cox proportional hazards regression analysis was used to identify multivariate predictors of urethroplasty outcome. Between 1995 and 2004, 443 patients of 495 who underwent urethroplasty had complete comorbidity data and were included in analysis. Median patient age was 41 years (range 18 to 90). Median followup was 5.8 years (range 1 month to 10 years). Stricture recurred in 93 patients (21%). Primary estimated stricture-free survival at 1, 3 and 5 years was 88%, 82% and 79%. After multivariate analysis smoking (HR 1.8, 95% CI 1.0-3.1, p = 0.05), prior direct vision internal urethrotomy (HR 1.7, 95% CI 1.0-3.0, p = 0.04) and prior urethroplasty (HR 1.8, 95% CI 1.1-3.1, p = 0.03) were predictive of treatment failure. On multivariate analysis diabetes mellitus showed a trend toward prediction of urethroplasty failure (HR 2.0, 95% CI 0.8-4.9, p = 0.14). Length of urethral stricture (greater than 4 cm), prior urethroplasty and failed endoscopic therapy are predictive of failure after urethroplasty. Smoking and diabetes mellitus also may predict failure potentially secondary to microvascular damage. Copyright 2010 American Urological Association. Published by Elsevier Inc. All rights reserved.

  15. Method for statistical data analysis of multivariate observations

    CERN Document Server

    Gnanadesikan, R

    1997-01-01

    A practical guide for multivariate statistical techniques-- now updated and revised In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus of its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of inte

  16. Regression analysis by example

    CERN Document Server

    Chatterjee, Samprit

    2012-01-01

    Praise for the Fourth Edition: ""This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable."" -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded

  17. Fast Detection of Copper Content in Rice by Laser-Induced Breakdown Spectroscopy with Uni- and Multivariate Analysis

    Directory of Open Access Journals (Sweden)

    Fei Liu

    2018-02-01

    Full Text Available Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS, coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice. For univariate analysis, three pre-processing methods were applied to reduce fluctuations, including background normalization, the internal standard method, and the standard normal variate (SNV. Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R 2 more than 0.97. The limit of detection (LOD was around 5 ppm, lower than the tolerance limit of copper in foods. For multivariate analysis, partial least squares regression (PLSR showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR performed better in both calibration and prediction sets, where R c 2 and R p 2 reached 0.9979 and 0.9879, respectively. This study showed that LIBS could be considered as a constructive tool for the quantification of copper contamination in rice.

  18. Multivariate statistical analysis of a multi-step industrial processes

    DEFF Research Database (Denmark)

    Reinikainen, S.P.; Høskuldsson, Agnar

    2007-01-01

    Monitoring and quality control of industrial processes often produce information on how the data have been obtained. In batch processes, for instance, the process is carried out in stages; some process or control parameters are set at each stage. However, the obtained data might not be utilized...... efficiently, even if this information may reveal significant knowledge about process dynamics or ongoing phenomena. When studying the process data, it may be important to analyse the data in the light of the physical or time-wise development of each process step. In this paper, a unified approach to analyse...... multivariate multi-step processes, where results from each step are used to evaluate future results, is presented. The methods presented are based on Priority PLS Regression. The basic idea is to compute the weights in the regression analysis for given steps, but adjust all data by the resulting score vectors...

  19. Advanced statistics: linear regression, part II: multiple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.

  20. Econometric analysis of realized covariation: high frequency based covariance, regression, and correlation in financial economics

    DEFF Research Database (Denmark)

    Barndorff-Nielsen, Ole Eiler; Shephard, N.

    2004-01-01

    This paper analyses multivariate high frequency financial data using realized covariation. We provide a new asymptotic distribution theory for standard methods such as regression, correlation analysis, and covariance. It will be based on a fixed interval of time (e.g., a day or week), allowing...... the number of high frequency returns during this period to go to infinity. Our analysis allows us to study how high frequency correlations, regressions, and covariances change through time. In particular we provide confidence intervals for each of these quantities....

  1. Multivariate meta-analysis: a robust approach based on the theory of U-statistic.

    Science.gov (United States)

    Ma, Yan; Mazumdar, Madhu

    2011-10-30

    Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously taking into account the correlation between the outcomes. Likelihood-based approaches, in particular restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analysis with small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and is shown to perform equally well to REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistic and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect on estimates from REML because of non-normal data distribution is marginal and that the estimates from MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods are illustrated by their application to data from two published meta-analysis from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistic for testing significance of between-study heterogeneity and for extending the work to meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.

  2. A retrospective study: Multivariate logistic regression analysis of the outcomes after pressure sores reconstruction with fasciocutaneous, myocutaneous, and perforator flaps.

    Science.gov (United States)

    Chiu, Yu-Jen; Liao, Wen-Chieh; Wang, Tien-Hsiang; Shih, Yu-Chung; Ma, Hsu; Lin, Chih-Hsun; Wu, Szu-Hsien; Perng, Cherng-Kang

    2017-08-01

    Despite significant advances in medical care and surgical techniques, pressure sore reconstruction is still prone to elevated rates of complication and recurrence. We conducted a retrospective study to investigate not only complication and recurrence rates following pressure sore reconstruction but also preoperative risk stratification. This study included 181 ulcers underwent flap operations between January 2002 and December 2013 were included in the study. We performed a multivariable logistic regression model, which offers a regression-based method accounting for the within-patient correlation of the success or failure of each flap. The overall complication and recurrence rates for all flaps were 46.4% and 16.0%, respectively, with a mean follow-up period of 55.4 ± 38.0 months. No statistically significant differences of complication and recurrence rates were observed among three different reconstruction methods. In subsequent analysis, albumin ≤3.0 g/dl and paraplegia were significantly associated with higher postoperative complication. The anatomic factor, ischial wound location, significantly trended toward the development of ulcer recurrence. In the fasciocutaneous group, paraplegia had significant correlation to higher complication and recurrence rates. In the musculocutaneous flap group, variables had no significant correlation to complication and recurrence rates. In the free-style perforator group, ischial wound location and malnourished status correlated with significantly higher complication rates; ischial wound location also correlated with significantly higher recurrence rate. Ultimately, our review of a noteworthy cohort with lengthy follow-up helped identify and confirm certain risk factors that can facilitate a more informed and thoughtful pre- and postoperative decision-making process for patients with pressure ulcers. Copyright © 2017 British Association of Plastic, Reconstructive and Aesthetic Surgeons. Published by Elsevier Ltd. All

  3. Descriptor Learning via Supervised Manifold Regularization for Multioutput Regression.

    Science.gov (United States)

    Zhen, Xiantong; Yu, Mengyang; Islam, Ali; Bhaduri, Mousumi; Chan, Ian; Li, Shuo

    2017-09-01

    Multioutput regression has recently shown great ability to solve challenging problems in both computer vision and medical image analysis. However, due to the huge image variability and ambiguity, it is fundamentally challenging to handle the highly complex input-target relationship of multioutput regression, especially with indiscriminate high-dimensional representations. In this paper, we propose a novel supervised descriptor learning (SDL) algorithm for multioutput regression, which can establish discriminative and compact feature representations to improve the multivariate estimation performance. The SDL is formulated as generalized low-rank approximations of matrices with a supervised manifold regularization. The SDL is able to simultaneously extract discriminative features closely related to multivariate targets and remove irrelevant and redundant information by transforming raw features into a new low-dimensional space aligned to targets. The achieved discriminative while compact descriptor largely reduces the variability and ambiguity for multioutput regression, which enables more accurate and efficient multivariate estimation. We conduct extensive evaluation of the proposed SDL on both synthetic data and real-world multioutput regression tasks for both computer vision and medical image analysis. Experimental results have shown that the proposed SDL can achieve high multivariate estimation accuracy on all tasks and largely outperforms the algorithms in the state of the arts. Our method establishes a novel SDL framework for multioutput regression, which can be widely used to boost the performance in different applications.

  4. Matrix-based introduction to multivariate data analysis

    CERN Document Server

    Adachi, Kohei

    2016-01-01

    This book enables readers who may not be familiar with matrices to understand a variety of multivariate analysis procedures in matrix forms. Another feature of the book is that it emphasizes what model underlies a procedure and what objective function is optimized for fitting the model to data. The author believes that the matrix-based learning of such models and objective functions is the fastest way to comprehend multivariate data analysis. The text is arranged so that readers can intuitively capture the purposes for which multivariate analysis procedures are utilized: plain explanations of the purposes with numerical examples precede mathematical descriptions in almost every chapter. This volume is appropriate for undergraduate students who already have studied introductory statistics. Graduate students and researchers who are not familiar with matrix-intensive formulations of multivariate data analysis will also find the book useful, as it is based on modern matrix formulations with a special emphasis on ...

  5. Applied Statistics: From Bivariate through Multivariate Techniques [with CD-ROM

    Science.gov (United States)

    Warner, Rebecca M.

    2007-01-01

    This book provides a clear introduction to widely used topics in bivariate and multivariate statistics, including multiple regression, discriminant analysis, MANOVA, factor analysis, and binary logistic regression. The approach is applied and does not require formal mathematics; equations are accompanied by verbal explanations. Students are asked…

  6. Is ovarian hyperstimulation associated with higher blood pressure in 4-year-old IVF offspring? Part I: multivariable regression analysis.

    Science.gov (United States)

    Seggers, Jorien; Haadsma, Maaike L; La Bastide-Van Gemert, Sacha; Heineman, Maas Jan; Middelburg, Karin J; Roseboom, Tessa J; Schendelaar, Pamela; Van den Heuvel, Edwin R; Hadders-Algra, Mijna

    2014-03-01

    COH-IVF group than in the Sub-NC group (B: 0.28; 95% CI: 0.03-0.53). Larger study groups are necessary to draw firm conclusions. An effect of gender or ICSI could not be properly investigated as stratifying would further reduce the sample size. We corrected for the known differences between MNC-IVF and COH-IVF but it is possible that the groups differ in additional, more subtle parental characteristics. In addition, we measured BP on 1 day only, had no control group of children born to fertile couples (precluding investigating effects of the underlying subfertility) and included singletons only. As COH-IVF is associated with multiple births we may have underestimated cardiometabolic problems after COH-IVF. Finally, multivariable regression analysis does not provide clear insight in the causal mechanisms and we have performed further explorative analyses. Our findings are in line with other studies describing adverse effects of IVF on cardiometabolic outcome but this is the first study suggesting that ovarian hyperstimulation, as used in IVF treatments, could be a causative mechanism. Perhaps ovarian hyperstimulation negatively influences cardiometabolic outcome via changes in the early environment of the oocyte and/or embryo that result in epigenetic modifications of key metabolic systems that are involved in BP regulation. Future research needs to assess further the role of ovarian hyperstimulation in poorer cardiometabolic outcome and investigate the underlying mechanisms. The findings emphasize the importance of cardiometabolic monitoring of the growing number of children born following IVF. The authors have no conflicts of interest to declare. The study was supported by the University Medical Center Groningen, the Cornelia Foundation and the school for Behavioral- and Cognitive Neurosciences. The sponsors of the study had no role in study design, data collection, data analysis, data interpretation or writing of the report.

  7. Geoelectrical parameter-based multivariate regression borehole yield model for predicting aquifer yield in managing groundwater resource sustainability

    Directory of Open Access Journals (Sweden)

    Kehinde Anthony Mogaji

    2016-07-01

    Full Text Available This study developed a GIS-based multivariate regression (MVR yield rate prediction model of groundwater resource sustainability in the hard-rock geology terrain of southwestern Nigeria. This model can economically manage the aquifer yield rate potential predictions that are often overlooked in groundwater resources development. The proposed model relates the borehole yield rate inventory of the area to geoelectrically derived parameters. Three sets of borehole yield rate conditioning geoelectrically derived parameters—aquifer unit resistivity (ρ, aquifer unit thickness (D and coefficient of anisotropy (λ—were determined from the acquired and interpreted geophysical data. The extracted borehole yield rate values and the geoelectrically derived parameter values were regressed to develop the MVR relationship model by applying linear regression and GIS techniques. The sensitivity analysis results of the MVR model evaluated at P ⩽ 0.05 for the predictors ρ, D and λ provided values of 2.68 × 10−05, 2 × 10−02 and 2.09 × 10−06, respectively. The accuracy and predictive power tests conducted on the MVR model using the Theil inequality coefficient measurement approach, coupled with the sensitivity analysis results, confirmed the model yield rate estimation and prediction capability. The MVR borehole yield prediction model estimates were processed in a GIS environment to model an aquifer yield potential prediction map of the area. The information on the prediction map can serve as a scientific basis for predicting aquifer yield potential rates relevant in groundwater resources sustainability management. The developed MVR borehole yield rate prediction mode provides a good alternative to other methods used for this purpose.

  8. Integrated environmental monitoring and multivariate data analysis-A case study.

    Science.gov (United States)

    Eide, Ingvar; Westad, Frank; Nilssen, Ingunn; de Freitas, Felipe Sales; Dos Santos, Natalia Gomes; Dos Santos, Francisco; Cabral, Marcelo Montenegro; Bicego, Marcia Caruso; Figueira, Rubens; Johnsen, Ståle

    2017-03-01

    The present article describes integration of environmental monitoring and discharge data and interpretation using multivariate statistics, principal component analysis (PCA), and partial least squares (PLS) regression. The monitoring was carried out at the Peregrino oil field off the coast of Brazil. One sensor platform and 3 sediment traps were placed on the seabed. The sensors measured current speed and direction, turbidity, temperature, and conductivity. The sediment trap samples were used to determine suspended particulate matter that was characterized with respect to a number of chemical parameters (26 alkanes, 16 PAHs, N, C, calcium carbonate, and Ba). Data on discharges of drill cuttings and water-based drilling fluid were provided on a daily basis. The monitoring was carried out during 7 campaigns from June 2010 to October 2012, each lasting 2 to 3 months due to the capacity of the sediment traps. The data from the campaigns were preprocessed, combined, and interpreted using multivariate statistics. No systematic difference could be observed between campaigns or traps despite the fact that the first campaign was carried out before drilling, and 1 of 3 sediment traps was located in an area not expected to be influenced by the discharges. There was a strong covariation between suspended particulate matter and total N and organic C suggesting that the majority of the sediment samples had a natural and biogenic origin. Furthermore, the multivariate regression showed no correlation between discharges of drill cuttings and sediment trap or turbidity data taking current speed and direction into consideration. Because of this lack of correlation with discharges from the drilling location, a more detailed evaluation of chemical indicators providing information about origin was carried out in addition to numerical modeling of dispersion and deposition. The chemical indicators and the modeling of dispersion and deposition support the conclusions from the multivariate

  9. Endpoint in plasma etch process using new modified w-multivariate charts and windowed regression

    Science.gov (United States)

    Zakour, Sihem Ben; Taleb, Hassen

    2017-09-01

    Endpoint detection is very important undertaking on the side of getting a good understanding and figuring out if a plasma etching process is done in the right way, especially if the etched area is very small (0.1%). It truly is a crucial part of supplying repeatable effects in every single wafer. When the film being etched has been completely cleared, the endpoint is reached. To ensure the desired device performance on the produced integrated circuit, the high optical emission spectroscopy (OES) sensor is employed. The huge number of gathered wavelengths (profiles) is then analyzed and pre-processed using a new proposed simple algorithm named Spectra peak selection (SPS) to select the important wavelengths, then we employ wavelet analysis (WA) to enhance the performance of detection by suppressing noise and redundant information. The selected and treated OES wavelengths are then used in modified multivariate control charts (MEWMA and Hotelling) for three statistics (mean, SD and CV) and windowed polynomial regression for mean. The employ of three aforementioned statistics is motivated by controlling mean shift, variance shift and their ratio (CV) if both mean and SD are not stable. The control charts show their performance in detecting endpoint especially W-mean Hotelling chart and the worst result is given by CV statistic. As the best detection of endpoint is given by the W-Hotelling mean statistic, this statistic will be used to construct a windowed wavelet Hotelling polynomial regression. This latter can only identify the window containing endpoint phenomenon.

  10. Multivariate analysis of quantitative traits can effectively classify rapeseed germplasm

    Directory of Open Access Journals (Sweden)

    Jankulovska Mirjana

    2014-01-01

    Full Text Available In this study, the use of different multivariate approaches to classify rapeseed genotypes based on quantitative traits has been presented. Tree regression analysis, PCA analysis and two-way cluster analysis were applied in order todescribe and understand the extent of genetic variability in spring rapeseed genotype by trait data. The traits which highly influenced seed and oil yield in rapeseed were successfully identified by the tree regression analysis. Principal predictor for both response variables was number of pods per plant (NP. NP and 1000 seed weight could help in the selection of high yielding genotypes. High values for both traits and oil content could lead to high oil yielding genotypes. These traits may serve as indirect selection criteria and can lead to improvement of seed and oil yield in rapeseed. Quantitative traits that explained most of the variability in the studied germplasm were classified using principal component analysis. In this data set, five PCs were identified, out of which the first three PCs explained 63% of the total variance. It helped in facilitating the choice of variables based on which the genotypes’ clustering could be performed. The two-way cluster analysissimultaneously clustered genotypes and quantitative traits. The final number of clusters was determined using bootstrapping technique. This approach provided clear overview on the variability of the analyzed genotypes. The genotypes that have similar performance regarding the traits included in this study can be easily detected on the heatmap. Genotypes grouped in the clusters 1 and 8 had high values for seed and oil yield, and relatively short vegetative growth duration period and those in cluster 9, combined moderate to low values for vegetative growth duration and moderate to high seed and oil yield. These genotypes should be further exploited and implemented in the rapeseed breeding program. The combined application of these multivariate methods

  11. Multivariate refined composite multiscale entropy analysis

    International Nuclear Information System (INIS)

    Humeau-Heurtier, Anne

    2016-01-01

    Multiscale entropy (MSE) has become a prevailing method to quantify signals complexity. MSE relies on sample entropy. However, MSE may yield imprecise complexity estimation at large scales, because sample entropy does not give precise estimation of entropy when short signals are processed. A refined composite multiscale entropy (RCMSE) has therefore recently been proposed. Nevertheless, RCMSE is for univariate signals only. The simultaneous analysis of multi-channel (multivariate) data often over-performs studies based on univariate signals. We therefore introduce an extension of RCMSE to multivariate data. Applications of multivariate RCMSE to simulated processes reveal its better performances over the standard multivariate MSE. - Highlights: • Multiscale entropy quantifies data complexity but may be inaccurate at large scale. • A refined composite multiscale entropy (RCMSE) has therefore recently been proposed. • Nevertheless, RCMSE is adapted to univariate time series only. • We herein introduce an extension of RCMSE to multivariate data. • It shows better performances than the standard multivariate multiscale entropy.

  12. A comparative study of artificial neural network and multivariate regression analysis to analyze optimum renal stone fragmentation by extracorporeal shock wave lithotripsy

    Directory of Open Access Journals (Sweden)

    Goyal Neeraj

    2010-01-01

    Full Text Available To compare the accuracy of artificial neural network (ANN analysis and multi-variate regression analysis (MVRA for renal stone fragmentation by extracorporeal shock wave lithotripsy (ESWL. A total of 276 patients with renal calculus were treated by ESWL during December 2001 to December 2006. Of them, the data of 196 patients were used for training the ANN. The predictability of trained ANN was tested on 80 subsequent patients. The input data include age of patient, stone size, stone burden, number of sittings and urinary pH. The output values (predicted values were number of shocks and shock power. Of these 80 patients, the input was analyzed and output was also calculated by MVRA. The output values (predicted values from both the methods were compared and the results were drawn. The predicted and observed values of shock power and number of shocks were compared using 1:1 slope line. The results were calculated as coefficient of correlation (COC (r2 . For prediction of power, the MVRA COC was 0.0195 and ANN COC was 0.8343. For prediction of number of shocks, the MVRA COC was 0.5726 and ANN COC was 0.9329. In conclusion, ANN gives better COC than MVRA, hence could be a better tool to analyze the optimum renal stone fragmentation by ESWL.

  13. A comparative study of artificial neural network and multivariate regression analysis to analyze optimum renal stone fragmentation by extracorporeal shock wave lithotripsy

    International Nuclear Information System (INIS)

    Neeraj K Goyal, Abhay Kumar; Sameer Trivedi

    2010-01-01

    To compare the accuracy of artificial neural network (ANN) analysis and multivariate regression analysis (MVRA) for renal stone fragmentation by extracorporeal shock wave lithotripsy (ESWL). A total of 276 patients with renal calculus were treated by ESWL during December 2001 to December 2006. Of them, the data of 196 patients were used for training the ANN. The predictability of trained ANN was tested on 80 subsequent patients. The input data include age of patient, stone size, stone burden, number of sittings and urinary pH. The output values (predicted values) were number of shocks and shock power. Of these 80 patients, the input was analyzed and output was also calculated by MVRA. The output values (predicted values) from both the methods were compared and the results were drawn. The predicted and observed values of shock power and number of shocks were compared using 1:1 slope line. The results were calculated as coefficient of correlation (COC) (r2 ). For prediction of power, the MVRA COC was 0.0195 and ANN COC was 0.8343. For prediction of number of shocks, the MVRA COC was 0.5726 and ANN COC was 0.9329. In conclusion, ANN gives better COC than MVRA, hence could be a better tool to analyze the optimum renal stone fragmentation by ESWL (Author).

  14. Real estate value prediction using multivariate regression models

    Science.gov (United States)

    Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

    2017-11-01

    The real estate market is one of the most competitive in terms of pricing and the same tends to vary significantly based on a lot of factors, hence it becomes one of the prime fields to apply the concepts of machine learning to optimize and predict the prices with high accuracy. Therefore in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to have lower Residual Sum of Squares error. While using features in a regression model some feature engineering is required for better prediction. Often a set of features (multiple regressions) or polynomial regression (applying a various set of powers in the features) is used for making better model fit. For these models are expected to be susceptible towards over fitting ridge regression is used to reduce it. This paper thus directs to the best application of regression models in addition to other techniques to optimize the result.

  15. Multivariate Methods for Meta-Analysis of Genetic Association Studies.

    Science.gov (United States)

    Dimou, Niki L; Pantavou, Katerina G; Braliou, Georgia G; Bagos, Pantelis G

    2018-01-01

    Multivariate meta-analysis of genetic association studies and genome-wide association studies has received a remarkable attention as it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we present in brief univariate methods for meta-analysis and we then scrutinize multivariate methodologies. Multivariate models of meta-analysis for a single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumption of the mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are enriched with practical applications and methodologies that could be developed in the future are discussed. Links for all available software implementing multivariate meta-analysis methods are also provided.

  16. Multivariate reference technique for quantitative analysis of fiber-optic tissue Raman spectroscopy.

    Science.gov (United States)

    Bergholt, Mads Sylvest; Duraipandian, Shiyamala; Zheng, Wei; Huang, Zhiwei

    2013-12-03

    We report a novel method making use of multivariate reference signals of fused silica and sapphire Raman signals generated from a ball-lens fiber-optic Raman probe for quantitative analysis of in vivo tissue Raman measurements in real time. Partial least-squares (PLS) regression modeling is applied to extract the characteristic internal reference Raman signals (e.g., shoulder of the prominent fused silica boson peak (~130 cm(-1)); distinct sapphire ball-lens peaks (380, 417, 646, and 751 cm(-1))) from the ball-lens fiber-optic Raman probe for quantitative analysis of fiber-optic Raman spectroscopy. To evaluate the analytical value of this novel multivariate reference technique, a rapid Raman spectroscopy system coupled with a ball-lens fiber-optic Raman probe is used for in vivo oral tissue Raman measurements (n = 25 subjects) under 785 nm laser excitation powers ranging from 5 to 65 mW. An accurate linear relationship (R(2) = 0.981) with a root-mean-square error of cross validation (RMSECV) of 2.5 mW can be obtained for predicting the laser excitation power changes based on a leave-one-subject-out cross-validation, which is superior to the normal univariate reference method (RMSE = 6.2 mW). A root-mean-square error of prediction (RMSEP) of 2.4 mW (R(2) = 0.985) can also be achieved for laser power prediction in real time when we applied the multivariate method independently on the five new subjects (n = 166 spectra). We further apply the multivariate reference technique for quantitative analysis of gelatin tissue phantoms that gives rise to an RMSEP of ~2.0% (R(2) = 0.998) independent of laser excitation power variations. This work demonstrates that multivariate reference technique can be advantageously used to monitor and correct the variations of laser excitation power and fiber coupling efficiency in situ for standardizing the tissue Raman intensity to realize quantitative analysis of tissue Raman measurements in vivo, which is particularly appealing in

  17. Extending multivariate distance matrix regression with an effect size measure and the asymptotic null distribution of the test statistic.

    Science.gov (United States)

    McArtor, Daniel B; Lubke, Gitta H; Bergeman, C S

    2017-12-01

    Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains.

  18. Inference for multivariate regression model based on multiply imputed synthetic data generated via posterior predictive sampling

    Science.gov (United States)

    Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.

    2017-06-01

    The recent popularity of the use of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods of generating and analyzing such data, but almost always relying in asymptotic distributions and in consequence being not adequate for small sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based in exact distributions this procedure may even be used in small sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with the results obtained from an adaptation of Reiters combination rule to multiply imputed synthetic datasets and an application to the 2000 Current Population Survey is discussed.

  19. On the degrees of freedom of reduced-rank estimators in multivariate regression.

    Science.gov (United States)

    Mukherjee, A; Chen, K; Wang, N; Zhu, J

    We study the effective degrees of freedom of a general class of reduced-rank estimators for multivariate regression in the framework of Stein's unbiased risk estimation. A finite-sample exact unbiased estimator is derived that admits a closed-form expression in terms of the thresholded singular values of the least-squares solution and hence is readily computable. The results continue to hold in the high-dimensional setting where both the predictor and the response dimensions may be larger than the sample size. The derived analytical form facilitates the investigation of theoretical properties and provides new insights into the empirical behaviour of the degrees of freedom. In particular, we examine the differences and connections between the proposed estimator and a commonly-used naive estimator. The use of the proposed estimator leads to efficient and accurate prediction risk estimation and model selection, as demonstrated by simulation studies and a data example.

  20. Partitioning of Multivariate Phenotypes using Regression Trees Reveals Complex Patterns of Adaptation to Climate across the Range of Black Cottonwood (Populus trichocarpa

    Directory of Open Access Journals (Sweden)

    Regis Wendpouire Oubida

    2015-03-01

    Full Text Available Local adaptation to climate in temperate forest trees involves the integration of multiple physiological, morphological, and phenological traits. Latitudinal clines are frequently observed for these traits, but environmental constraints also track longitude and altitude. We combined extensive phenotyping of 12 candidate adaptive traits, multivariate regression trees, quantitative genetics, and a genome-wide panel of SNP markers to better understand the interplay among geography, climate, and adaptation to abiotic factors in Populus trichocarpa. Heritabilities were low to moderate (0.13 to 0.32 and population differentiation for many traits exceeded the 99th percentile of the genome-wide distribution of FST, suggesting local adaptation. When climate variables were taken as predictors and the 12 traits as response variables in a multivariate regression tree analysis, evapotranspiration (Eref explained the most variation, with subsequent splits related to mean temperature of the warmest month, frost-free period (FFP, and mean annual precipitation (MAP. These grouping matched relatively well the splits using geographic variables as predictors: the northernmost groups (short FFP and low Eref had the lowest growth, and lowest cold injury index; the southern British Columbia group (low Eref and intermediate temperatures had average growth and cold injury index; the group from the coast of California and Oregon (high Eref and FFP had the highest growth performance and the highest cold injury index; and the southernmost, high-altitude group (with high Eref and low FFP performed poorly, had high cold injury index, and lower water use efficiency. Taken together, these results suggest variation in both temperature and water availability across the range shape multivariate adaptive traits in poplar.

  1. Beer fermentation: monitoring of process parameters by FT-NIR and multivariate data analysis.

    Science.gov (United States)

    Grassi, Silvia; Amigo, José Manuel; Lyndgaard, Christian Bøge; Foschino, Roberto; Casiraghi, Ernestina

    2014-07-15

    This work investigates the capability of Fourier-Transform near infrared (FT-NIR) spectroscopy to monitor and assess process parameters in beer fermentation at different operative conditions. For this purpose, the fermentation of wort with two different yeast strains and at different temperatures was monitored for nine days by FT-NIR. To correlate the collected spectra with °Brix, pH and biomass, different multivariate data methodologies were applied. Principal component analysis (PCA), partial least squares (PLS) and locally weighted regression (LWR) were used to assess the relationship between FT-NIR spectra and the abovementioned process parameters that define the beer fermentation. The accuracy and robustness of the obtained results clearly show the suitability of FT-NIR spectroscopy, combined with multivariate data analysis, to be used as a quality control tool in the beer fermentation process. FT-NIR spectroscopy, when combined with LWR, demonstrates to be a perfectly suitable quantitative method to be implemented in the production of beer. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. A multivariate tobit analysis of highway accident-injury-severity rates.

    Science.gov (United States)

    Anastasopoulos, Panagiotis Ch; Shankar, Venky N; Haddock, John E; Mannering, Fred L

    2012-03-01

    Relatively recent research has illustrated the potential that tobit regression has in studying factors that affect vehicle accident rates (accidents per distance traveled) on specific roadway segments. Tobit regression has been used because accident rates on specific roadway segments are continuous data that are left-censored at zero (they are censored because accidents may not be observed on all roadway segments during the period over which data are collected). This censoring may arise from a number of sources, one of which being the possibility that less severe crashes may be under-reported and thus may be less likely to appear in crash databases. Traditional tobit-regression analyses have dealt with the overall accident rate (all crashes regardless of injury severity), so the issue of censoring by the severity of crashes has not been addressed. However, a tobit-regression approach that considers accident rates by injury-severity level, such as the rate of no-injury, possible injury and injury accidents per distance traveled (as opposed to all accidents regardless of injury-severity), can potentially provide new insights, and address the possibility that censoring may vary by crash-injury severity. Using five-year data from highways in Washington State, this paper estimates a multivariate tobit model of accident-injury-severity rates that addresses the possibility of differential censoring across injury-severity levels, while also accounting for the possible contemporaneous error correlation resulting from commonly shared unobserved characteristics across roadway segments. The empirical results show that the multivariate tobit model outperforms its univariate counterpart, is practically equivalent to the multivariate negative binomial model, and has the potential to provide a fuller understanding of the factors determining accident-injury-severity rates on specific roadway segments. Published by Elsevier Ltd.

  3. Evaluation of functional outcome of the floating knee injury using multivariate analysis.

    Science.gov (United States)

    Yokoyama, Kazuhiko; Tsukamoto, Tatsuro; Aoki, Shinichi; Wakita, Ryuji; Uchino, Masataka; Noumi, Takashi; Fukushima, Nobuaki; Itoman, Moritoshi

    2002-11-01

    The objective of this study is to evaluate significant contributing factors affecting the functional prognosis of floating knee injuries using multivariate analysis. A total of 68 floating knee injuries (67 patients) were treated at Kitasato University Hospital from 1986 to 1999. Both the femoral fractures and the tibial fractures were managed surgically by various methods. The functional results of these injuries were evaluated using the grading system of Karlström and Olerud. Follow-up periods ranged from 2 to 19 years (mean 50.2 months) after the original injury. We defined satisfactory (S) outcomes as those cases with excellent or good results and unsatisfactory (US) outcomes as those cases with acceptable or poor results. Logistic regression analysis was used as a multivariate analysis, and the dependent variables were defined as a satisfactory outcome or as an unsatisfactory outcome. The explanatory variables were predicting factors influencing the functional outcome such as age at trauma, gender, severity of soft-tissue injury in the femur and the tibia, AO fracture grade in the femur and the tibia, Fraser type (type I or type II), Injury Severity Score (ISS), and fixation time after injury (less than 1 week or more than 1 week) in the femur and the tibia. The final functional results were as follows: 25 cases had excellent results, 15 cases good results, 16 cases acceptable results, and 12 cases poor results. The predictive logistic regression equation was as follows: Log 1-p/p = 3.12-1.52 x Fraser type - 1.65 x severity of soft-tissue injury in the tibia - 1.31 x fixation time after injury in the tibia - 0.821 x AO fracture grade in the tibia + 1.025 x fixation time after injury in the femur - 0.687 x AO fracture grade in the femur ( p=0.01). Among the variables, Fraser type and the severity of soft-tissue injury in the tibia were significantly related to the final result. The multivariate analysis showed that both the involvement of the knee joint and

  4. Multivariate methods for analysis of environmental reference materials using laser-induced breakdown spectroscopy

    Directory of Open Access Journals (Sweden)

    Shikha Awasthi

    2017-06-01

    Full Text Available Analysis of emission from laser-induced plasma has a unique capability for quantifying the major and minor elements present in any type of samples under optimal analysis conditions. Chemometric techniques are very effective and reliable tools for quantification of multiple components in complex matrices. The feasibility of laser-induced breakdown spectroscopy (LIBS in combination with multivariate analysis was investigated for the analysis of environmental reference materials (RMs. In the present work, different (Certified/Standard Reference Materials of soil and plant origin were analyzed using LIBS and the presence of Al, Ca, Mg, Fe, K, Mn and Si were identified in the LIBS spectra of these materials. Multivariate statistical methods (Partial Least Square Regression and Partial Least Square Discriminant Analysis were employed for quantitative analysis of the constituent elements using the LIBS spectral data. Calibration models were used to predict the concentrations of the different elements of test samples and subsequently, the concentrations were compared with certified concentrations to check the authenticity of models. The non-destructive analytical method namely Instrumental Neutron Activation Analysis (INAA using high flux reactor neutrons and high resolution gamma-ray spectrometry was also used for intercomparison of results of two RMs by LIBS.

  5. Determination of sulfamethoxazole and trimethoprim mixtures by multivariate electronic spectroscopy

    OpenAIRE

    Cordeiro, Gilcélia A.; Peralta-Zamora, Patricio; Nagata, Noemi; Pontarollo, Roberto

    2008-01-01

    In this work a multivariate spectroscopic methodology is proposed for quantitative determination of sulfamethoxazole and trimethoprim in pharmaceutical associations. The multivariate model was developed by partial least-squares regression, using twenty synthetic mixtures and the spectral region between 190 and 350 nm. In the validation stage, which involved the analysis of five synthetic mixtures, prediction errors lower that 3% were observed. The predictive capacity of the multivariate model...

  6. Principal component regression analysis with SPSS.

    Science.gov (United States)

    Liu, R X; Kuang, J; Gong, Q; Hou, X L

    2003-06-01

    The paper introduces all indices of multicollinearity diagnoses, the basic principle of principal component regression and determination of 'best' equation method. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0: including all calculating processes of the principal component regression and all operations of linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. The principal component regression analysis can be used to overcome disturbance of the multicollinearity. The simplified, speeded up and accurate statistical effect is reached through the principal component regression analysis with SPSS.

  7. Simultaneous chemometric determination of pyridoxine hydrochloride and isoniazid in tablets by multivariate regression methods.

    Science.gov (United States)

    Dinç, Erdal; Ustündağ, Ozgür; Baleanu, Dumitru

    2010-08-01

    The sole use of pyridoxine hydrochloride during treatment of tuberculosis gives rise to pyridoxine deficiency. Therefore, a combination of pyridoxine hydrochloride and isoniazid is used in pharmaceutical dosage form in tuberculosis treatment to reduce this side effect. In this study, two chemometric methods, partial least squares (PLS) and principal component regression (PCR), were applied to the simultaneous determination of pyridoxine (PYR) and isoniazid (ISO) in their tablets. A concentration training set comprising binary mixtures of PYR and ISO consisting of 20 different combinations were randomly prepared in 0.1 M HCl. Both multivariate calibration models were constructed using the relationships between the concentration data set (concentration data matrix) and absorbance data matrix in the spectral region 200-330 nm. The accuracy and the precision of the proposed chemometric methods were validated by analyzing synthetic mixtures containing the investigated drugs. The recovery results obtained by applying PCR and PLS calibrations to the artificial mixtures were found between 100.0 and 100.7%. Satisfactory results obtained by applying the PLS and PCR methods to both artificial and commercial samples were obtained. The results obtained in this manuscript strongly encourage us to use them for the quality control and the routine analysis of the marketing tablets containing PYR and ISO drugs. Copyright © 2010 John Wiley & Sons, Ltd.

  8. Reduced Rank Regression

    DEFF Research Database (Denmark)

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...

  9. The effect of hospital mergers on long-term sickness absence among hospital employees: a fixed effects multivariate regression analysis using panel data.

    Science.gov (United States)

    Kjekshus, Lars Erik; Bernstrøm, Vilde Hoff; Dahl, Espen; Lorentzen, Thomas

    2014-02-03

    Hospitals are merging to become more cost-effective. Mergers are often complex and difficult processes with variable outcomes. The aim of this study was to analyze the effect of mergers on long-term sickness absence among hospital employees. Long-term sickness absence was analyzed among hospital employees (N = 107 209) in 57 hospitals involved in 23 mergers in Norway between 2000 and 2009. Variation in long-term sickness absence was explained through a fixed effects multivariate regression analysis using panel data with years-since-merger as the independent variable. We found a significant but modest effect of mergers on long-term sickness absence in the year of the merger, and in years 2, 3 and 4; analyzed by gender there was a significant effect for women, also for these years, but only in year 4 for men. However, men are less represented among the hospital workforce; this could explain the lack of significance. Mergers has a significant effect on employee health that should be taken into consideration when deciding to merge hospitals. This study illustrates the importance of analyzing the effects of mergers over several years and the need for more detailed analyses of merger processes and of the changes that may occur as a result of such mergers.

  10. Logistic regression analysis of factors associated with avascular necrosis of the femoral head following femoral neck fractures in middle-aged and elderly patients.

    Science.gov (United States)

    Ai, Zi-Sheng; Gao, You-Shui; Sun, Yuan; Liu, Yue; Zhang, Chang-Qing; Jiang, Cheng-Hua

    2013-03-01

    Risk factors for femoral neck fracture-induced avascular necrosis of the femoral head have not been elucidated clearly in middle-aged and elderly patients. Moreover, the high incidence of screw removal in China and its effect on the fate of the involved femoral head require statistical methods to reflect their intrinsic relationship. Ninety-nine patients older than 45 years with femoral neck fracture were treated by internal fixation between May 1999 and April 2004. Descriptive analysis, interaction analysis between associated factors, single factor logistic regression, multivariate logistic regression, and detailed interaction analysis were employed to explore potential relationships among associated factors. Avascular necrosis of the femoral head was found in 15 cases (15.2 %). Age × the status of implants (removal vs. maintenance) and gender × the timing of reduction were interactive according to two-factor interactive analysis. Age, the displacement of fractures, the quality of reduction, and the status of implants were found to be significant factors in single factor logistic regression analysis. Age, age × the status of implants, and the quality of reduction were found to be significant factors in multivariate logistic regression analysis. In fine interaction analysis after multivariate logistic regression analysis, implant removal was the most important risk factor for avascular necrosis in 56-to-85-year-old patients, with a risk ratio of 26.00 (95 % CI = 3.076-219.747). The middle-aged and elderly have less incidence of avascular necrosis of the femoral head following femoral neck fractures treated by cannulated screws. The removal of cannulated screws can induce a significantly high incidence of avascular necrosis of the femoral head in elderly patients, while a high-quality reduction is helpful to reduce avascular necrosis.

  11. Reporting quality of multivariable logistic regression in selected Indian medical journals.

    Science.gov (United States)

    Kumar, R; Indrayan, A; Chhabra, P

    2012-01-01

    Use of multivariable logistic regression (MLR) modeling has steeply increased in the medical literature over the past few years. Testing of model assumptions and adequate reporting of MLR allow the reader to interpret results more accurately. To review the fulfillment of assumptions and reporting quality of MLR in selected Indian medical journals using established criteria. Analysis of published literature. Medknow.com publishes 68 Indian medical journals with open access. Eight of these journals had at least five articles using MLR between the years 1994 to 2008. Articles from each of these journals were evaluated according to the previously established 10-point quality criteria for reporting and to test the MLR model assumptions. SPSS 17 software and non-parametric test (Kruskal-Wallis H, Mann Whitney U, Spearman Correlation). One hundred and nine articles were finally found using MLR for analyzing the data in the selected eight journals. The number of such articles gradually increased after year 2003, but quality score remained almost similar over time. P value, odds ratio, and 95% confidence interval for coefficients in MLR was reported in 75.2% and sufficient cases (>10) per covariate of limiting sample size were reported in the 58.7% of the articles. No article reported the test for conformity of linear gradient for continuous covariates. Total score was not significantly different across the journals. However, involvement of statistician or epidemiologist as a co-author improved the average quality score significantly (P=0.014). Reporting of MLR in many Indian journals is incomplete. Only one article managed to score 8 out of 10 among 109 articles under review. All others scored less. Appropriate guidelines in instructions to authors, and pre-publication review of articles using MLR by a qualified statistician may improve quality of reporting.

  12. Multivariate Multiple Regression Models for a Big Data-Empowered SON Framework in Mobile Wireless Networks

    Directory of Open Access Journals (Sweden)

    Yoonsu Shin

    2016-01-01

    Full Text Available In the 5G era, the operational cost of mobile wireless networks will significantly increase. Further, massive network capacity and zero latency will be needed because everything will be connected to mobile networks. Thus, self-organizing networks (SON are needed, which expedite automatic operation of mobile wireless networks, but have challenges to satisfy the 5G requirements. Therefore, researchers have proposed a framework to empower SON using big data. The recent framework of a big data-empowered SON analyzes the relationship between key performance indicators (KPIs and related network parameters (NPs using machine-learning tools, and it develops regression models using a Gaussian process with those parameters. The problem, however, is that the methods of finding the NPs related to the KPIs differ individually. Moreover, the Gaussian process regression model cannot determine the relationship between a KPI and its various related NPs. In this paper, to solve these problems, we proposed multivariate multiple regression models to determine the relationship between various KPIs and NPs. If we assume one KPI and multiple NPs as one set, the proposed models help us process multiple sets at one time. Also, we can find out whether some KPIs are conflicting or not. We implement the proposed models using MapReduce.

  13. Multivariate analysis of 2-DE protein patterns - Practical approaches

    DEFF Research Database (Denmark)

    Jacobsen, Charlotte; Jacobsen, Susanne; Grove, H.

    2007-01-01

    Practical approaches to the use of multivariate data analysis of 2-DE protein patterns are demonstrated by three independent strategies for the image analysis and the multivariate analysis on the same set of 2-DE data. Four wheat varieties were selected on the basis of their baking quality. Two...... of the varieties were of strong baking quality and hard wheat kernel and two were of weak baking quality and soft kernel. Gliadins at different stages of grain development were analyzed by the application of multivariate data analysis on images of 2-DEs. Patterns related to the wheat varieties, harvest times...

  14. Use of multivariate analysis to research career advancement of academic librarians

    Directory of Open Access Journals (Sweden)

    Filiberto Felipe Martínez Arellano

    2004-01-01

    Full Text Available Diverse variables dealing with credential factors, bureaucratiuc factors, organizational and disciplinary achievements, academic culture factors, social ascribed factors, and institutional factors were stated as explanatory elements of promotion, tenure status, and earnings. A survey was the research instrument for collecting data to test diverse variables dealing with academic librarians rewards and earnings. Since the study attempted to analyze variables in a multivariate context, variable interactions were tested using multiple regression analysis. Findings of this study contribute to a better understanding of those factors influencing career advancement of academic librarians. Likewise, research methodology of this study could be used in Library and Information Science(LIS research.

  15. Bayesian inference for multivariate meta-analysis Box-Cox transformation models for individual patient data with applications to evaluation of cholesterol-lowering drugs.

    Science.gov (United States)

    Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G; Shah, Arvind K; Lin, Jianxin

    2013-10-15

    In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the deviance information criterion is used to select the best transformation model. Because the model is quite complex, we develop a novel Monte Carlo Markov chain sampling scheme to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol-lowering drugs where the goal is to jointly model the three-dimensional response consisting of low density lipoprotein cholesterol (LDL-C), high density lipoprotein cholesterol (HDL-C), and triglycerides (TG) (LDL-C, HDL-C, TG). Because the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately; however, a multivariate approach would be more appropriate because these variables are correlated with each other. We carry out a detailed analysis of these data by using the proposed methodology. Copyright © 2013 John Wiley & Sons, Ltd.

  16. Bayesian inference for multivariate meta-analysis Box-Cox transformation models for individual patient data with applications to evaluation of cholesterol lowering drugs

    Science.gov (United States)

    Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G.; Shah, Arvind K.; Lin, Jianxin

    2013-01-01

    In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data (IPD) in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the Deviance Information Criterion (DIC) is used to select the best transformation model. Since the model is quite complex, a novel Monte Carlo Markov chain (MCMC) sampling scheme is developed to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol lowering drugs where the goal is to jointly model the three dimensional response consisting of Low Density Lipoprotein Cholesterol (LDL-C), High Density Lipoprotein Cholesterol (HDL-C), and Triglycerides (TG) (LDL-C, HDL-C, TG). Since the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately: however, a multivariate approach would be more appropriate since these variables are correlated with each other. A detailed analysis of these data is carried out using the proposed methodology. PMID:23580436

  17. Diagnostic accuracy of atypical p-ANCA in autoimmune hepatitis using ROC- and multivariate regression analysis.

    Science.gov (United States)

    Terjung, B; Bogsch, F; Klein, R; Söhne, J; Reichel, C; Wasmuth, J-C; Beuers, U; Sauerbruch, T; Spengler, U

    2004-09-29

    Antineutrophil cytoplasmic antibodies (atypical p-ANCA) are detected at high prevalence in sera from patients with autoimmune hepatitis (AIH), but their diagnostic relevance for AIH has not been systematically evaluated so far. Here, we studied sera from 357 patients with autoimmune (autoimmune hepatitis n=175, primary sclerosing cholangitis (PSC) n=35, primary biliary cirrhosis n=45), non-autoimmune chronic liver disease (alcoholic liver cirrhosis n=62; chronic hepatitis C virus infection (HCV) n=21) or healthy controls (n=19) for the presence of various non-organ specific autoantibodies. Atypical p-ANCA, antinuclear antibodies (ANA), antibodies against smooth muscles (SMA), antibodies against liver/kidney microsomes (anti-Lkm1) and antimitochondrial antibodies (AMA) were detected by indirect immunofluorescence microscopy, antibodies against the M2 antigen (anti-M2), antibodies against soluble liver antigen (anti-SLA/LP) and anti-Lkm1 by using enzyme linked immunosorbent assays. To define the diagnostic precision of the autoantibodies, results of autoantibody testing were analyzed by receiver operating characteristics (ROC) and forward conditional logistic regression analysis. Atypical p-ANCA were detected at high prevalence in sera from patients with AIH (81%) and PSC (94%). ROC- and logistic regression analysis revealed atypical p-ANCA and SMA, but not ANA as significant diagnostic seromarkers for AIH (atypical p-ANCA: AUC 0.754+/-0.026, odds ratio [OR] 3.4; SMA: 0.652+/-0.028, OR 4.1). Atypical p-ANCA also emerged as the only diagnostically relevant seromarker for PSC (AUC 0.690+/-0.04, OR 3.4). None of the tested antibodies yielded a significant diagnostic accuracy for patients with alcoholic liver cirrhosis, HCV or healthy controls. Atypical p-ANCA along with SMA represent a seromarker with high diagnostic accuracy for AIH and should be explicitly considered in a revised version of the diagnostic score for AIH.

  18. Vector regression introduced

    Directory of Open Access Journals (Sweden)

    Mok Tik

    2014-06-01

    Full Text Available This study formulates regression of vector data that will enable statistical analysis of various geodetic phenomena such as, polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (dependent vector variable is expressed as a function of a number of hypothesized phenomena realized also as vector variables (independent vector variables and/or scalar variables that are likely to impact the dependent vector variable. The proposed representation has the unique property of solving the coefficients of independent vector variables (explanatory variables also as vectors, hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to rep- resent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.

  19. A Novel and Effective Multivariate Method for Compositional Analysis using Laser Induced Breakdown Spectroscopy

    International Nuclear Information System (INIS)

    Wang, W; Qi, H; Ayhan, B; Kwan, C; Vance, S

    2014-01-01

    Compositional analysis is important to interrogate spectral samples for direct analysis of materials in agriculture, environment and archaeology, etc. In this paper, multi-variate analysis (MVA) techniques are coupled with laser induced breakdown spectroscopy (LIBS) to estimate quantitative elemental compositions and determine the type of the sample. In particular, we present a new multivariate analysis method for composition analysis, referred to as s pectral unmixing . The LIBS spectrum of a testing sample is considered as a linear mixture with more than one constituent signatures that correspond to various chemical elements. The signature library is derived from regression analysis using training samples or is manually set up with the information from an elemental LIBS spectral database. A calibration step is used to make all the signatures in library to be homogeneous with the testing sample so as to avoid inhomogeneous signatures that might be caused by different sampling conditions. To demonstrate the feasibility of the proposed method, we compare it with the traditional partial least squares (PLS) method and the univariate method using a standard soil data set with elemental concentration measured a priori. The experimental results show that the proposed method holds great potential for reliable and effective elemental concentration estimation

  20. Multivariate Analysis and Machine Learning in Cerebral Palsy Research.

    Science.gov (United States)

    Zhang, Jing

    2017-01-01

    Cerebral palsy (CP), a common pediatric movement disorder, causes the most severe physical disability in children. Early diagnosis in high-risk infants is critical for early intervention and possible early recovery. In recent years, multivariate analytic and machine learning (ML) approaches have been increasingly used in CP research. This paper aims to identify such multivariate studies and provide an overview of this relatively young field. Studies reviewed in this paper have demonstrated that multivariate analytic methods are useful in identification of risk factors, detection of CP, movement assessment for CP prediction, and outcome assessment, and ML approaches have made it possible to automatically identify movement impairments in high-risk infants. In addition, outcome predictors for surgical treatments have been identified by multivariate outcome studies. To make the multivariate and ML approaches useful in clinical settings, further research with large samples is needed to verify and improve these multivariate methods in risk factor identification, CP detection, movement assessment, and outcome evaluation or prediction. As multivariate analysis, ML and data processing technologies advance in the era of Big Data of this century, it is expected that multivariate analysis and ML will play a bigger role in improving the diagnosis and treatment of CP to reduce mortality and morbidity rates, and enhance patient care for children with CP.

  1. Data analysis and approximate models model choice, location-scale, analysis of variance, nonparametric regression and image analysis

    CERN Document Server

    Davies, Patrick Laurie

    2014-01-01

    Introduction IntroductionApproximate Models Notation Two Modes of Statistical AnalysisTowards One Mode of Analysis Approximation, Randomness, Chaos, Determinism ApproximationA Concept of Approximation Approximation Approximating a Data Set by a Model Approximation Regions Functionals and EquivarianceRegularization and Optimality Metrics and DiscrepanciesStrong and Weak Topologies On Being (almost) Honest Simulations and Tables Degree of Approximation and p-values ScalesStability of Analysis The Choice of En(α, P) Independence Procedures, Approximation and VaguenessDiscrete Models The Empirical Density Metrics and Discrepancies The Total Variation Metric The Kullback-Leibler and Chi-Squared Discrepancies The Po(λ) ModelThe b(k, p) and nb(k, p) Models The Flying Bomb Data The Student Study Times Data OutliersOutliers, Data Analysis and Models Breakdown Points and Equivariance Identifying Outliers and Breakdown Outliers in Multivariate Data Outliers in Linear Regression Outliers in Structured Data The Location...

  2. Applied regression analysis a research tool

    CERN Document Server

    Pantula, Sastry; Dickey, David

    1998-01-01

    Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...

  3. Multivariate statistical analysis a high-dimensional approach

    CERN Document Server

    Serdobolskii, V

    2000-01-01

    In the last few decades the accumulation of large amounts of in­ formation in numerous applications. has stimtllated an increased in­ terest in multivariate analysis. Computer technologies allow one to use multi-dimensional and multi-parametric models successfully. At the same time, an interest arose in statistical analysis with a de­ ficiency of sample data. Nevertheless, it is difficult to describe the recent state of affairs in applied multivariate methods as satisfactory. Unimprovable (dominating) statistical procedures are still unknown except for a few specific cases. The simplest problem of estimat­ ing the mean vector with minimum quadratic risk is unsolved, even for normal distributions. Commonly used standard linear multivari­ ate procedures based on the inversion of sample covariance matrices can lead to unstable results or provide no solution in dependence of data. Programs included in standard statistical packages cannot process 'multi-collinear data' and there are no theoretical recommen­ ...

  4. Multivariate Analysis and Machine Learning in Cerebral Palsy Research

    Directory of Open Access Journals (Sweden)

    Jing Zhang

    2017-12-01

    Full Text Available Cerebral palsy (CP, a common pediatric movement disorder, causes the most severe physical disability in children. Early diagnosis in high-risk infants is critical for early intervention and possible early recovery. In recent years, multivariate analytic and machine learning (ML approaches have been increasingly used in CP research. This paper aims to identify such multivariate studies and provide an overview of this relatively young field. Studies reviewed in this paper have demonstrated that multivariate analytic methods are useful in identification of risk factors, detection of CP, movement assessment for CP prediction, and outcome assessment, and ML approaches have made it possible to automatically identify movement impairments in high-risk infants. In addition, outcome predictors for surgical treatments have been identified by multivariate outcome studies. To make the multivariate and ML approaches useful in clinical settings, further research with large samples is needed to verify and improve these multivariate methods in risk factor identification, CP detection, movement assessment, and outcome evaluation or prediction. As multivariate analysis, ML and data processing technologies advance in the era of Big Data of this century, it is expected that multivariate analysis and ML will play a bigger role in improving the diagnosis and treatment of CP to reduce mortality and morbidity rates, and enhance patient care for children with CP.

  5. MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION

    Science.gov (United States)

    Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...

  6. Regression Analysis by Example. 5th Edition

    Science.gov (United States)

    Chatterjee, Samprit; Hadi, Ali S.

    2012-01-01

    Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…

  7. Effects of Video Games and Online Chat on Mathematics Performance in High School: An Approach of Multivariate Data Analysis

    OpenAIRE

    Lina Wu; Wenyi Lu; Ye Li

    2016-01-01

    Regarding heavy video game players for boys and super online chat lovers for girls as a symbolic phrase in the current adolescent culture, this project of data analysis verifies the displacement effect on deteriorating mathematics performance. To evaluate correlation or regression coefficients between a factor of playing video games or chatting online and mathematics performance compared with other factors, we use multivariate analysis technique and take gender difference into account. We fin...

  8. Determinants of orphan drugs prices in France: a regression analysis.

    Science.gov (United States)

    Korchagina, Daria; Millier, Aurelie; Vataire, Anne-Lise; Aballea, Samuel; Falissard, Bruno; Toumi, Mondher

    2017-04-21

    The introduction of the orphan drug legislation led to the increase in the number of available orphan drugs, but the access to them is often limited due to the high price. Social preferences regarding funding orphan drugs as well as the criteria taken into consideration while setting the price remain unclear. The study aimed at identifying the determinant of orphan drug prices in France using a regression analysis. All drugs with a valid orphan designation at the moment of launch for which the price was available in France were included in the analysis. The selection of covariates was based on a literature review and included drug characteristics (Anatomical Therapeutic Chemical (ATC) class, treatment line, age of target population), diseases characteristics (severity, prevalence, availability of alternative therapeutic options), health technology assessment (HTA) details (actual benefit (AB) and improvement in actual benefit (IAB) scores, delay between the HTA and commercialisation), and study characteristics (type of study, comparator, type of endpoint). The main data sources were European public assessment reports, HTA reports, summaries of opinion on orphan designation of the European Medicines Agency, and the French insurance database of drugs and tariffs. A generalized regression model was developed to test the association between the annual treatment cost and selected covariates. A total of 68 drugs were included. The mean annual treatment cost was €96,518. In the univariate analysis, the ATC class (p = 0.01), availability of alternative treatment options (p = 0.02) and the prevalence (p = 0.02) showed a significant correlation with the annual cost. The multivariate analysis demonstrated significant association between the annual cost and availability of alternative treatment options, ATC class, IAB score, type of comparator in the pivotal clinical trial, as well as commercialisation date and delay between the HTA and commercialisation. The

  9. Assessing the response of area burned to changing climate in western boreal North America using a Multivariate Adaptive Regression Splines (MARS) approach

    Science.gov (United States)

    Michael S. Balshi; A. David McGuire; Paul Duffy; Mike Flannigan; John Walsh; Jerry Melillo

    2009-01-01

    We developed temporally and spatially explicit relationships between air temperature and fuel moisture codes derived from the Canadian Fire Weather Index System to estimate annual area burned at 2.5o (latitude x longitude) resolution using a Multivariate Adaptive Regression Spline (MARS) approach across Alaska and Canada. Burned area was...

  10. A new approach to nuclear reactor design optimization using genetic algorithms and regression analysis

    International Nuclear Information System (INIS)

    Kumar, Akansha; Tsvetkov, Pavel V.

    2015-01-01

    Highlights: • This paper presents a new method useful for the optimization of complex dynamic systems. • The method uses the strengths of; genetic algorithms (GA), and regression splines. • The method is applied to the design of a gas cooled fast breeder reactor design. • Tools like Java, R, and codes like MCNP, Matlab are used in this research. - Abstract: A module based optimization method using genetic algorithms (GA), and multivariate regression analysis has been developed to optimize a set of parameters in the design of a nuclear reactor. GA simulates natural evolution to perform optimization, and is widely used in recent times by the scientific community. The GA fits a population of random solutions to the optimized solution of a specific problem. In this work, we have developed a genetic algorithm to determine the values for a set of nuclear reactor parameters to design a gas cooled fast breeder reactor core including a basis thermal–hydraulics analysis, and energy transfer. Multivariate regression is implemented using regression splines (RS). Reactor designs are usually complex and a simulation needs a significantly large amount of time to execute, hence the implementation of GA or any other global optimization techniques is not feasible, therefore we present a new method of using RS in conjunction with GA. Due to using RS, we do not necessarily need to run the neutronics simulation for all the inputs generated from the GA module rather, run the simulations for a predefined set of inputs, build a multivariate regression fit to the input and the output parameters, and then use this fit to predict the output parameters for the inputs generated by GA. The reactor parameters are given by the, radius of a fuel pin cell, isotopic enrichment of the fissile material in the fuel, mass flow rate of the coolant, and temperature of the coolant at the core inlet. And, the optimization objectives for the reactor core are, high breeding of U-233 and Pu-239 in

  11. A Visual Analytics Approach for Correlation, Classification, and Regression Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Steed, Chad A [ORNL; SwanII, J. Edward [Mississippi State University (MSU); Fitzpatrick, Patrick J. [Mississippi State University (MSU); Jankun-Kelly, T.J. [Mississippi State University (MSU)

    2012-02-01

    New approaches that combine the strengths of humans and machines are necessary to equip analysts with the proper tools for exploring today's increasing complex, multivariate data sets. In this paper, a novel visual data mining framework, called the Multidimensional Data eXplorer (MDX), is described that addresses the challenges of today's data by combining automated statistical analytics with a highly interactive parallel coordinates based canvas. In addition to several intuitive interaction capabilities, this framework offers a rich set of graphical statistical indicators, interactive regression analysis, visual correlation mining, automated axis arrangements and filtering, and data classification techniques. The current work provides a detailed description of the system as well as a discussion of key design aspects and critical feedback from domain experts.

  12. Regression analysis with categorized regression calibrated exposure: some interesting findings

    Directory of Open Access Journals (Sweden)

    Hjartåker Anette

    2006-07-01

    Full Text Available Abstract Background Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC. Results In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis. Conclusion Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a

  13. Processing data collected from radiometric experiments by multivariate technique

    International Nuclear Information System (INIS)

    Urbanski, P.; Kowalska, E.; Machaj, B.; Jakowiuk, A.

    2005-01-01

    Multivariate techniques applied for processing data collected from radiometric experiments can provide more efficient extraction of the information contained in the spectra. Several techniques are considered: (i) multivariate calibration using Partial Least Square Regression and Artificial Neural Network, (ii) standardization of the spectra, (iii) smoothing of collected spectra were autocorrelation function and bootstrap were used for the assessment of the processed data, (iv) image processing using Principal Component Analysis. Application of these techniques is illustrated on examples of some industrial applications. (author)

  14. Multivariate Analysis of Schools and Educational Policy.

    Science.gov (United States)

    Kiesling, Herbert J.

    This report describes a multivariate analysis technique that approaches the problems of educational production function analysis by (1) using comparable measures of output across large experiments, (2) accounting systematically for differences in socioeconomic background, and (3) treating the school as a complete system in which different…

  15. Gaussian process regression analysis for functional data

    CERN Document Server

    Shi, Jian Qing

    2011-01-01

    Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables.Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime

  16. Forecasting the daily power output of a grid-connected photovoltaic system based on multivariate adaptive regression splines

    International Nuclear Information System (INIS)

    Li, Yanting; He, Yong; Su, Yan; Shu, Lianjie

    2016-01-01

    Highlights: • Suggests a nonparametric model based on MARS for output power prediction. • Compare the MARS model with a wide variety of prediction models. • Show that the MARS model is able to provide an overall good performance in both the training and testing stages. - Abstract: Both linear and nonlinear models have been proposed for forecasting the power output of photovoltaic systems. Linear models are simple to implement but less flexible. Due to the stochastic nature of the power output of PV systems, nonlinear models tend to provide better forecast than linear models. Motivated by this, this paper suggests a fairly simple nonlinear regression model known as multivariate adaptive regression splines (MARS), as an alternative to forecasting of solar power output. The MARS model is a data-driven modeling approach without any assumption about the relationship between the power output and predictors. It maintains simplicity of the classical multiple linear regression (MLR) model while possessing the capability of handling nonlinearity. It is simpler in format than other nonlinear models such as ANN, k-nearest neighbors (KNN), classification and regression tree (CART), and support vector machine (SVM). The MARS model was applied on the daily output of a grid-connected 2.1 kW PV system to provide the 1-day-ahead mean daily forecast of the power output. The comparisons with a wide variety of forecast models show that the MARS model is able to provide reliable forecast performance.

  17. Exploring factors associated with traumatic dental injuries in preschool children: a Poisson regression analysis.

    Science.gov (United States)

    Feldens, Carlos Alberto; Kramer, Paulo Floriani; Ferreira, Simone Helena; Spiguel, Mônica Hermann; Marquezan, Marcela

    2010-04-01

    This cross-sectional study aimed to investigate the factors associated with dental trauma in preschool children using Poisson regression analysis with robust variance. The study population comprised 888 children aged 3- to 5-year-old attending public nurseries in Canoas, southern Brazil. Questionnaires assessing information related to the independent variables (age, gender, race, mother's educational level and family income) were completed by the parents. Clinical examinations were carried out by five trained examiners in order to assess traumatic dental injuries (TDI) according to Andreasen's classification. One of the five examiners was calibrated to assess orthodontic characteristics (open bite and overjet). Multivariable Poisson regression analysis with robust variance was used to determine the factors associated with dental trauma as well as the strengths of association. Traditional logistic regression was also performed in order to compare the estimates obtained by both methods of statistical analysis. 36.4% (323/888) of the children suffered dental trauma and there was no difference in prevalence rates from 3 to 5 years of age. Poisson regression analysis showed that the probability of the outcome was almost 30% higher for children whose mothers had more than 8 years of education (Prevalence Ratio = 1.28; 95% CI = 1.03-1.60) and 63% higher for children with an overjet greater than 2 mm (Prevalence Ratio = 1.63; 95% CI = 1.31-2.03). Odds ratios clearly overestimated the size of the effect when compared with prevalence ratios. These findings indicate the need for preventive orientation regarding TDI, in order to educate parents and caregivers about supervising infants, particularly those with increased overjet and whose mothers have a higher level of education. Poisson regression with robust variance represents a better alternative than logistic regression to estimate the risk of dental trauma in preschool children.

  18. Late rectal toxicity after conformal radiotherapy of prostate cancer (I): multivariate analysis and dose-response

    International Nuclear Information System (INIS)

    Skwarchuk, Mark W.; Jackson, Andrew; Zelefsky, Michael J.; Venkatraman, Ennapadam S.; Cowen, Didier M.; Levegruen, Sabine; Burman, Chandra M.; Fuks, Zvi; Leibel, Steven A.; Ling, C. Clifton

    2000-01-01

    Purpose: The purpose of this paper is to use the outcome of a dose escalation protocol for three-dimensional conformal radiation therapy (3D-CRT) of prostate cancer to study the dose-response for late rectal toxicity and to identify anatomic, dosimetric, and clinical factors that correlate with late rectal bleeding in multivariate analysis. Methods and Materials: Seven hundred forty-three patients with T1c-T3 prostate cancer were treated with 3D-CRT with prescribed doses of 64.8 to 81.0 Gy. The 5-year actuarial rate of late rectal toxicity was assessed using Kaplan-Meier statistics. A retrospective dosimetric analysis was performed for patients treated to 70.2 Gy (52 patients) or 75.6 Gy (119 patients) who either exhibited late rectal bleeding (RTOG Grade 2/3) within 30 months after treatment (i.e., 70.2 Gy--13 patients, 75.6 Gy--36 patients) or were nonbleeding for at least 30 months (i.e., 70.2 Gy--39 patients, 75.6 Gy--83 patients). Univariate and multivariate logistic regression was performed to correlate late rectal bleeding with several anatomic, dosimetric, and clinical variables. Results: A dose response for ≥ Grade 2 late rectal toxicity was observed. By multivariate analysis, the following factors were significantly correlated with ≥ Grade 2 late rectal bleeding for patients prescribed 70.2 Gy: 1) enclosure of the outer rectal contour by the 50% isodose on the isocenter slice (i.e., Iso50) (p max (p max

  19. A Matlab program for stepwise regression

    Directory of Open Access Journals (Sweden)

    Yanhong Qi

    2016-03-01

    Full Text Available The stepwise linear regression is a multi-variable regression for identifying statistically significant variables in the linear regression equation. In present study, we presented the Matlab program of stepwise regression.

  20. Local Strategy Combined with a Wavelength Selection Method for Multivariate Calibration

    Directory of Open Access Journals (Sweden)

    Haitao Chang

    2016-06-01

    Full Text Available One of the essential factors influencing the prediction accuracy of multivariate calibration models is the quality of the calibration data. A local regression strategy, together with a wavelength selection approach, is proposed to build the multivariate calibration models based on partial least squares regression. The local algorithm is applied to create a calibration set of spectra similar to the spectrum of an unknown sample; the synthetic degree of grey relation coefficient is used to evaluate the similarity. A wavelength selection method based on simple-to-use interactive self-modeling mixture analysis minimizes the influence of noisy variables, and the most informative variables of the most similar samples are selected to build the multivariate calibration model based on partial least squares regression. To validate the performance of the proposed method, ultraviolet-visible absorbance spectra of mixed solutions of food coloring analytes in a concentration range of 20–200 µg/mL is measured. Experimental results show that the proposed method can not only enhance the prediction accuracy of the calibration model, but also greatly reduce its complexity.

  1. Multivariate Regression of Liver on Intestine of Mice: A ...

    African Journals Online (AJOL)

    FIRST LADY

    pairs recovered. Linear, semi-logarithmic and logarithmic-logarithmic (log- log) regressions were performed. He chose the log-log curves because its variance was more uniform. The statistical comparison of .... E(U1| U2 = u2) is the regression function of U1 on U2, and Var (U1|U2 = u2) is the conditional covariance matrix.

  2. Assessment of Genetic Heterogeneity in Structured Plant Populations Using Multivariate Whole-Genome Regression Models.

    Science.gov (United States)

    Lehermeier, Christina; Schön, Chris-Carolin; de Los Campos, Gustavo

    2015-09-01

    Plant breeding populations exhibit varying levels of structure and admixture; these features are likely to induce heterogeneity of marker effects across subpopulations. Traditionally, structure has been dealt with as a potential confounder, and various methods exist to "correct" for population stratification. However, these methods induce a mean correction that does not account for heterogeneity of marker effects. The animal breeding literature offers a few recent studies that consider modeling genetic heterogeneity in multibreed data, using multivariate models. However, these methods have received little attention in plant breeding where population structure can have different forms. In this article we address the problem of analyzing data from heterogeneous plant breeding populations, using three approaches: (a) a model that ignores population structure [A-genome-based best linear unbiased prediction (A-GBLUP)], (b) a stratified (i.e., within-group) analysis (W-GBLUP), and (c) a multivariate approach that uses multigroup data and accounts for heterogeneity (MG-GBLUP). The performance of the three models was assessed on three different data sets: a diversity panel of rice (Oryza sativa), a maize (Zea mays L.) half-sib panel, and a wheat (Triticum aestivum L.) data set that originated from plant breeding programs. The estimated genomic correlations between subpopulations varied from null to moderate, depending on the genetic distance between subpopulations and traits. Our assessment of prediction accuracy features cases where ignoring population structure leads to a parsimonious more powerful model as well as others where the multivariate and stratified approaches have higher predictive power. In general, the multivariate approach appeared slightly more robust than either the A- or the W-GBLUP. Copyright © 2015 by the Genetics Society of America.

  3. Multivariate Analysis for the Processing of Signals

    Directory of Open Access Journals (Sweden)

    Beattie J.R.

    2014-01-01

    Full Text Available Real-world experiments are becoming increasingly more complex, needing techniques capable of tracking this complexity. Signal based measurements are often used to capture this complexity, where a signal is a record of a sample’s response to a parameter (e.g. time, displacement, voltage, wavelength that is varied over a range of values. In signals the responses at each value of the varied parameter are related to each other, depending on the composition or state sample being measured. Since signals contain multiple information points, they have rich information content but are generally complex to comprehend. Multivariate Analysis (MA has profoundly transformed their analysis by allowing gross simplification of the tangled web of variation. In addition MA has also provided the advantage of being much more robust to the influence of noise than univariate methods of analysis. In recent years, there has been a growing awareness that the nature of the multivariate methods allows exploitation of its benefits for purposes other than data analysis, such as pre-processing of signals with the aim of eliminating irrelevant variations prior to analysis of the signal of interest. It has been shown that exploiting multivariate data reduction in an appropriate way can allow high fidelity denoising (removal of irreproducible non-signals, consistent and reproducible noise-insensitive correction of baseline distortions (removal of reproducible non-signals, accurate elimination of interfering signals (removal of reproducible but unwanted signals and the standardisation of signal amplitude fluctuations. At present, the field is relatively small but the possibilities for much wider application are considerable. Where signal properties are suitable for MA (such as the signal being stationary along the x-axis, these signal based corrections have the potential to be highly reproducible, and highly adaptable and are applicable in situations where the data is noisy or

  4. A Framework for Establishing Standard Reference Scale of Texture by Multivariate Statistical Analysis Based on Instrumental Measurement and Sensory Evaluation.

    Science.gov (United States)

    Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye

    2016-01-13

    A framework of establishing standard reference scale (texture) is proposed by multivariate statistical analysis according to instrumental measurement and sensory evaluation. Multivariate statistical analysis is conducted to rapidly select typical reference samples with characteristics of universality, representativeness, stability, substitutability, and traceability. The reasonableness of the framework method is verified by establishing standard reference scale of texture attribute (hardness) with Chinese well-known food. More than 100 food products in 16 categories were tested using instrumental measurement (TPA test), and the result was analyzed with clustering analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were determined to construct the hardness standard reference scale. The results indicate that the regression coefficient between the estimated sensory value and the instrumentally measured value is significant (R(2) = 0.9765), which fits well with Stevens's theory. The research provides reliable a theoretical basis and practical guide for quantitative standard reference scale establishment on food texture characteristics.

  5. Regression and regression analysis time series prediction modeling on climate data of quetta, pakistan

    International Nuclear Information System (INIS)

    Jafri, Y.Z.; Kamal, L.

    2007-01-01

    Various statistical techniques was used on five-year data from 1998-2002 of average humidity, rainfall, maximum and minimum temperatures, respectively. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit, to our polynomial regression analysis time series (PRATS). The correlation to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rand correlation and Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, respectively which yielded homoscedasticity. We also employed Bartlett's test for homogeneity of variances on a five-year data of rainfall and humidity, respectively which showed that the variances in rainfall data were not homogenous while in case of humidity, were homogenous. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)

  6. PIXE-quantified AXSIA: Elemental mapping by multivariate spectral analysis

    International Nuclear Information System (INIS)

    Doyle, B.L.; Provencio, P.P.; Kotula, P.G.; Antolak, A.J.; Ryan, C.G.; Campbell, J.L.; Barrett, K.

    2006-01-01

    Automated, nonbiased, multivariate statistical analysis techniques are useful for converting very large amounts of data into a smaller, more manageable number of chemical components (spectra and images) that are needed to describe the measurement. We report the first use of the multivariate spectral analysis program AXSIA (Automated eXpert Spectral Image Analysis) developed at Sandia National Laboratories to quantitatively analyze micro-PIXE data maps. AXSIA implements a multivariate curve resolution technique that reduces the spectral image data sets into a limited number of physically realizable and easily interpretable components (including both spectra and images). We show that the principal component spectra can be further analyzed using conventional PIXE programs to convert the weighting images into quantitative concentration maps. A common elemental data set has been analyzed using three different PIXE analysis codes and the results compared to the cases when each of these codes is used to separately analyze the associated AXSIA principal component spectral data. We find that these comparisons are in good quantitative agreement with each other

  7. Regression Analysis and the Sociological Imagination

    Science.gov (United States)

    De Maio, Fernando

    2014-01-01

    Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.

  8. The analysis of multivariate group differences using common principal components

    NARCIS (Netherlands)

    Bechger, T.M.; Blanca, M.J.; Maris, G.

    2014-01-01

    Although it is simple to determine whether multivariate group differences are statistically significant or not, such differences are often difficult to interpret. This article is about common principal components analysis as a tool for the exploratory investigation of multivariate group differences

  9. Polylinear regression analysis in radiochemistry

    International Nuclear Information System (INIS)

    Kopyrin, A.A.; Terent'eva, T.N.; Khramov, N.N.

    1995-01-01

    A number of radiochemical problems have been formulated in the framework of polylinear regression analysis, which permits the use of conventional mathematical methods for their solution. The authors have considered features of the use of polylinear regression analysis for estimating the contributions of various sources to the atmospheric pollution, for studying irradiated nuclear fuel, for estimating concentrations from spectral data, for measuring neutron fields of a nuclear reactor, for estimating crystal lattice parameters from X-ray diffraction patterns, for interpreting data of X-ray fluorescence analysis, for estimating complex formation constants, and for analyzing results of radiometric measurements. The problem of estimating the target parameters can be incorrect at certain properties of the system under study. The authors showed the possibility of regularization by adding a fictitious set of data open-quotes obtainedclose quotes from the orthogonal design. To estimate only a part of the parameters under consideration, the authors used incomplete rank models. In this case, it is necessary to take into account the possibility of confounding estimates. An algorithm for evaluating the degree of confounding is presented which is realized using standard software or regression analysis

  10. Particulate characterization by PIXE multivariate spectral analysis

    International Nuclear Information System (INIS)

    Antolak, Arlyn J.; Morse, Daniel H.; Grant, Patrick G.; Kotula, Paul G.; Doyle, Barney L.; Richardson, Charles B.

    2007-01-01

    Obtaining particulate compositional maps from scanned PIXE (proton-induced X-ray emission) measurements is extremely difficult due to the complexity of analyzing spectroscopic data collected with low signal-to-noise at each scan point (pixel). Multivariate spectral analysis has the potential to analyze such data sets by reducing the PIXE data to a limited number of physically realizable and easily interpretable components (that include both spectral and image information). We have adapted the AXSIA (automated expert spectral image analysis) program, originally developed by Sandia National Laboratories to quantify electron-excited X-ray spectroscopy data, for this purpose. Samples consisting of particulates with known compositions and sizes were loaded onto Mylar and paper filter substrates and analyzed by scanned micro-PIXE. The data sets were processed by AXSIA and the associated principal component spectral data were quantified by converting the weighting images into concentration maps. The results indicate automated, nonbiased, multivariate statistical analysis is useful for converting very large amounts of data into a smaller, more manageable number of compositional components needed for locating individual particles-of-interest on large area collection media

  11. Sugar and acid content of Citrus prediction modeling using FT-IR fingerprinting in combination with multivariate statistical analysis.

    Science.gov (United States)

    Song, Seung Yeob; Lee, Young Koung; Kim, In-Jung

    2016-01-01

    A high-throughput screening system for Citrus lines were established with higher sugar and acid contents using Fourier transform infrared (FT-IR) spectroscopy in combination with multivariate analysis. FT-IR spectra confirmed typical spectral differences between the frequency regions of 950-1100 cm(-1), 1300-1500 cm(-1), and 1500-1700 cm(-1). Principal component analysis (PCA) and subsequent partial least square-discriminant analysis (PLS-DA) were able to discriminate five Citrus lines into three separate clusters corresponding to their taxonomic relationships. The quantitative predictive modeling of sugar and acid contents from Citrus fruits was established using partial least square regression algorithms from FT-IR spectra. The regression coefficients (R(2)) between predicted values and estimated sugar and acid content values were 0.99. These results demonstrate that by using FT-IR spectra and applying quantitative prediction modeling to Citrus sugar and acid contents, excellent Citrus lines can be early detected with greater accuracy. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Determination of wheat quality by mass spectrometry and multivariate data analysis

    DEFF Research Database (Denmark)

    Gottlieb, D.M.; Schultz, J.; Petersen, M.

    2002-01-01

    Multivariate analysis has been applied as support to proteome analysis in order to implement an easier and faster way of data handling based on separation by matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry. The characterisation phase in proteome analysis by means...... of simple visual inspection is a demanding process and also insecure because subjectivity is the controlling element. Multivariate analysis offers, to a considerable extent, objectivity and must therefore be regarded as a neutral way to evaluate results obtained by proteome analysis.Proteome analysis...

  13. Linear Regression Analysis

    CERN Document Server

    Seber, George A F

    2012-01-01

    Concise, mathematically clear, and comprehensive treatment of the subject.* Expanded coverage of diagnostics and methods of model fitting.* Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models.* More than 200 problems throughout the book plus outline solutions for the exercises.* This revision has been extensively class-tested.

  14. Searching for New Biomarkers and the Use of Multivariate Analysis in Gastric Cancer Diagnostics.

    Science.gov (United States)

    Kucera, Radek; Smid, David; Topolcan, Ondrej; Karlikova, Marie; Fiala, Ondrej; Slouka, David; Skalicky, Tomas; Treska, Vladislav; Kulda, Vlastimil; Simanek, Vaclav; Safanda, Martin; Pesta, Martin

    2016-04-01

    The first aim of this study was to search for new biomarkers to be used in gastric cancer diagnostics. The second aim was to verify the findings presented in literature on a sample of the local population and investigate the risk of gastric cancer in that population using a multivariant statistical analysis. We assessed a group of 36 patients with gastric cancer and 69 healthy individuals. We determined carcinoembryonic antigen, cancer antigen 19-9, cancer antigen 72-4, matrix metalloproteinases (-1, -2, -7, -8 and -9), osteoprotegerin, osteopontin, prothrombin induced by vitamin K absence-II, pepsinogen I, pepsinogen II, gastrin and Helicobacter pylori for each sample. The multivariate stepwise logistic regression identified the following biomarkers as the best gastric cancer predictors: CEA, CA72-4, pepsinogen I, Helicobacter pylori presence and MMP7. CEA and CA72-4 remain the best markers for gastric cancer diagnostics. We suggest a mathematical model for the assessment of risk of gastric cancer. Copyright© 2016 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved.

  15. Regional trends in short-duration precipitation extremes: a flexible multivariate monotone quantile regression approach

    Science.gov (United States)

    Cannon, Alex

    2017-04-01

    univariate technique, and cannot incorporate information from additional covariates, for example ENSO state or physiographic controls on extreme rainfall within a region. Here, the univariate MQR model is extended to allow the use of multiple covariates. Multivariate monotone quantile regression (MMQR) is based on a single hidden-layer feedforward network with the quantile regression error function and partial monotonicity constraints. The MMQR model is demonstrated via Monte Carlo simulations and the estimation and visualization of regional trends in moderate rainfall extremes based on homogenized sub-daily precipitation data at stations in Canada.

  16. Multivariate Analysis of Industrial Scale Fermentation Data

    DEFF Research Database (Denmark)

    Mears, Lisa; Nørregård, Rasmus; Stocks, Stuart M.

    2015-01-01

    Multivariate analysis allows process understanding to be gained from the vast and complex datasets recorded from fermentation processes, however the application of such techniques to this field can be limited by the data pre-processing requirements and data handling. In this work many iterations...

  17. Multivariate spectral-analysis of movement-related EEG data

    International Nuclear Information System (INIS)

    Andrew, C. M.

    1997-01-01

    The univariate method of event-related desynchronization (ERD) analysis, which quantifies the temporal evolution of power within specific frequency bands from electroencephalographic (EEG) data recorded during a task or event, is extended to an event related multivariate spectral analysis method. With this method, time courses of cross-spectra, phase spectra, coherence spectra, band-averaged coherence values (event-related coherence, ERCoh), partial power spectra and partial coherence spectra are estimated from an ensemble of multivariate event-related EEG trials. This provides a means of investigating relationships between EEG signals recorded over different scalp areas during the performance of a task or the occurrence of an event. The multivariate spectral analysis method is applied to EEG data recorded during three different movement-related studies involving discrete right index finger movements. The first study investigates the impact of the EEG derivation type on the temporal evolution of interhemispheric coherence between activity recorded at electrodes overlying the left and right sensorimotor hand areas during cued finger movement. The question results whether changes in coherence necessarily reflect changes in functional coupling of the cortical structures underlying the recording electrodes. The method is applied to data recorded during voluntary finger movement and a hypothesis, based on an existing global/local model of neocortical dynamics, is formulated to explain the coherence results. The third study applies partial spectral analysis too, and investigates phase relationships of, movement-related data recorded from a full head montage, thereby providing further results strengthening the global/local hypothesis. (author)

  18. Predictors of unsuccessful outcome in cemented femoral revisions using bone impaction grafting; Cox regression analysis of 208 cases.

    Science.gov (United States)

    Te Stroet, Martijn A J; Rijnen, Wim H C; Gardeniers, Jean W M; Schreurs, B Willem; Hannink, Gerjon

    2016-09-29

    Despite improvements in the technique of femoral impaction bone grafting, reconstruction failures still can occur. Therefore, the aim of our study was to determine risk factors for the endpoint re-revision for any reason. We used prospectively collected demographic, clinical and surgical data of all 202 patients who underwent 208 femoral revisions using the X-change Femoral Revision System (Stryker-Howmedica), fresh-frozen morcellised allograft and a cemented polished Exeter stem in our department from 1991 to 2007. Univariable and multivariable Cox regression analyses were performed to identify potential factors associated with re-revision. The mean follow-up was 10.6 (5-21) years. The cumulative re-revision rate was 6.3% (13/208). After univariable selection, sex, age, body mass index (BMI), American Association of Anesthesiologists (ASA) classification, type of removed femoral component, and mesh used for reconstruction were included in multivariable regression analysis.In the multivariable analysis, BMI was the only factor that was significantly associated with the risk of re-revision after bone impaction grafting (BMI ≥30 vs. BMI <30, HR = 6.54 [95% CI 1.89-22.65]; p = 0.003). BMI was the only factor associated with the risk of re-revision for any reason. Besides BMI also other factors, such as Endoklinik score and the type of removed femoral component, can provide guidance in the process of preclinical decision making. With the knowledge obtained from this study, preoperative patient selection, informed consent, and treatment protocols can be better adjusted to the individual patient who needs to undergo a femoral revision with impaction bone grafting.

  19. Essentials of multivariate data analysis

    CERN Document Server

    Spencer, Neil H

    2013-01-01

    ""… this text provides an overview at an introductory level of several methods in multivariate data analysis. It contains in-depth examples from one data set woven throughout the text, and a free [Excel] Add-In to perform the analyses in Excel, with step-by-step instructions provided for each technique. … could be used as a text (possibly supplemental) for courses in other fields where researchers wish to apply these methods without delving too deeply into the underlying statistics.""-The American Statistician, February 2015

  20. Principal Feature Analysis: A Multivariate Feature Selection Method for fMRI Data

    Directory of Open Access Journals (Sweden)

    Lijun Wang

    2013-01-01

    Full Text Available Brain decoding with functional magnetic resonance imaging (fMRI requires analysis of complex, multivariate data. Multivoxel pattern analysis (MVPA has been widely used in recent years. MVPA treats the activation of multiple voxels from fMRI data as a pattern and decodes brain states using pattern classification methods. Feature selection is a critical procedure of MVPA because it decides which features will be included in the classification analysis of fMRI data, thereby improving the performance of the classifier. Features can be selected by limiting the analysis to specific anatomical regions or by computing univariate (voxel-wise or multivariate statistics. However, these methods either discard some informative features or select features with redundant information. This paper introduces the principal feature analysis as a novel multivariate feature selection method for fMRI data processing. This multivariate approach aims to remove features with redundant information, thereby selecting fewer features, while retaining the most information.

  1. Preface to Berk's "Regression Analysis: A Constructive Critique"

    OpenAIRE

    de Leeuw, Jan

    2003-01-01

    It is pleasure to write a preface for the book ”Regression Analysis” of my fellow series editor Dick Berk. And it is a pleasure in particular because the book is about regression analysis, the most popular and the most fundamental technique in applied statistics. And because it is critical of the way regression analysis is used in the sciences, in particular in the social and behavioral sciences. Although the book can be read as an introduction to regression analysis, it can also be read as a...

  2. Multicollinearity and Regression Analysis

    Science.gov (United States)

    Daoud, Jamal I.

    2017-12-01

    In regression analysis it is obvious to have a correlation between the response and predictor(s), but having correlation among predictors is something undesired. The number of predictors included in the regression model depends on many factors among which, historical data, experience, etc. At the end selection of most important predictors is something objective due to the researcher. Multicollinearity is a phenomena when two or more predictors are correlated, if this happens, the standard error of the coefficients will increase [8]. Increased standard errors means that the coefficients for some or all independent variables may be found to be significantly different from In other words, by overinflating the standard errors, multicollinearity makes some variables statistically insignificant when they should be significant. In this paper we focus on the multicollinearity, reasons and consequences on the reliability of the regression model.

  3. Multivariate Meta-Analysis Using Individual Participant Data

    Science.gov (United States)

    Riley, R. D.; Price, M. J.; Jackson, D.; Wardle, M.; Gueyffier, F.; Wang, J.; Staessen, J. A.; White, I. R.

    2015-01-01

    When combining results across related studies, a multivariate meta-analysis allows the joint synthesis of correlated effect estimates from multiple outcomes. Joint synthesis can improve efficiency over separate univariate syntheses, may reduce selective outcome reporting biases, and enables joint inferences across the outcomes. A common issue is…

  4. Linear regression

    CERN Document Server

    Olive, David J

    2017-01-01

    This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...

  5. Sparse reduced-rank regression with covariance estimation

    KAUST Repository

    Chen, Lisha

    2014-12-08

    Improving the predicting performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

  6. Sparse reduced-rank regression with covariance estimation

    KAUST Repository

    Chen, Lisha; Huang, Jianhua Z.

    2014-01-01

    Improving the predicting performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

  7. Hierarchical regression analysis in structural Equation Modeling

    NARCIS (Netherlands)

    de Jong, P.F.

    1999-01-01

    In a hierarchical or fixed-order regression analysis, the independent variables are entered into the regression equation in a prespecified order. Such an analysis is often performed when the extra amount of variance accounted for in a dependent variable by a specific independent variable is the main

  8. Practical multivariate analysis

    CERN Document Server

    Afifi, Abdelmonem; Clark, Virginia A

    2011-01-01

    ""First of all, it is very easy to read. … The authors manage to introduce and (at least partially) explain even quite complex concepts, e.g. eigenvalues, in an easy and pedagogical way that I suppose is attractive to readers without deeper statistical knowledge. The text is also sprinkled with references for those who want to probe deeper into a certain topic. Secondly, I personally find the book's emphasis on practical data handling very appealing. … Thirdly, the book gives very nice coverage of regression analysis. … this is a nicely written book that gives a good overview of a large number

  9. imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel.

    Science.gov (United States)

    Grapov, Dmitry; Newman, John W

    2012-09-01

    Interactive modules for Data Exploration and Visualization (imDEV) is a Microsoft Excel spreadsheet embedded application providing an integrated environment for the analysis of omics data through a user-friendly interface. Individual modules enables interactive and dynamic analyses of large data by interfacing R's multivariate statistics and highly customizable visualizations with the spreadsheet environment, aiding robust inferences and generating information-rich data visualizations. This tool provides access to multiple comparisons with false discovery correction, hierarchical clustering, principal and independent component analyses, partial least squares regression and discriminant analysis, through an intuitive interface for creating high-quality two- and a three-dimensional visualizations including scatter plot matrices, distribution plots, dendrograms, heat maps, biplots, trellis biplots and correlation networks. Freely available for download at http://sourceforge.net/projects/imdev/. Implemented in R and VBA and supported by Microsoft Excel (2003, 2007 and 2010).

  10. Pattern recognition by the use of multivariate statistical evaluation of macro- and micro-PIXE results

    International Nuclear Information System (INIS)

    Tapper, U.A.S.; Malmqvist, K.G.; Loevestam, N.E.G.; Swietlicki, E.; Salford, L.G.

    1991-01-01

    The importance of statistical evaluation of multielemental data is illustrated using the data collected in a macro- and micro-PIXE analysis of human brain tumours. By employing a multivariate statistical classification methodology (SIMCA) it was shown that the total information collected from each specimen separates three types of tissue: High malignant, less malignant and normal brain tissue. This makes a classification of a given specimen possible based on the elemental concentrations. Partial least squares regression (PLS), a multivariate regression method, made it possible to study the relative importance of the examined nine trace elements, the dry/wet weight ratio and the age of the patient in predicting the survival time after operation for patients with the high malignant form, astrocytomas grade III-IV. The elemental maps from a microprobe analysis were also subjected to multivariate analysis. This showed that the six elements sorted into maps could be presented in three maps containing all the relevant information. The intensity in these maps is proportional to the value (score) of the actual pixel along the calculated principal components. (orig.)

  11. NIR and Py-mbms coupled with multivariate data analysis as a high-throughput biomass characterization technique : a review

    Directory of Open Access Journals (Sweden)

    Li eXiao

    2014-08-01

    Full Text Available Optimizing the use of lignocellulosic biomass as the feedstock for renewable energy production is currently being developed globally. Biomass is a complex mixture of cellulose, hemicelluloses, lignins, extractives, and proteins; as well as inorganic salts. Cell wall compositional analysis for biomass characterization is laborious and time consuming. In order to characterize biomass fast and efficiently, several high through-put technologies have been successfully developed. Among them, near infrared spectroscopy (NIR and pyrolysis-molecular beam mass spectrometry (Py-mbms are complementary tools and capable of evaluating a large number of raw or modified biomass in a short period of time. NIR shows vibrations associated with specific chemical structures whereas Py-mbms depicts the full range of fragments from the decomposition of biomass. Both NIR vibrations and Py-mbms peaks are assigned to possible chemical functional groups and molecular structures. They provide complementary information of chemical insight of biomaterials. However, it is challenging to interpret the informative results because of the large amount of overlapping bands or decomposition fragments contained in the spectra. In order to improve the efficiency of data analysis, multivariate analysis tools have been adapted to define the significant correlations among data variables, so that the large number of bands/peaks could be replaced by a small number of reconstructed variables representing original variation. Reconstructed data variables are used for sample comparison (principal component analysis and for building regression models (partial least square regression between biomass chemical structures and properties of interests. In this review, the important biomass chemical structures measured by NIR and Py-mbms are summarized. The advantages and disadvantages of conventional data analysis methods and multivariate data analysis methods are introduced, compared and evaluated

  12. Multivariate Generalized Multiscale Entropy Analysis

    Directory of Open Access Journals (Sweden)

    Anne Humeau-Heurtier

    2016-11-01

    Full Text Available Multiscale entropy (MSE was introduced in the 2000s to quantify systems’ complexity. MSE relies on (i a coarse-graining procedure to derive a set of time series representing the system dynamics on different time scales; (ii the computation of the sample entropy for each coarse-grained time series. A refined composite MSE (rcMSE—based on the same steps as MSE—also exists. Compared to MSE, rcMSE increases the accuracy of entropy estimation and reduces the probability of inducing undefined entropy for short time series. The multivariate versions of MSE (MMSE and rcMSE (MrcMSE have also been introduced. In the coarse-graining step used in MSE, rcMSE, MMSE, and MrcMSE, the mean value is used to derive representations of the original data at different resolutions. A generalization of MSE was recently published, using the computation of different moments in the coarse-graining procedure. However, so far, this generalization only exists for univariate signals. We therefore herein propose an extension of this generalized MSE to multivariate data. The multivariate generalized algorithms of MMSE and MrcMSE presented herein (MGMSE and MGrcMSE, respectively are first analyzed through the processing of synthetic signals. We reveal that MGrcMSE shows better performance than MGMSE for short multivariate data. We then study the performance of MGrcMSE on two sets of short multivariate electroencephalograms (EEG available in the public domain. We report that MGrcMSE may show better performance than MrcMSE in distinguishing different types of multivariate EEG data. MGrcMSE could therefore supplement MMSE or MrcMSE in the processing of multivariate datasets.

  13. Time-series panel analysis (TSPA): multivariate modeling of temporal associations in psychotherapy process.

    Science.gov (United States)

    Ramseyer, Fabian; Kupper, Zeno; Caspar, Franz; Znoj, Hansjörg; Tschacher, Wolfgang

    2014-10-01

    Processes occurring in the course of psychotherapy are characterized by the simple fact that they unfold in time and that the multiple factors engaged in change processes vary highly between individuals (idiographic phenomena). Previous research, however, has neglected the temporal perspective by its traditional focus on static phenomena, which were mainly assessed at the group level (nomothetic phenomena). To support a temporal approach, the authors introduce time-series panel analysis (TSPA), a statistical methodology explicitly focusing on the quantification of temporal, session-to-session aspects of change in psychotherapy. TSPA-models are initially built at the level of individuals and are subsequently aggregated at the group level, thus allowing the exploration of prototypical models. TSPA is based on vector auto-regression (VAR), an extension of univariate auto-regression models to multivariate time-series data. The application of TSPA is demonstrated in a sample of 87 outpatient psychotherapy patients who were monitored by postsession questionnaires. Prototypical mechanisms of change were derived from the aggregation of individual multivariate models of psychotherapy process. In a 2nd step, the associations between mechanisms of change (TSPA) and pre- to postsymptom change were explored. TSPA allowed a prototypical process pattern to be identified, where patient's alliance and self-efficacy were linked by a temporal feedback-loop. Furthermore, therapist's stability over time in both mastery and clarification interventions was positively associated with better outcomes. TSPA is a statistical tool that sheds new light on temporal mechanisms of change. Through this approach, clinicians may gain insight into prototypical patterns of change in psychotherapy. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  14. Prevalence of treponema species detected in endodontic infections: systematic review and meta-regression analysis.

    Science.gov (United States)

    Leite, Fábio R M; Nascimento, Gustavo G; Demarco, Flávio F; Gomes, Brenda P F A; Pucci, Cesar R; Martinho, Frederico C

    2015-05-01

    This systematic review and meta-regression analysis aimed to calculate a combined prevalence estimate and evaluate the prevalence of different Treponema species in primary and secondary endodontic infections, including symptomatic and asymptomatic cases. The MEDLINE/PubMed, Embase, Scielo, Web of Knowledge, and Scopus databases were searched without starting date restriction up to and including March 2014. Only reports in English were included. The selected literature was reviewed by 2 authors and classified as suitable or not to be included in this review. Lists were compared, and, in case of disagreements, decisions were made after a discussion based on inclusion and exclusion criteria. A pooled prevalence of Treponema species in endodontic infections was estimated. Additionally, a meta-regression analysis was performed. Among the 265 articles identified in the initial search, only 51 were included in the final analysis. The studies were classified into 2 different groups according to the type of endodontic infection and whether it was an exclusively primary/secondary study (n = 36) or a primary/secondary comparison (n = 15). The pooled prevalence of Treponema species was 41.5% (95% confidence interval, 35.9-47.0). In the multivariate model of meta-regression analysis, primary endodontic infections (P apical abscess, symptomatic apical periodontitis (P < .001), and concomitant presence of 2 or more species (P = .028) explained the heterogeneity regarding the prevalence rates of Treponema species. Our findings suggest that Treponema species are important pathogens involved in endodontic infections, particularly in cases of primary and acute infections. Copyright © 2015 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.

  15. Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis.

    Science.gov (United States)

    Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon

    2015-01-01

    Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.

  16. Multivariate Meta-Analysis of Genetic Association Studies: A Simulation Study.

    Directory of Open Access Journals (Sweden)

    Binod Neupane

    Full Text Available In a meta-analysis with multiple end points of interests that are correlated between or within studies, multivariate approach to meta-analysis has a potential to produce more precise estimates of effects by exploiting the correlation structure between end points. However, under random-effects assumption the multivariate estimation is more complex (as it involves estimation of more parameters simultaneously than univariate estimation, and sometimes can produce unrealistic parameter estimates. Usefulness of multivariate approach to meta-analysis of the effects of a genetic variant on two or more correlated traits is not well understood in the area of genetic association studies. In such studies, genetic variants are expected to roughly maintain Hardy-Weinberg equilibrium within studies, and also their effects on complex traits are generally very small to modest and could be heterogeneous across studies for genuine reasons. We carried out extensive simulation to explore the comparative performance of multivariate approach with most commonly used univariate inverse-variance weighted approach under random-effects assumption in various realistic meta-analytic scenarios of genetic association studies of correlated end points. We evaluated the performance with respect to relative mean bias percentage, and root mean square error (RMSE of the estimate and coverage probability of corresponding 95% confidence interval of the effect for each end point. Our simulation results suggest that multivariate approach performs similarly or better than univariate method when correlations between end points within or between studies are at least moderate and between-study variation is similar or larger than average within-study variation for meta-analyses of 10 or more genetic studies. Multivariate approach produces estimates with smaller bias and RMSE especially for the end point that has randomly or informatively missing summary data in some individual studies, when

  17. Bayesian logistic regression analysis

    NARCIS (Netherlands)

    Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.

    2012-01-01

    In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuissance parameters, the Jacobian transformation is an

  18. Power Estimation in Multivariate Analysis of Variance

    Directory of Open Access Journals (Sweden)

    Jean François Allaire

    2007-09-01

    Full Text Available Power is often overlooked in designing multivariate studies for the simple reason that it is believed to be too complicated. In this paper, it is shown that power estimation in multivariate analysis of variance (MANOVA can be approximated using a F distribution for the three popular statistics (Hotelling-Lawley trace, Pillai-Bartlett trace, Wilk`s likelihood ratio. Consequently, the same procedure, as in any statistical test, can be used: computation of the critical F value, computation of the noncentral parameter (as a function of the effect size and finally estimation of power using a noncentral F distribution. Various numerical examples are provided which help to understand and to apply the method. Problems related to post hoc power estimation are discussed.

  19. Regression analysis using dependent Polya trees.

    Science.gov (United States)

    Schörgendorfer, Angela; Branscum, Adam J

    2013-11-30

    Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.

  20. Some developments in multivariate image analysis

    DEFF Research Database (Denmark)

    Kucheryavskiy, Sergey

    be up to several million. The main MIA tool for exploratory analysis is score density plot – all pixels are projected into principal component space and on the corresponding scores plots are colorized according to their density (how many pixels are crowded in the unit area of the plot). Looking...... for and analyzing patterns on these plots and the original image allow to do interactive analysis, to get some hidden information, build a supervised classification model, and much more. In the present work several alternative methods to original principal component analysis (PCA) for building the projection......Multivariate image analysis (MIA), one of the successful chemometric applications, now is used widely in different areas of science and industry. Introduced in late 80s it has became very popular with hyperspectral imaging, where MIA is one of the most efficient tools for exploratory analysis...

  1. A comparison between multivariate and bivariate analysis used in marketing research

    Directory of Open Access Journals (Sweden)

    Constantin, C.

    2012-01-01

    Full Text Available This paper is about an instrumental research conducted in order to compare the information given by two multivariate data analysis in comparison with the usual bivariate analysis. The outcomes of the research reveal that sometimes the multivariate methods use more information from a certain variable, but sometimes they use only a part of the information considered the most important for certain associations. For this reason, a researcher should use both categories of data analysis in order to obtain entirely useful information.

  2. RAWS II: A MULTIPLE REGRESSION ANALYSIS PROGRAM,

    Science.gov (United States)

    This memorandum gives instructions for the use and operation of a revised version of RAWS, a multiple regression analysis program. The program...of preprocessed data, the directed retention of variable, listing of the matrix of the normal equations and its inverse, and the bypassing of the regression analysis to provide the input variable statistics only. (Author)

  3. Assessing risk factors for periodontitis using regression

    Science.gov (United States)

    Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa

    2013-10-01

    Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using regression, linear and logistic models, we can assess the relevance, as risk factors for periodontitis disease, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression analysis model was built to evaluate the influence of IVs on mean Attachment Loss (AL). Thus, the regression coefficients along with respective p-values will be obtained as well as the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the correspondent 95% confidence intervals.

  4. Multivariate Volatility Impulse Response Analysis of GFC News Events

    NARCIS (Netherlands)

    D.E. Allen (David); M.J. McAleer (Michael); R.J. Powell (Robert)

    2015-01-01

    markdownabstract__Abstract__ This paper applies the Hafner and Herwartz (2006) (hereafter HH) approach to the analysis of multivariate GARCH models using volatility impulse response analysis. The data set features ten years of daily returns series for the New York Stock Exchange Index and the

  5. Multivariant design and multiple criteria analysis of building refurbishments

    Energy Technology Data Exchange (ETDEWEB)

    Kaklauskas, A.; Zavadskas, E. K.; Raslanas, S. [Faculty of Civil Engineering, Vilnius Gediminas Technical University, Vilnius (Lithuania)

    2005-07-01

    In order to design and realize an efficient building refurbishment, it is necessary to carry out an exhaustive investigation of all solutions that form it. The efficiency level of the considered building's refurbishment depends on a great many of factors, including: cost of refurbishment, annual fuel economy after refurbishment, tentative pay-back time, harmfulness to health of the materials used, aesthetics, maintenance properties, functionality, comfort, sound insulation and longevity, etc. Solutions of an alternative character allow for a more rational and realistic assessment of economic, ecological, legislative, climatic, social and political conditions, traditions and for better the satisfaction of customer requirements. They also enable one to cut down on refurbishment costs. In carrying out the multivariant design and multiple criteria analysis of a building refurbishment much data was processed and evaluated. Feasible alternatives could be as many as 100,000. How to perform a multivariant design and multiple criteria analysis of alternate alternatives based on the enormous amount of information became the problem. Method of multivariant design and multiple criteria of a building refurbishment's analysis were developed by the authors to solve the above problems. In order to demonstrate the developed method, a practical example is presented in this paper. (author)

  6. Multivariate Adaptative Regression Splines (MARS, una alternativa para el análisis de series de tiempo

    Directory of Open Access Journals (Sweden)

    Jairo Vanegas

    2017-05-01

    Full Text Available Multivariate Adaptative Regression Splines (MARS es un método de modelación no paramétrico que extiende el modelo lineal incorporando no linealidades e interacciones de variables. Es una herramienta flexible que automatiza la construcción de modelos de predicción, seleccionando variables relevantes, transformando las variables predictoras, tratando valores perdidos y previniendo sobreajustes mediante un autotest. También permite predecir tomando en cuenta factores estructurales que pudieran tener influencia sobre la variable respuesta, generando modelos hipotéticos. El resultado final serviría para identificar puntos de corte relevantes en series de datos. En el área de la salud es poco utilizado, por lo que se propone como una herramienta más para la evaluación de indicadores relevantes en salud pública. Para efectos demostrativos se utilizaron series de datos de mortalidad de menores de 5 años de Costa Rica en el periodo 1978-2008.

  7. Data classification and MTBF prediction with a multivariate analysis approach

    International Nuclear Information System (INIS)

    Braglia, Marcello; Carmignani, Gionata; Frosolini, Marco; Zammori, Francesco

    2012-01-01

    The paper presents a multivariate statistical approach that supports the classification of mechanical components, subjected to specific operating conditions, in terms of the Mean Time Between Failure (MTBF). Assessing the influence of working conditions and/or environmental factors on the MTBF is a prerequisite for the development of an effective preventive maintenance plan. However, this task may be demanding and it is generally performed with ad-hoc experimental methods, lacking of statistical rigor. To solve this common problem, a step by step multivariate data classification technique is proposed. Specifically, a set of structured failure data are classified in a meaningful way by means of: (i) cluster analysis, (ii) multivariate analysis of variance, (iii) feature extraction and (iv) predictive discriminant analysis. This makes it possible not only to define the MTBF of the analyzed components, but also to identify the working parameters that explain most of the variability of the observed data. The approach is finally demonstrated on 126 centrifugal pumps installed in an oil refinery plant; obtained results demonstrate the quality of the final discrimination, in terms of data classification and failure prediction.

  8. Modeling the potential risk factors of bovine viral diarrhea prevalence in Egypt using univariable and multivariable logistic regression analyses

    Directory of Open Access Journals (Sweden)

    Abdelfattah M. Selim

    2018-03-01

    Full Text Available Aim: The present cross-sectional study was conducted to determine the seroprevalence and potential risk factors associated with Bovine viral diarrhea virus (BVDV disease in cattle and buffaloes in Egypt, to model the potential risk factors associated with the disease using logistic regression (LR models, and to fit the best predictive model for the current data. Materials and Methods: A total of 740 blood samples were collected within November 2012-March 2013 from animals aged between 6 months and 3 years. The potential risk factors studied were species, age, sex, and herd location. All serum samples were examined with indirect ELIZA test for antibody detection. Data were analyzed with different statistical approaches such as Chi-square test, odds ratios (OR, univariable, and multivariable LR models. Results: Results revealed a non-significant association between being seropositive with BVDV and all risk factors, except for species of animal. Seroprevalence percentages were 40% and 23% for cattle and buffaloes, respectively. OR for all categories were close to one with the highest OR for cattle relative to buffaloes, which was 2.237. Likelihood ratio tests showed a significant drop of the -2LL from univariable LR to multivariable LR models. Conclusion: There was an evidence of high seroprevalence of BVDV among cattle as compared with buffaloes with the possibility of infection in different age groups of animals. In addition, multivariable LR model was proved to provide more information for association and prediction purposes relative to univariable LR models and Chi-square tests if we have more than one predictor.

  9. Bayesian linear regression with skew-symmetric error distributions with applications to survival analysis

    KAUST Repository

    Rubio, Francisco J.

    2016-02-09

    We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information.

  10. Understanding gendered aspects of migration aspiration and motives of university students by multivariate statistical methods

    Directory of Open Access Journals (Sweden)

    Đula Borozan

    2014-03-01

    Full Text Available The paper deals with the application of multivariate analysis of variance and logistic regression in measuring, explaining and evaluating (i gender differences in expressing migration aspirations, and (ii a gender effect on migration motivation of university students in Croatia. The results supported the thesis that migration is a complex gendering process that assumes subjective assessment of the whole set of interrelated motives. According to logistic regression, gender is a significant predictor of migration aspirations among the selected demographic and socio-economic variables. A multivariate analysis of variance showed that gender and migration aspirations in interaction matter when it comes to migration motives, particularly related to the perceived importance of social networks. Females, and especially those who aspire to migrate, assessed these motives as more important than males.

  11. Multivariate Volatility Impulse Response Analysis of GFC News Events

    NARCIS (Netherlands)

    D.E. Allen (David); M.J. McAleer (Michael); R.J. Powell (Robert); A.K. Singh (Abhay)

    2015-01-01

    textabstractThis paper applies the Hafner and Herwartz (2006) (hereafter HH) approach to the analysis of multivariate GARCH models using volatility impulse response analysis. The data set features ten years of daily returns series for the New York Stock Exchange Index and the FTSE 100 index from the

  12. A Study of Effects of MultiCollinearity in the Multivariable Analysis.

    Science.gov (United States)

    Yoo, Wonsuk; Mayberry, Robert; Bae, Sejong; Singh, Karan; Peter He, Qinghua; Lillard, James W

    2014-10-01

    A multivariable analysis is the most popular approach when investigating associations between risk factors and disease. However, efficiency of multivariable analysis highly depends on correlation structure among predictive variables. When the covariates in the model are not independent one another, collinearity/multicollinearity problems arise in the analysis, which leads to biased estimation. This work aims to perform a simulation study with various scenarios of different collinearity structures to investigate the effects of collinearity under various correlation structures amongst predictive and explanatory variables and to compare these results with existing guidelines to decide harmful collinearity. Three correlation scenarios among predictor variables are considered: (1) bivariate collinear structure as the most simple collinearity case, (2) multivariate collinear structure where an explanatory variable is correlated with two other covariates, (3) a more realistic scenario when an independent variable can be expressed by various functions including the other variables.

  13. New strategy for determination of anthocyanins, polyphenols and antioxidant capacity of Brassica oleracea liquid extract using infrared spectroscopies and multivariate regression

    Science.gov (United States)

    de Oliveira, Isadora R. N.; Roque, Jussara V.; Maia, Mariza P.; Stringheta, Paulo C.; Teófilo, Reinaldo F.

    2018-04-01

    A new method was developed to determine the antioxidant properties of red cabbage extract (Brassica oleracea) by mid (MID) and near (NIR) infrared spectroscopies and partial least squares (PLS) regression. A 70% (v/v) ethanolic extract of red cabbage was concentrated to 9° Brix and further diluted (12 to 100%) in water. The dilutions were used as external standards for the building of PLS models. For the first time, this strategy was applied for building multivariate regression models. Reference analyses and spectral data were obtained from diluted extracts. The determinate properties were total and monomeric anthocyanins, total polyphenols and antioxidant capacity by ABTS (2,2-azino-bis(3-ethyl-benzothiazoline-6-sulfonate)) and DPPH (2,2-diphenyl-1-picrylhydrazyl) methods. Ordered predictors selection (OPS) and genetic algorithm (GA) were used for feature selection before PLS regression (PLS-1). In addition, a PLS-2 regression was applied to all properties simultaneously. PLS-1 models provided more predictive models than did PLS-2 regression. PLS-OPS and PLS-GA models presented excellent prediction results with a correlation coefficient higher than 0.98. However, the best models were obtained using PLS and variable selection with the OPS algorithm and the models based on NIR spectra were considered more predictive for all properties. Then, these models provided a simple, rapid and accurate method for determination of red cabbage extract antioxidant properties and its suitability for use in the food industry.

  14. Common pitfalls in statistical analysis: Linear regression analysis

    Directory of Open Access Journals (Sweden)

    Rakesh Aggarwal

    2017-01-01

    Full Text Available In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis.

  15. Multivariate data analysis

    DEFF Research Database (Denmark)

    Hansen, Michael Adsetts Edberg

    Interest in statistical methodology is increasing so rapidly in the astronomical community that accessible introductory material in this area is long overdue. This book fills the gap by providing a presentation of the most useful techniques in multivariate statistics. A wide-ranging annotated set...

  16. HORIZONTAL BRANCH MORPHOLOGY OF GLOBULAR CLUSTERS: A MULTIVARIATE STATISTICAL ANALYSIS

    International Nuclear Information System (INIS)

    Jogesh Babu, G.; Chattopadhyay, Tanuka; Chattopadhyay, Asis Kumar; Mondal, Saptarshi

    2009-01-01

    The proper interpretation of horizontal branch (HB) morphology is crucial to the understanding of the formation history of stellar populations. In the present study a multivariate analysis is used (principal component analysis) for the selection of appropriate HB morphology parameter, which, in our case, is the logarithm of effective temperature extent of the HB (log T effHB ). Then this parameter is expressed in terms of the most significant observed independent parameters of Galactic globular clusters (GGCs) separately for coherent groups, obtained in a previous work, through a stepwise multiple regression technique. It is found that, metallicity ([Fe/H]), central surface brightness (μ v ), and core radius (r c ) are the significant parameters to explain most of the variations in HB morphology (multiple R 2 ∼ 0.86) for GGC elonging to the bulge/disk while metallicity ([Fe/H]) and absolute magnitude (M v ) are responsible for GGC belonging to the inner halo (multiple R 2 ∼ 0.52). The robustness is tested by taking 1000 bootstrap samples. A cluster analysis is performed for the red giant branch (RGB) stars of the GGC belonging to Galactic inner halo (Cluster 2). A multi-episodic star formation is preferred for RGB stars of GGC belonging to this group. It supports the asymptotic giant branch (AGB) model in three episodes instead of two as suggested by Carretta et al. for halo GGC while AGB model is suggested to be revisited for bulge/disk GGC.

  17. Using Multivariate Regression Model with Least Absolute Shrinkage and Selection Operator (LASSO) to Predict the Incidence of Xerostomia after Intensity-Modulated Radiotherapy for Head and Neck Cancer

    Science.gov (United States)

    Ting, Hui-Min; Chang, Liyun; Huang, Yu-Jie; Wu, Jia-Ming; Wang, Hung-Yu; Horng, Mong-Fong; Chang, Chun-Ming; Lan, Jen-Hong; Huang, Ya-Yu; Fang, Fu-Min; Leung, Stephen Wan

    2014-01-01

    Purpose The aim of this study was to develop a multivariate logistic regression model with least absolute shrinkage and selection operator (LASSO) to make valid predictions about the incidence of moderate-to-severe patient-rated xerostomia among head and neck cancer (HNC) patients treated with IMRT. Methods and Materials Quality of life questionnaire datasets from 206 patients with HNC were analyzed. The European Organization for Research and Treatment of Cancer QLQ-H&N35 and QLQ-C30 questionnaires were used as the endpoint evaluation. The primary endpoint (grade 3+ xerostomia) was defined as moderate-to-severe xerostomia at 3 (XER3m) and 12 months (XER12m) after the completion of IMRT. Normal tissue complication probability (NTCP) models were developed. The optimal and suboptimal numbers of prognostic factors for a multivariate logistic regression model were determined using the LASSO with bootstrapping technique. Statistical analysis was performed using the scaled Brier score, Nagelkerke R2, chi-squared test, Omnibus, Hosmer-Lemeshow test, and the AUC. Results Eight prognostic factors were selected by LASSO for the 3-month time point: Dmean-c, Dmean-i, age, financial status, T stage, AJCC stage, smoking, and education. Nine prognostic factors were selected for the 12-month time point: Dmean-i, education, Dmean-c, smoking, T stage, baseline xerostomia, alcohol abuse, family history, and node classification. In the selection of the suboptimal number of prognostic factors by LASSO, three suboptimal prognostic factors were fine-tuned by Hosmer-Lemeshow test and AUC, i.e., Dmean-c, Dmean-i, and age for the 3-month time point. Five suboptimal prognostic factors were also selected for the 12-month time point, i.e., Dmean-i, education, Dmean-c, smoking, and T stage. The overall performance for both time points of the NTCP model in terms of scaled Brier score, Omnibus, and Nagelkerke R2 was satisfactory and corresponded well with the expected values. Conclusions

  18. Comparing lagged linear correlation, lagged regression, Granger causality, and vector autoregression for uncovering associations in EHR data.

    Science.gov (United States)

    Levine, Matthew E; Albers, David J; Hripcsak, George

    2016-01-01

    Time series analysis methods have been shown to reveal clinical and biological associations in data collected in the electronic health record. We wish to develop reliable high-throughput methods for identifying adverse drug effects that are easy to implement and produce readily interpretable results. To move toward this goal, we used univariate and multivariate lagged regression models to investigate associations between twenty pairs of drug orders and laboratory measurements. Multivariate lagged regression models exhibited higher sensitivity and specificity than univariate lagged regression in the 20 examples, and incorporating autoregressive terms for labs and drugs produced more robust signals in cases of known associations among the 20 example pairings. Moreover, including inpatient admission terms in the model attenuated the signals for some cases of unlikely associations, demonstrating how multivariate lagged regression models' explicit handling of context-based variables can provide a simple way to probe for health-care processes that confound analyses of EHR data.

  19. Stock price forecasting for companies listed on Tehran stock exchange using multivariate adaptive regression splines model and semi-parametric splines technique

    Science.gov (United States)

    Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad

    2015-11-01

    One of the most important topics of interest to investors is stock price changes. Investors whose goals are long term are sensitive to stock price and its changes and react to them. In this regard, we used multivariate adaptive regression splines (MARS) model and semi-parametric splines technique for predicting stock price in this study. The MARS model as a nonparametric method is an adaptive method for regression and it fits for problems with high dimensions and several variables. semi-parametric splines technique was used in this study. Smoothing splines is a nonparametric regression method. In this study, we used 40 variables (30 accounting variables and 10 economic variables) for predicting stock price using the MARS model and using semi-parametric splines technique. After investigating the models, we select 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influencing variables on predicting stock price using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS Forecast and P/E Ratio) were selected as variables effective in forecasting stock prices.

  20. Global Sensitivity Analysis for multivariate output using Polynomial Chaos Expansion

    International Nuclear Information System (INIS)

    Garcia-Cabrejo, Oscar; Valocchi, Albert

    2014-01-01

    Many mathematical and computational models used in engineering produce multivariate output that shows some degree of correlation. However, conventional approaches to Global Sensitivity Analysis (GSA) assume that the output variable is scalar. These approaches are applied on each output variable leading to a large number of sensitivity indices that shows a high degree of redundancy making the interpretation of the results difficult. Two approaches have been proposed for GSA in the case of multivariate output: output decomposition approach [9] and covariance decomposition approach [14] but they are computationally intensive for most practical problems. In this paper, Polynomial Chaos Expansion (PCE) is used for an efficient GSA with multivariate output. The results indicate that PCE allows efficient estimation of the covariance matrix and GSA on the coefficients in the approach defined by Campbell et al. [9], and the development of analytical expressions for the multivariate sensitivity indices defined by Gamboa et al. [14]. - Highlights: • PCE increases computational efficiency in 2 approaches of GSA of multivariate output. • Efficient estimation of covariance matrix of output from coefficients of PCE. • Efficient GSA on coefficients of orthogonal decomposition of the output using PCE. • Analytical expressions of multivariate sensitivity indices from coefficients of PCE

  1. Moderation analysis using a two-level regression model.

    Science.gov (United States)

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.

  2. Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection

    KAUST Repository

    Chen, Lisha

    2012-12-01

    The reduced-rank regression is an effective method in predicting multiple response variables from the same set of predictor variables. It reduces the number of model parameters and takes advantage of interrelations between the response variables and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group and show that this penalty satisfies certain desirable invariance properties. We develop two numerical algorithms to solve the penalized regression problem and establish the asymptotic consistency of the proposed method. In particular, the manifold structure of the reduced-rank regression coefficient matrix is considered and studied in our theoretical analysis. In our simulation study and real data analysis, the new method is compared with several existing variable selection methods for multivariate regression and exhibits competitive performance in prediction and variable selection. © 2012 American Statistical Association.

  3. Search for the top quark using multivariate analysis techniques

    International Nuclear Information System (INIS)

    Bhat, P.C.

    1994-08-01

    The D0 collaboration is developing top search strategies using multivariate analysis techniques. We report here on applications of the H-matrix method to the eμ channel and neural networks to the e+jets channel

  4. A review of multivariate analyses in imaging genetics

    Directory of Open Access Journals (Sweden)

    Jingyu eLiu

    2014-03-01

    Full Text Available Recent advances in neuroimaging technology and molecular genetics provide the unique opportunity to investigate genetic influence on the variation of brain attributes. Since the year 2000, when the initial publication on brain imaging and genetics was released, imaging genetics has been a rapidly growing research approach with increasing publications every year. Several reviews have been offered to the research community focusing on various study designs. In addition to study design, analytic tools and their proper implementation are also critical to the success of a study. In this review, we survey recent publications using data from neuroimaging and genetics, focusing on methods capturing multivariate effects accommodating the large number of variables from both imaging data and genetic data. We group the analyses of genetic or genomic data into either a prior driven or data driven approach, including gene-set enrichment analysis, multifactor dimensionality reduction, principal component analysis, independent component analysis (ICA, and clustering. For the analyses of imaging data, ICA and extensions of ICA are the most widely used multivariate methods. Given detailed reviews of multivariate analyses of imaging data available elsewhere, we provide a brief summary here that includes a recently proposed method known as independent vector analysis. Finally, we review methods focused on bridging the imaging and genetic data by establishing multivariate and multiple genotype-phenotype associations, including sparse partial least squares, sparse canonical correlation analysis, sparse reduced rank regression and parallel ICA. These methods are designed to extract latent variables from both genetic and imaging data, which become new genotypes and phenotypes, and the links between the new genotype-phenotype pairs are maximized using different cost functions. The relationship between these methods along with their assumptions, advantages, and

  5. Multivariate analysis of attenuated total reflection Fourier transform infrared (ATR FT-IR) spectroscopic data to confirm phase partitioning in methacrylate-based dentin adhesive.

    Science.gov (United States)

    Ye, Qiang; Parthasarathy, Ranganathan; Abedin, Farhana; Laurence, Jennifer S; Misra, Anil; Spencer, Paulette

    2013-12-01

    Water is ubiquitous in the mouths of healthy individuals and is a major interfering factor in the development of a durable seal between the tooth and composite restoration. Water leads to the formation of a variety of defects in dentin adhesives; these defects undermine the tooth-composite bond. Our group recently analyzed phase partitioning of dentin adhesives using high-performance liquid chromatography (HPLC). The concentration measurements provided by HPLC offered a more thorough representation of current adhesive performance and elucidated directions to be taken for further improvement. The sample preparation and instrument analysis using HPLC are, however, time-consuming and labor-intensive. The objective of this work was to develop a methodology for rapid, reliable, and accurate quantitative analysis of near-equilibrium phase partitioning in adhesives exposed to conditions simulating the wet oral environment. Analysis by Fourier transform infrared (FT-IR) spectroscopy in combination with multivariate statistical methods, including partial least squares (PLS) regression and principal component regression (PCR), were used for multivariate calibration to quantify the compositions in separated phases. Excellent predictions were achieved when either the hydrophobic-rich phase or the hydrophilic-rich phase mixtures were analyzed. These results indicate that FT-IR spectroscopy has excellent potential as a rapid method of detection and quantification of dentin adhesives that experience phase separation under conditions that simulate the wet oral environment.

  6. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

    Science.gov (United States)

    Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

    2016-07-01

    A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness.Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  7. Perioperative factors predicting poor outcome in elderly patients following emergency general surgery: a multivariate regression analysis

    Science.gov (United States)

    Lees, Mackenzie C.; Merani, Shaheed; Tauh, Keerit; Khadaroo, Rachel G.

    2015-01-01

    Background Older adults (≥ 65 yr) are the fastest growing population and are presenting in increasing numbers for acute surgical care. Emergency surgery is frequently life threatening for older patients. Our objective was to identify predictors of mortality and poor outcome among elderly patients undergoing emergency general surgery. Methods We conducted a retrospective cohort study of patients aged 65–80 years undergoing emergency general surgery between 2009 and 2010 at a tertiary care centre. Demographics, comorbidities, in-hospital complications, mortality and disposition characteristics of patients were collected. Logistic regression analysis was used to identify covariate-adjusted predictors of in-hospital mortality and discharge of patients home. Results Our analysis included 257 patients with a mean age of 72 years; 52% were men. In-hospital mortality was 12%. Mortality was associated with patients who had higher American Society of Anesthesiologists (ASA) class (odds ratio [OR] 3.85, 95% confidence interval [CI] 1.43–10.33, p = 0.008) and in-hospital complications (OR 1.93, 95% CI 1.32–2.83, p = 0.001). Nearly two-thirds of patients discharged home were younger (OR 0.92, 95% CI 0.85–0.99, p = 0.036), had lower ASA class (OR 0.45, 95% CI 0.27–0.74, p = 0.002) and fewer in-hospital complications (OR 0.69, 95% CI 0.53–0.90, p = 0.007). Conclusion American Society of Anesthesiologists class and in-hospital complications are perioperative predictors of mortality and disposition in the older surgical population. Understanding the predictors of poor outcome and the importance of preventing in-hospital complications in older patients will have important clinical utility in terms of preoperative counselling, improving health care and discharging patients home. PMID:26204143

  8. Conversion from laparoscopic to open cholecystectomy: Multivariate analysis of preoperative risk factors

    Directory of Open Access Journals (Sweden)

    Khan M

    2005-01-01

    Full Text Available BACKGROUND: Laparoscopic cholecystectomy has become the gold standard in the treatment of symptomatic cholelithiasis. Some patients require conversion to open surgery and several preoperative variables have been identified as risk factors that are helpful in predicting the probability of conversion. However, there is a need to devise a risk-scoring system based on the identified risk factors to (a predict the risk of conversion preoperatively for selected patients, (b prepare the patient psychologically, (c arrange operating schedules accordingly, and (d minimize the procedure-related cost and help overcome financial constraints, which is a significant problem in developing countries. AIM: This study was aimed to evaluate preoperative risk factors for conversion from laparoscopic to open cholecystectomy in our setting. SETTINGS AND DESIGNS: A case control study of patients who underwent laparoscopic surgery from January 1997 to December 2001 was conducted at the Aga Khan University Hospital, Karachi, Pakistan. MATERIALS AND METHODS: All those patients who were converted to open surgery (n = 73 were enrolled as cases. Two controls who had successful laparoscopic surgery (n = 146 were matched with each case for operating surgeon and closest date of surgery. STATISTICAL ANALYSIS USED: Descriptive statistics were computed and, univariate and multivariate analysis was done through multiple logistic regression. RESULTS: The final multivariate model identified two risk factors for conversion: ultrasonographic signs of inflammation (adjusted odds ratio [aOR] = 8.5; 95% confidence interval [CI]: 3.3, 21.9 and age > 60 years (aOR = 8.1; 95% CI: 2.9, 22.2 after adjusting for physical signs, alkaline phosphatase and BMI levels. CONCLUSION: Preoperative risk factors evaluated by the present study confirm the likelihood of conversion. Recognition of these factors is important for understanding the characteristics of patients at a higher risk of conversion.

  9. Regularized principal covariates regression and its application to finding coupled patterns in climate fields

    Science.gov (United States)

    Fischer, M. J.

    2014-02-01

    There are many different methods for investigating the coupling between two climate fields, which are all based on the multivariate regression model. Each different method of solving the multivariate model has its own attractive characteristics, but often the suitability of a particular method for a particular problem is not clear. Continuum regression methods search the solution space between the conventional methods and thus can find regression model subspaces that mix the attractive characteristics of the end-member subspaces. Principal covariates regression is a continuum regression method that is easily applied to climate fields and makes use of two end-members: principal components regression and redundancy analysis. In this study, principal covariates regression is extended to additionally span a third end-member (partial least squares or maximum covariance analysis). The new method, regularized principal covariates regression, has several attractive features including the following: it easily applies to problems in which the response field has missing values or is temporally sparse, it explores a wide range of model spaces, and it seeks a model subspace that will, for a set number of components, have a predictive skill that is the same or better than conventional regression methods. The new method is illustrated by applying it to the problem of predicting the southern Australian winter rainfall anomaly field using the regional atmospheric pressure anomaly field. Regularized principal covariates regression identifies four major coupled patterns in these two fields. The two leading patterns, which explain over half the variance in the rainfall field, are related to the subtropical ridge and features of the zonally asymmetric circulation.

  10. Multivariate techniques of analysis for ToF-E recoil spectrometry data

    Energy Technology Data Exchange (ETDEWEB)

    Whitlow, H.J.; Bouanani, M.E.; Persson, L.; Hult, M.; Jonsson, P.; Johnston, P.N. [Lund Institute of Technology, Solvegatan, (Sweden), Department of Nuclear Physics; Andersson, M. [Uppsala Univ. (Sweden). Dept. of Organic Chemistry; Ostling, M.; Zaring, C. [Royal institute of Technology, Electrum, Kista, (Sweden), Department of Electronics; Johnston, P.N.; Bubb, I.F.; Walker, B.R.; Stannard, W.B. [Royal Melbourne Inst. of Tech., VIC (Australia); Cohen, D.D.; Dytlewski, N. [Australian Nuclear Science and Technology Organisation, Lucas Heights, NSW (Australia)

    1996-12-31

    Multivariate statistical methods are being developed by the Australian -Swedish Recoil Spectrometry Collaboration for quantitative analysis of the wealth of information in Time of Flight (ToF) and energy dispersive Recoil Spectrometry. An overview is presented of progress made in the use of multivariate techniques for energy calibration, separation of mass-overlapped signals and simulation of ToF-E data. 6 refs., 5 figs.

  11. Multivariate techniques of analysis for ToF-E recoil spectrometry data

    Energy Technology Data Exchange (ETDEWEB)

    Whitlow, H J; Bouanani, M E; Persson, L; Hult, M; Jonsson, P; Johnston, P N [Lund Institute of Technology, Solvegatan, (Sweden), Department of Nuclear Physics; Andersson, M [Uppsala Univ. (Sweden). Dept. of Organic Chemistry; Ostling, M; Zaring, C [Royal institute of Technology, Electrum, Kista, (Sweden), Department of Electronics; Johnston, P N; Bubb, I F; Walker, B R; Stannard, W B [Royal Melbourne Inst. of Tech., VIC (Australia); Cohen, D D; Dytlewski, N [Australian Nuclear Science and Technology Organisation, Lucas Heights, NSW (Australia)

    1997-12-31

    Multivariate statistical methods are being developed by the Australian -Swedish Recoil Spectrometry Collaboration for quantitative analysis of the wealth of information in Time of Flight (ToF) and energy dispersive Recoil Spectrometry. An overview is presented of progress made in the use of multivariate techniques for energy calibration, separation of mass-overlapped signals and simulation of ToF-E data. 6 refs., 5 figs.

  12. Advanced event reweighting using multivariate analysis

    International Nuclear Information System (INIS)

    Martschei, D; Feindt, M; Honc, S; Wagner-Kuhr, J

    2012-01-01

    Multivariate analysis (MVA) methods, especially discrimination techniques such as neural networks, are key ingredients in modern data analysis and play an important role in high energy physics. They are usually trained on simulated Monte Carlo (MC) samples to discriminate so called 'signal' from 'background' events and are then applied to data to select real events of signal type. We here address procedures that improve this work flow. This will be the enhancement of data / MC agreement by reweighting MC samples on a per event basis. Then training MVAs on real data using the sPlot technique will be discussed. Finally we will address the construction of MVAs whose discriminator is independent of a certain control variable, i.e. cuts on this variable will not change the discriminator shape.

  13. Combined data preprocessing and multivariate statistical analysis characterizes fed-batch culture of mouse hybridoma cells for rational medium design.

    Science.gov (United States)

    Selvarasu, Suresh; Kim, Do Yun; Karimi, Iftekhar A; Lee, Dong-Yup

    2010-10-01

    We present an integrated framework for characterizing fed-batch cultures of mouse hybridoma cells producing monoclonal antibody (mAb). This framework systematically combines data preprocessing, elemental balancing and statistical analysis technique. Initially, specific rates of cell growth, glucose/amino acid consumptions and mAb/metabolite productions were calculated via curve fitting using logistic equations, with subsequent elemental balancing of the preprocessed data indicating the presence of experimental measurement errors. Multivariate statistical analysis was then employed to understand physiological characteristics of the cellular system. The results from principal component analysis (PCA) revealed three major clusters of amino acids with similar trends in their consumption profiles: (i) arginine, threonine and serine, (ii) glycine, tyrosine, phenylalanine, methionine, histidine and asparagine, and (iii) lysine, valine and isoleucine. Further analysis using partial least square (PLS) regression identified key amino acids which were positively or negatively correlated with the cell growth, mAb production and the generation of lactate and ammonia. Based on these results, the optimal concentrations of key amino acids in the feed medium can be inferred, potentially leading to an increase in cell viability and productivity, as well as a decrease in toxic waste production. The study demonstrated how the current methodological framework using multivariate statistical analysis techniques can serve as a potential tool for deriving rational medium design strategies. Copyright © 2010 Elsevier B.V. All rights reserved.

  14. Replacement tunnelled dialysis catheters for haemodialysis access: Same site, new site, or exchange — A multivariate analysis and risk score

    International Nuclear Information System (INIS)

    Tapping, C.R.; Scott, P.M.; Lakshminarayan, R.; Ettles, D.F.; Robinson, G.J.

    2012-01-01

    Aim: To identify variables related to complications following tunnelled dialysis catheter (TDC) replacement and stratifying the risk to reduce morbidity in patients with end-stage renal disease. Materials and methods: One hundred and forty TDCs (Split Cath, medCOMP) were replaced in 140 patients over a 5 year period. Multiple variables were retrospectively collected and analysed to stratify the risk and to predict patients who were more likely to suffer from complications. Multivariate regression analysis was used to identify variables predictive of complications. Results: There were six immediate complications, 42 early complications, and 37 late complications. Multivariate analysis revealed that variables significantly associated to complications were: female sex (p = 0.003; OR 2.9); previous TDC in the same anatomical position in the past (p = 0.014; OR 4.1); catheter exchange (p = 0.038; OR 3.8); haemoglobin 15 s (p = 0.002; OR 4.1); and C-reactive protein >50 mg/l (p = 0.007; OR 4.6). A high-risk score, which used the values from the multivariate analysis, predicted 100% of the immediate complications, 95% of the early complications, and 68% of the late complications. Conclusion: Patients can now be scored prior to TDC replacement. A patient with a high-risk score can be optimized to reduce the chance of complications. Further prospective studies to confirm that rotating the site of TDC reduces complications are warranted as this has implications for current guidelines.

  15. Characterization of breast masses by dynamic enhanced MR imaging. A logistic regression analysis

    International Nuclear Information System (INIS)

    Ikeda, O.; Morishita, S.; Kido, T.; Kitajima, M.; Yamashita, Y.; Takahashi, M.; Okamura, K.; Fukuda, S.

    1999-01-01

    Purpose: To identify features useful for differentiation between malignant and benign breast neoplasms using multivariate analysis of findings by MR imaging. Material and Methods: In a retrospective analysis, 61 patients with 64 breast masses underwent MR imaging and the time-signal intensity curves for precontrast dynamic postcontrast images were quantitatively analyzed. Statistical analysis was performed using a logistic regression model, which was prospectively tested in another 34 patients with suspected breast masses. Results: Univariate analysis revealed that the reliable indicators for malignancy were first the appearance of the tumor border, followed by the washout ratio, internal architecture after contrast enhancement, and peak time. The factors significantly associated with malignancy were irregular tumor border, followed by washout ratio, internal architecture, and peak time. For differentiation between benignity and malignancy, the maximum cut-off point was to be found between 0.47 and 0.51. In a prospective application of this model, 91% of the lesions were accurately discriminated as benign or malignant lesions. Conclusion: Combination of contrast-enhanced dynamic and postcontrast-enhanced MR imaging provided accurate data for the diagnosis of malignant neoplasms of the breast. The model had an accuracy of 91% (sensitivity 90%, specificity 93%). (orig.)

  16. Estimating an Effect Size in One-Way Multivariate Analysis of Variance (MANOVA)

    Science.gov (United States)

    Steyn, H. S., Jr.; Ellis, S. M.

    2009-01-01

    When two or more univariate population means are compared, the proportion of variation in the dependent variable accounted for by population group membership is eta-squared. This effect size can be generalized by using multivariate measures of association, based on the multivariate analysis of variance (MANOVA) statistics, to establish whether…

  17. Point defect characterization in HAADF-STEM images using multivariate statistical analysis

    International Nuclear Information System (INIS)

    Sarahan, Michael C.; Chi, Miaofang; Masiel, Daniel J.; Browning, Nigel D.

    2011-01-01

    Quantitative analysis of point defects is demonstrated through the use of multivariate statistical analysis. This analysis consists of principal component analysis for dimensional estimation and reduction, followed by independent component analysis to obtain physically meaningful, statistically independent factor images. Results from these analyses are presented in the form of factor images and scores. Factor images show characteristic intensity variations corresponding to physical structure changes, while scores relate how much those variations are present in the original data. The application of this technique is demonstrated on a set of experimental images of dislocation cores along a low-angle tilt grain boundary in strontium titanate. A relationship between chemical composition and lattice strain is highlighted in the analysis results, with picometer-scale shifts in several columns measurable from compositional changes in a separate column. -- Research Highlights: → Multivariate analysis of HAADF-STEM images. → Distinct structural variations among SrTiO 3 dislocation cores. → Picometer atomic column shifts correlated with atomic column population changes.

  18. Multivariate analysis of factors affecting presence and/or agenesis of third molar tooth.

    Directory of Open Access Journals (Sweden)

    Mohammad Khursheed Alam

    Full Text Available To investigate the presence and/or agenesis of third molar (M3 tooth germs in orthodontics patients in Malaysian Malay and Chinese population and evaluate the relationship between presence and/or agenesis of M3 with different skeletal malocclusion patterns and sagittal maxillomandibular jaw dimensions. Pretreatment records of 300 orthodontic patients (140 males and 160 females, 219 Malaysian Malay and 81 Chinese, average age was 16.27±4.59 were used. Third-molar agenesis was calculated with respect to race, genders, number of missing teeth, jaws, skeletal malocclusion patterns and sagittal maxillomandibular jaw dimensions. The Pearson chi-square test and ANOVA was performed to determine potential differences. Associations between various factors and M3 presence/agenesis groups were assessed using logistic regression analysis. The percentages of subjects with 1 or more M3 agenesis were 30%, 33% and 31% in the Malaysian Malay, Chinese and total population, respectively. Overall prevalence of M3 agenesis in male and female was equal (P>0.05. The frequency of the agenesis of M3s is greater in maxilla as well in the right side (P>0.05. The prevalence of M3 agenesis in those with a Class III and Class II malocclusion was relatively higher in Malaysian Malay and Malaysian Chinese population respectively. Using stepwise regression analyses, significant associations were found between Mx (P<0.05 and ANB (P<0.05 and M3 agenesis. This multivariate analysis suggested that Mx and ANB were significantly correlated with the M3 presence/agenesis.

  19. Using Interactive Graphics to Teach Multivariate Data Analysis to Psychology Students

    Science.gov (United States)

    Valero-Mora, Pedro M.; Ledesma, Ruben D.

    2011-01-01

    This paper discusses the use of interactive graphics to teach multivariate data analysis to Psychology students. Three techniques are explored through separate activities: parallel coordinates/boxplots; principal components/exploratory factor analysis; and cluster analysis. With interactive graphics, students may perform important parts of the…

  20. TMVA - Toolkit for Multivariate Data Analysis with ROOT Users guide

    CERN Document Server

    Höcker, A; Tegenfeldt, F; Voss, H; Voss, K; Christov, A; Henrot-Versillé, S; Jachowski, M; Krasznahorkay, A; Mahalalel, Y; Prudent, X; Speckmayer, P

    2007-01-01

    Multivariate machine learning techniques for the classification of data from high-energy physics (HEP) experiments have become standard tools in most HEP analyses. The multivariate classifiers themselves have significantly evolved in recent years, also driven by developments in other areas inside and outside science. TMVA is a toolkit integrated in ROOT which hosts a large variety of multivariate classification algorithms. They range from rectangular cut optimisation (using a genetic algorithm) and likelihood estimators, over linear and non-linear discriminants (neural networks), to sophisticated recent developments like boosted decision trees and rule ensemble fitting. TMVA organises the simultaneous training, testing, and performance evaluation of all these classifiers with a user-friendly interface, and expedites the application of the trained classifiers to the analysis of data sets with unknown sample composition.

  1. Two Paradoxes in Linear Regression Analysis

    Science.gov (United States)

    FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong

    2016-01-01

    Summary Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214

  2. Application of Multivariable Statistical Techniques in Plant-wide WWTP Control Strategies Analysis

    DEFF Research Database (Denmark)

    Flores Alsina, Xavier; Comas, J.; Rodríguez-Roda, I.

    2007-01-01

    The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant...... analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques allow i) to determine natural groups or clusters of control strategies with a similar behaviour, ii......) to find and interpret hidden, complex and casual relation features in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation...

  3. Metal concentration at surface water using multivariate analysis and ...

    African Journals Online (AJOL)

    Metal concentration at surface water using multivariate analysis and human health risk assessment. F Azaman, H Juahir, K Yunus, A Azid, S.I. Khalit, A.D. Mustafa, M.A. Amran, C.N.C. Hasnam, M.Z.A.Z. Abidin, M.A.M. Yusri ...

  4. Application of multivariate adaptive regression spine-assisted objective function on optimization of heat transfer rate around a cylinder

    Energy Technology Data Exchange (ETDEWEB)

    Dey, Prasenjit; Dad, Ajoy K. [Mechanical Engineering Department, National Institute of Technology, Agartala (India)

    2016-12-15

    The present study aims to predict the heat transfer characteristics around a square cylinder with different corner radii using multivariate adaptive regression splines (MARS). Further, the MARS-generated objective function is optimized by particle swarm optimization. The data for the prediction are taken from the recently published article by the present authors [P. Dey, A. Sarkar, A.K. Das, Development of GEP and ANN model to predict the unsteady forced convection over a cylinder, Neural Comput. Appl. (2015). Further, the MARS model is compared with artificial neural network and gene expression programming. It has been found that the MARS model is very efficient in predicting the heat transfer characteristics. It has also been found that MARS is more efficient than artificial neural network and gene expression programming in predicting the forced convection data, and also particle swarm optimization can efficiently optimize the heat transfer rate.

  5. Ultracentrifuge separative power modeling with multivariate regression using covariance matrix

    International Nuclear Information System (INIS)

    Migliavacca, Elder

    2004-01-01

    In this work, the least-squares methodology with covariance matrix is applied to determine a data curve fitting to obtain a performance function for the separative power δU of a ultracentrifuge as a function of variables that are experimentally controlled. The experimental data refer to 460 experiments on the ultracentrifugation process for uranium isotope separation. The experimental uncertainties related with these independent variables are considered in the calculation of the experimental separative power values, determining an experimental data input covariance matrix. The process variables, which significantly influence the δU values are chosen in order to give information on the ultracentrifuge behaviour when submitted to several levels of feed flow rate F, cut θ and product line pressure P p . After the model goodness-of-fit validation, a residual analysis is carried out to verify the assumed basis concerning its randomness and independence and mainly the existence of residual heteroscedasticity with any explained regression model variable. The surface curves are made relating the separative power with the control variables F, θ and P p to compare the fitted model with the experimental data and finally to calculate their optimized values. (author)

  6. COMPARISON OF PARTIAL LEAST SQUARES REGRESSION METHOD ALGORITHMS: NIPALS AND PLS-KERNEL AND AN APPLICATION

    Directory of Open Access Journals (Sweden)

    ELİF BULUT

    2013-06-01

    Full Text Available Partial Least Squares Regression (PLSR is a multivariate statistical method that consists of partial least squares and multiple linear regression analysis. Explanatory variables, X, having multicollinearity are reduced to components which explain the great amount of covariance between explanatory and response variable. These components are few in number and they don’t have multicollinearity problem. Then multiple linear regression analysis is applied to those components to model the response variable Y. There are various PLSR algorithms. In this study NIPALS and PLS-Kernel algorithms will be studied and illustrated on a real data set.

  7. Evaluation of in-line Raman data for end-point determination of a coating process: Comparison of Science-Based Calibration, PLS-regression and univariate data analysis.

    Science.gov (United States)

    Barimani, Shirin; Kleinebudde, Peter

    2017-10-01

    A multivariate analysis method, Science-Based Calibration (SBC), was used for the first time for endpoint determination of a tablet coating process using Raman data. Two types of tablet cores, placebo and caffeine cores, received a coating suspension comprising a polyvinyl alcohol-polyethylene glycol graft-copolymer and titanium dioxide to a maximum coating thickness of 80µm. Raman spectroscopy was used as in-line PAT tool. The spectra were acquired every minute and correlated to the amount of applied aqueous coating suspension. SBC was compared to another well-known multivariate analysis method, Partial Least Squares-regression (PLS) and a simpler approach, Univariate Data Analysis (UVDA). All developed calibration models had coefficient of determination values (R 2 ) higher than 0.99. The coating endpoints could be predicted with root mean square errors (RMSEP) less than 3.1% of the applied coating suspensions. Compared to PLS and UVDA, SBC proved to be an alternative multivariate calibration method with high predictive power. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Using Dominance Analysis to Determine Predictor Importance in Logistic Regression

    Science.gov (United States)

    Azen, Razia; Traxel, Nicole

    2009-01-01

    This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…

  9. imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel

    Science.gov (United States)

    Grapov, Dmitry; Newman, John W.

    2012-01-01

    Summary: Interactive modules for Data Exploration and Visualization (imDEV) is a Microsoft Excel spreadsheet embedded application providing an integrated environment for the analysis of omics data through a user-friendly interface. Individual modules enables interactive and dynamic analyses of large data by interfacing R's multivariate statistics and highly customizable visualizations with the spreadsheet environment, aiding robust inferences and generating information-rich data visualizations. This tool provides access to multiple comparisons with false discovery correction, hierarchical clustering, principal and independent component analyses, partial least squares regression and discriminant analysis, through an intuitive interface for creating high-quality two- and a three-dimensional visualizations including scatter plot matrices, distribution plots, dendrograms, heat maps, biplots, trellis biplots and correlation networks. Availability and implementation: Freely available for download at http://sourceforge.net/projects/imdev/. Implemented in R and VBA and supported by Microsoft Excel (2003, 2007 and 2010). Contact: John.Newman@ars.usda.gov Supplementary Information: Installation instructions, tutorials and users manual are available at http://sourceforge.net/projects/imdev/. PMID:22815358

  10. Multivariate statistical analysis of atom probe tomography data

    International Nuclear Information System (INIS)

    Parish, Chad M.; Miller, Michael K.

    2010-01-01

    The application of spectrum imaging multivariate statistical analysis methods, specifically principal component analysis (PCA), to atom probe tomography (APT) data has been investigated. The mathematical method of analysis is described and the results for two example datasets are analyzed and presented. The first dataset is from the analysis of a PM 2000 Fe-Cr-Al-Ti steel containing two different ultrafine precipitate populations. PCA properly describes the matrix and precipitate phases in a simple and intuitive manner. A second APT example is from the analysis of an irradiated reactor pressure vessel steel. Fine, nm-scale Cu-enriched precipitates having a core-shell structure were identified and qualitatively described by PCA. Advantages, disadvantages, and future prospects for implementing these data analysis methodologies for APT datasets, particularly with regard to quantitative analysis, are also discussed.

  11. Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg

    2007-01-01

    This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy including land surveying and satellite positioning applications. In these fields regression is often termed adjustment. The note also contains a couple of typical land surveying...... and satellite positioning application examples. In these application areas we are typically interested in the parameters in the model typically 2- or 3-D positions and not in predictive modelling which is often the main concern in other regression analysis applications. Adjustment is often used to obtain...... the clock error) and to obtain estimates of the uncertainty with which the position is determined. Regression analysis is used in many other fields of application both in the natural, the technical and the social sciences. Examples may be curve fitting, calibration, establishing relationships between...

  12. Micro-Raman Imaging for Biology with Multivariate Spectral Analysis

    KAUST Repository

    Malvaso, Federica

    2015-05-05

    Raman spectroscopy is a noninvasive technique that can provide complex information on the vibrational state of the molecules. It defines the unique fingerprint that allow the identification of the various chemical components within a given sample. The aim of the following thesis work is to analyze Raman maps related to three pairs of different cells, highlighting differences and similarities through multivariate algorithms. The first pair of analyzed cells are human embryonic stem cells (hESCs), while the other two pairs are induced pluripotent stem cells (iPSCs) derived from T lymphocytes and keratinocytes, respectively. Although two different multivariate techniques were employed, ie Principal Component Analysis and Cluster Analysis, the same results were achieved: the iPSCs derived from T-lymphocytes show a higher content of genetic material both compared with the iPSCs derived from keratinocytes and the hESCs . On the other side, equally evident, was that iPS cells derived from keratinocytes assume a molecular distribution very similar to hESCs.

  13. Analysis of preservative-treated wood by multivariate analysis of laser-induced breakdown spectroscopy spectra

    International Nuclear Information System (INIS)

    Martin, Madhavi Z.; Labbe, Nicole; Rials, Timothy G.; Wullschleger, Stan D.

    2005-01-01

    In this work, multivariate statistical analysis (MVA) techniques are coupled with laser-induced breakdown spectroscopy (LIBS) to identify preservative types (chromated copper arsenate, ammoniacal copper zinc or alkaline copper quat), and to predict elemental content in preservative-treated wood. The elemental composition of the samples was measured with a standard laboratory method of digestion followed by atomic absorption spectroscopy analysis. The elemental composition was then correlated with the LIBS spectra using projection to latent structures (PLS) models. The correlations for the different elements introduced by different treatments were very strong, with the correlation coefficients generally above 0.9. Additionally, principal component analysis (PCA) was used to differentiate the samples treated with different preservative formulations. The research has focused not only on demonstrating the application of LIBS as a tool for use in the forest products industry, but also considered sampling errors, limits of detection, reproducibility, and accuracy of measurements as they relate to multivariate analysis of this complex wood substrate

  14. Analysis of preservative-treated wood by multivariate analysis of laser-induced breakdown spectroscopy spectra

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Madhavi Z. [Environmental Sciences Division Oak Ridge National Laboratory, P.O. Box 2008 MS 6422, Oak Ridge TN 37831-6422 (United States); Labbe, Nicole [Forest Products Center, University of Tennessee, 2506 Jacob Drive, Knoxville, TN 37996-4570 (United States)]. E-mail: nlabbe@utk.edu; Rials, Timothy G. [Forest Products Center, University of Tennessee, 2506 Jacob Drive, Knoxville, TN 37996-4570 (United States); Wullschleger, Stan D. [Environmental Sciences Division Oak Ridge National Laboratory, P.O. Box 2008 MS 6422, Oak Ridge TN 37831-6422 (United States)

    2005-08-31

    In this work, multivariate statistical analysis (MVA) techniques are coupled with laser-induced breakdown spectroscopy (LIBS) to identify preservative types (chromated copper arsenate, ammoniacal copper zinc or alkaline copper quat), and to predict elemental content in preservative-treated wood. The elemental composition of the samples was measured with a standard laboratory method of digestion followed by atomic absorption spectroscopy analysis. The elemental composition was then correlated with the LIBS spectra using projection to latent structures (PLS) models. The correlations for the different elements introduced by different treatments were very strong, with the correlation coefficients generally above 0.9. Additionally, principal component analysis (PCA) was used to differentiate the samples treated with different preservative formulations. The research has focused not only on demonstrating the application of LIBS as a tool for use in the forest products industry, but also considered sampling errors, limits of detection, reproducibility, and accuracy of measurements as they relate to multivariate analysis of this complex wood substrate.

  15. Multivariate Statistical Methods as a Tool of Financial Analysis of Farm Business

    Czech Academy of Sciences Publication Activity Database

    Novák, J.; Sůvová, H.; Vondráček, Jiří

    2002-01-01

    Roč. 48, č. 1 (2002), s. 9-12 ISSN 0139-570X Institutional research plan: AV0Z1030915 Keywords : financial analysis * financial ratios * multivariate statistical methods * correlation analysis * discriminant analysis * cluster analysis Subject RIV: BB - Applied Statistics, Operational Research

  16. Design and analysis of experiments classical and regression approaches with SAS

    CERN Document Server

    Onyiah, Leonard C

    2008-01-01

    Introductory Statistical Inference and Regression Analysis Elementary Statistical Inference Regression Analysis Experiments, the Completely Randomized Design (CRD)-Classical and Regression Approaches Experiments Experiments to Compare Treatments Some Basic Ideas Requirements of a Good Experiment One-Way Experimental Layout or the CRD: Design and Analysis Analysis of Experimental Data (Fixed Effects Model) Expected Values for the Sums of Squares The Analysis of Variance (ANOVA) Table Follow-Up Analysis to Check fo

  17. Analysis of Surface Water Pollution in the Kinta River Using Multivariate Technique

    International Nuclear Information System (INIS)

    Hamza Ahmad Isiyaka; Hafizan Juahir

    2015-01-01

    This study aims to investigate the spatial variation in the characteristics of water quality monitoring sites, identify the most significant parameters and the major possible sources of pollution, and apportion the source category in the Kinta River. 31 parameters collected from eight monitoring sites for eight years (2006-2013) were employed. The eight monitoring stations were spatially grouped into three independent clusters in a dendrogram. A drastic reduction in the number of monitored parameters from 31 to eight and nine significant parameters (P<0.05) was achieved using the forward stepwise and backward stepwise discriminate analysis (DA). Principal component analysis (PCA) accounted for more than 76 % in the total variance and attributes the source of pollution to anthropogenic and natural processes. The source apportionment using a combined multiple linear regression and principal component scores indicates that 41 % of the total pollution load is from rock weathering and untreated waste water, 26 % from waste discharge, 24 % from surface runoff and 7 % from faecal waste. This study proposes a reduction in the number of monitoring stations and parameters for a cost effective and time management in the monitoring processes and multivariate technique can provide a simple representation of complex and dynamic water quality characteristics. (author)

  18. Instrumental Neutron Activation Analysis and Multivariate Statistics for Pottery Provenance

    Science.gov (United States)

    Glascock, M. D.; Neff, H.; Vaughn, K. J.

    2004-06-01

    The application of instrumental neutron activation analysis and multivariate statistics to archaeological studies of ceramics and clays is described. A small pottery data set from the Nasca culture in southern Peru is presented for illustration.

  19. Instrumental Neutron Activation Analysis and Multivariate Statistics for Pottery Provenance

    International Nuclear Information System (INIS)

    Glascock, M. D.; Neff, H.; Vaughn, K. J.

    2004-01-01

    The application of instrumental neutron activation analysis and multivariate statistics to archaeological studies of ceramics and clays is described. A small pottery data set from the Nasca culture in southern Peru is presented for illustration.

  20. Instrumental Neutron Activation Analysis and Multivariate Statistics for Pottery Provenance

    Energy Technology Data Exchange (ETDEWEB)

    Glascock, M. D.; Neff, H. [University of Missouri, Research Reactor Center (United States); Vaughn, K. J. [Pacific Lutheran University, Department of Anthropology (United States)

    2004-06-15

    The application of instrumental neutron activation analysis and multivariate statistics to archaeological studies of ceramics and clays is described. A small pottery data set from the Nasca culture in southern Peru is presented for illustration.

  1. Sparse Regression by Projection and Sparse Discriminant Analysis

    KAUST Repository

    Qi, Xin

    2015-04-03

    © 2015, © American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.

  2. Multivariate statistics exercises and solutions

    CERN Document Server

    Härdle, Wolfgang Karl

    2015-01-01

    The authors present tools and concepts of multivariate data analysis by means of exercises and their solutions. The first part is devoted to graphical techniques. The second part deals with multivariate random variables and presents the derivation of estimators and tests for various practical situations. The last part introduces a wide variety of exercises in applied multivariate data analysis. The book demonstrates the application of simple calculus and basic multivariate methods in real life situations. It contains altogether more than 250 solved exercises which can assist a university teacher in setting up a modern multivariate analysis course. All computer-based exercises are available in the R language. All R codes and data sets may be downloaded via the quantlet download center  www.quantlet.org or via the Springer webpage. For interactive display of low-dimensional projections of a multivariate data set, we recommend GGobi.

  3. A Model for Shovel Capital Cost Estimation, Using a Hybrid Model of Multivariate Regression and Neural Networks

    Directory of Open Access Journals (Sweden)

    Abdolreza Yazdani-Chamzini

    2017-12-01

    Full Text Available Cost estimation is an essential issue in feasibility studies in civil engineering. Many different methods can be applied to modelling costs. These methods can be divided into several main groups: (1 artificial intelligence, (2 statistical methods, and (3 analytical methods. In this paper, the multivariate regression (MVR method, which is one of the most popular linear models, and the artificial neural network (ANN method, which is widely applied to solving different prediction problems with a high degree of accuracy, have been combined to provide a cost estimate model for a shovel machine. This hybrid methodology is proposed, taking the advantages of MVR and ANN models in linear and nonlinear modelling, respectively. In the proposed model, the unique advantages of the MVR model in linear modelling are used first to recognize the existing linear structure in data, and, then, the ANN for determining nonlinear patterns in preprocessed data is applied. The results with three indices indicate that the proposed model is efficient and capable of increasing the prediction accuracy.

  4. The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard

    and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...... within a nonparametric panel data regression framework. The fourth paper analyses the technical efficiency of dairy farms with environmental output using nonparametric kernel regression in a semiparametric stochastic frontier analysis. The results provided in this PhD thesis show that nonparametric......This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...

  5. Simulation Experiments in Practice: Statistical Design and Regression Analysis

    OpenAIRE

    Kleijnen, J.P.C.

    2007-01-01

    In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. The goal of this article is to change these traditional, naïve methods of design and analysis, because statistical theory proves that more information is obtained when applying Design Of Experiments (DOE) and linear regression analysis. Unfortunately, classic DOE and regression analysis assume a single simulation response that is normally and independen...

  6. Mixture of Regression Models with Single-Index

    OpenAIRE

    Xiang, Sijia; Yao, Weixin

    2016-01-01

    In this article, we propose a class of semiparametric mixture regression models with single-index. We argue that many recently proposed semiparametric/nonparametric mixture regression models can be considered special cases of the proposed model. However, unlike existing semiparametric mixture regression models, the new pro- posed model can easily incorporate multivariate predictors into the nonparametric components. Backfitting estimates and the corresponding algorithms have been proposed for...

  7. Local bilinear multiple-output quantile/depth regression

    Czech Academy of Sciences Publication Activity Database

    Hallin, M.; Lu, Z.; Paindaveine, D.; Šiman, Miroslav

    2015-01-01

    Roč. 21, č. 3 (2015), s. 1435-1466 ISSN 1350-7265 R&D Projects: GA MŠk(CZ) 1M06047 Institutional support: RVO:67985556 Keywords : conditional depth * growth chart * halfspace depth * local bilinear regression * multivariate quantile * quantile regression * regression depth Subject RIV: BA - General Mathematics Impact factor: 1.372, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/siman-0446857.pdf

  8. Predictors of Success after Extracorporeal Shock Wave Lithotripsy (ESWL for Renal Calculi Between 20—30 mm: A Multivariate Analysis Model

    Directory of Open Access Journals (Sweden)

    Ahmed El-Assmy

    2006-01-01

    Full Text Available The first-line management of renal stones between 20—30 mm remains controversial. The Extracorporeal Shock Wave Lithotripsy (ESWL stone-free rates for such patient groups vary widely. The purpose of this study was to define factors that have a significant impact on the stone-free rate after ESWL in such controversial groups. Between January 1990 and January 2004, 594 patients with renal stones 20—30 mm in length underwent ESWL monotherapy. Stone surface area was measured for all stones. The results of treatment were evaluated after 3 months of follow-up. The stone-free rate was correlated with stone and patient characteristics using the Chi-square test; factors found to be significant were further analyzed using multivariate analysis.Repeat ESWL was needed in 56.9% of cases. Post-ESWL complications occurred in 5% of cases and post-ESWL secondary procedures were required in 5.9%. At 3-month follow-up, the overall stone-free rate was 77.2%. Using the Chi-square test, stone surface area, location, number, radiological renal picture, and congenital renal anomalies had a significant impact on the stone-free rate. Multivariate analysis excluded radiological renal picture from the logistic regression model while other factors maintained their statistically significant effect on success rate, indicating that they were independent predictors. A regression analysis model was designed to estimate the probability of stone-free status after ESWL. The sensitivity of the model was 97.4%, the specificity 90%, and the overall accuracy 95.6%.Stone surface area, location, number, and congenital renal anomalies are prognostic predictors determining stone clearance after ESWL of renal calculi of 20—30 mm. High probability of stone clearance is obtained with single stone ≤400 mm2 located in renal pelvis with no congenital anomalies. Our regression model can predict the probability of the success of ESWL in such controversial groups and can define patients who

  9. Applied multivariate statistics with R

    CERN Document Server

    Zelterman, Daniel

    2015-01-01

    This book brings the power of multivariate statistics to graduate-level practitioners, making these analytical methods accessible without lengthy mathematical derivations. Using the open source, shareware program R, Professor Zelterman demonstrates the process and outcomes for a wide array of multivariate statistical applications. Chapters cover graphical displays, linear algebra, univariate, bivariate and multivariate normal distributions, factor methods, linear regression, discrimination and classification, clustering, time series models, and additional methods. Zelterman uses practical examples from diverse disciplines to welcome readers from a variety of academic specialties. Those with backgrounds in statistics will learn new methods while they review more familiar topics. Chapters include exercises, real data sets, and R implementations. The data are interesting, real-world topics, particularly from health and biology-related contexts. As an example of the approach, the text examines a sample from the B...

  10. General Nature of Multicollinearity in Multiple Regression Analysis.

    Science.gov (United States)

    Liu, Richard

    1981-01-01

    Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)

  11. Study of ionically modified water performance in carbonate reservoir system by multivariate data analysis

    DEFF Research Database (Denmark)

    Sohal, Muhammad Adeel Nassar; Kucheryavskiy, Sergey V.; Thyne, Geoffrey

    2017-01-01

    the critical mechanisms at the pore scale. Better pore scale physico-chemical understanding will guide to formulate accurate reservoir-scale models. This paper presents a comprehensive meta-analysis of the proposed mechanisms using multivariate data analysis. Detailed review of the subject, including...... mechanisms with supporting and contradictory evidence has been presented by Sohal et al. (2016). In this study, the significance of each contributing factor to EOR was quantified and subjected to rigorous multivariate statistical analysis. The analysis was limited because there is no uniform methodology...

  12. Prosthetic alignment after total knee replacement is not associated with dissatisfaction or change in Oxford Knee Score: A multivariable regression analysis.

    Science.gov (United States)

    Huijbregts, Henricus J T A M; Khan, Riaz J K; Fick, Daniel P; Jarrett, Olivia M; Haebich, Samantha

    2016-06-01

    Approximately 18% of the patients are dissatisfied with the result of total knee replacement. However, the relation between dissatisfaction and prosthetic alignment has not been investigated before. We retrospectively analysed prospectively gathered data of all patients who had a primary TKR, preoperative and one-year postoperative Oxford Knee Scores (OKS) and postoperative computed tomography (CT). The CT protocol measures hip-knee-ankle (HKA) angle, and coronal, sagittal and axial component alignment. Satisfaction was defined using a five-item Likert scale. We dichotomised dissatisfaction by combining '(very) dissatisfied' and 'neutral/not sure'. Associations with dissatisfaction and change in OKS were calculated using multivariable logistic and linear regression models. 230 TKRs were implanted in 105 men and 106 women. At one year, 12% were (very) dissatisfied and 10% neutral. Coronal alignment of the femoral component was 0.5 degrees more accurate in patients who were satisfied at one year. The other alignment measurements were not different between satisfied and dissatisfied patients. All radiographic measurements had a P-value>0.10 on univariate analyses. At one year, dissatisfaction was associated with the three-months OKS. Change in OKS was associated with three-months OKS, preoperative physical SF-12, preoperative pain and cruciate retaining design. Neither mechanical axis, nor component alignment, is associated with dissatisfaction at one year following TKR. Patients get the best outcome when pain reduction and function improvement are optimal during the first three months and when the indication to embark on surgery is based on physical limitations rather than on a high pain score. 2. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Analyses of polycyclic aromatic hydrocarbon (PAH) and chiral-PAH analogues-methyl-β-cyclodextrin guest-host inclusion complexes by fluorescence spectrophotometry and multivariate regression analysis.

    Science.gov (United States)

    Greene, LaVana; Elzey, Brianda; Franklin, Mariah; Fakayode, Sayo O

    2017-03-05

    The negative health impact of polycyclic aromatic hydrocarbons (PAHs) and differences in pharmacological activity of enantiomers of chiral molecules in humans highlights the need for analysis of PAHs and their chiral analogue molecules in humans. Herein, the first use of cyclodextrin guest-host inclusion complexation, fluorescence spectrophotometry, and chemometric approach to PAH (anthracene) and chiral-PAH analogue derivatives (1-(9-anthryl)-2,2,2-triflouroethanol (TFE)) analyses are reported. The binding constants (K b ), stoichiometry (n), and thermodynamic properties (Gibbs free energy (ΔG), enthalpy (ΔH), and entropy (ΔS)) of anthracene and enantiomers of TFE-methyl-β-cyclodextrin (Me-β-CD) guest-host complexes were also determined. Chemometric partial-least-square (PLS) regression analysis of emission spectra data of Me-β-CD-guest-host inclusion complexes was used for the determination of anthracene and TFE enantiomer concentrations in Me-β-CD-guest-host inclusion complex samples. The values of calculated K b and negative ΔG suggest the thermodynamic favorability of anthracene-Me-β-CD and enantiomeric of TFE-Me-β-CD inclusion complexation reactions. However, anthracene-Me-β-CD and enantiomer TFE-Me-β-CD inclusion complexations showed notable differences in the binding affinity behaviors and thermodynamic properties. The PLS regression analysis resulted in square-correlation-coefficients of 0.997530 or better and a low LOD of 3.81×10 -7 M for anthracene and 3.48×10 -8 M for TFE enantiomers at physiological conditions. Most importantly, PLS regression accurately determined the anthracene and TFE enantiomer concentrations with an average low error of 2.31% for anthracene, 4.44% for R-TFE and 3.60% for S-TFE. The results of the study are highly significant because of its high sensitivity and accuracy for analysis of PAH and chiral PAH analogue derivatives without the need of an expensive chiral column, enantiomeric resolution, or use of a polarized

  14. The studies of post-medieval glass by multivariate and X-ray fluorescence analysis

    International Nuclear Information System (INIS)

    Kierzek, J.; Kunicki-Goldfinger, J.

    2002-01-01

    Multivariate statistical analysis of the results obtained by energy dispersive X-ray fluorescence analysis has been used in the study of baroque vessel glasses originated from central Europe. X-ray spectrometry can be applied as a completely non-destructive, non-sampling and multi-element method. It is very useful in the studies of valuable historical artefacts. For the last years, multivariate statistical analysis has been developed as an important tool for the archaeometric purposes. Cluster, principal component and discriminant analysis were applied for the classification of the examined objects. The obtained results show that these statistical tools are very useful and complementary in the studies of historical objects. (author)

  15. Prediction of chemical, physical and sensory data from process parameters for frozen cod using multivariate analysis

    DEFF Research Database (Denmark)

    Bechmann, Iben Ellegaard; Jensen, H.S.; Bøknæs, Niels

    1998-01-01

    Physical, chemical and sensory quality parameters were determined for 115 cod (Gadus morhua) samples stored under varying frozen storage conditions. Five different process parameters (period of frozen storage, frozen storage. temperature, place of catch, season for catching and state of rigor) were...... varied systematically at two levels. The data obtained were evaluated using the multivariate methods, principal component analysis (PCA) and partial least squares (PLS) regression. The PCA models were used to identify which process parameters were actually most important for the quality of the frozen cod....... PLS models that were able to predict the physical, chemical and sensory quality parameters from the process parameters of the frozen raw material were generated. The prediction abilities of the PLS models were good enough to give reasonable results even when the process parameters were characterised...

  16. Multivariable analysis: a practical guide for clinicians and public health researchers

    National Research Council Canada - National Science Library

    Katz, Mitchell H

    2011-01-01

    .... It is the perfect introduction for all clinical researchers. It describes how to perform and interpret multivariable analysis, using plain language rather than complex derivations and mathematical formulae...

  17. Comprehensive drought characteristics analysis based on a nonlinear multivariate drought index

    Science.gov (United States)

    Yang, Jie; Chang, Jianxia; Wang, Yimin; Li, Yunyun; Hu, Hui; Chen, Yutong; Huang, Qiang; Yao, Jun

    2018-02-01

    It is vital to identify drought events and to evaluate multivariate drought characteristics based on a composite drought index for better drought risk assessment and sustainable development of water resources. However, most composite drought indices are constructed by the linear combination, principal component analysis and entropy weight method assuming a linear relationship among different drought indices. In this study, the multidimensional copulas function was applied to construct a nonlinear multivariate drought index (NMDI) to solve the complicated and nonlinear relationship due to its dependence structure and flexibility. The NMDI was constructed by combining meteorological, hydrological, and agricultural variables (precipitation, runoff, and soil moisture) to better reflect the multivariate variables simultaneously. Based on the constructed NMDI and runs theory, drought events for a particular area regarding three drought characteristics: duration, peak, and severity were identified. Finally, multivariate drought risk was analyzed as a tool for providing reliable support in drought decision-making. The results indicate that: (1) multidimensional copulas can effectively solve the complicated and nonlinear relationship among multivariate variables; (2) compared with single and other composite drought indices, the NMDI is slightly more sensitive in capturing recorded drought events; and (3) drought risk shows a spatial variation; out of the five partitions studied, the Jing River Basin as well as the upstream and midstream of the Wei River Basin are characterized by a higher multivariate drought risk. In general, multidimensional copulas provides a reliable way to solve the nonlinear relationship when constructing a comprehensive drought index and evaluating multivariate drought characteristics.

  18. Prediction of periodically correlated processes by wavelet transform and multivariate methods with applications to climatological data

    Science.gov (United States)

    Ghanbarzadeh, Mitra; Aminghafari, Mina

    2015-05-01

    This article studies the prediction of periodically correlated process using wavelet transform and multivariate methods with applications to climatological data. Periodically correlated processes can be reformulated as multivariate stationary processes. Considering this fact, two new prediction methods are proposed. In the first method, we use stepwise regression between the principal components of the multivariate stationary process and past wavelet coefficients of the process to get a prediction. In the second method, we propose its multivariate version without principal component analysis a priori. Also, we study a generalization of the prediction methods dealing with a deterministic trend using exponential smoothing. Finally, we illustrate the performance of the proposed methods on simulated and real climatological data (ozone amounts, flows of a river, solar radiation, and sea levels) compared with the multivariate autoregressive model. The proposed methods give good results as we expected.

  19. Predictors of success after extracorporeal shock wave lithotripsy (ESWL) for renal calculi between 20-30 mm: a multivariate analysis model.

    Science.gov (United States)

    El-Assmy, Ahmed; El-Nahas, Ahmed R; Abo-Elghar, Mohamed E; Eraky, Ibrahim; El-Kenawy, Mahmoud R; Sheir, Khaled Z

    2006-03-23

    The first-line management of renal stones between 20-30 mm remains controversial. The Extracorporeal Shock Wave Lithotripsy (ESWL) stone-free rates for such patient groups vary widely. The purpose of this study was to define factors that have a significant impact on the stone-free rate after ESWL in such controversial groups. Between January 1990 and January 2004, 594 patients with renal stones 20-30 mm in length underwent ESWL monotherapy. Stone surface area was measured for all stones. The results of treatment were evaluated after 3 months of follow-up. The stone-free rate was correlated with stone and patient characteristics using the Chi-square test; factors found to be significant were further analyzed using multivariate analysis. Repeat ESWL was needed in 56.9% of cases. Post-ESWL complications occurred in 5% of cases and post-ESWL secondary procedures were required in 5.9%. At 3-month follow-up, the overall stone-free rate was 77.2%. Using the Chi-square test, stone surface area, location, number, radiological renal picture, and congenital renal anomalies had a significant impact on the stone-free rate. Multivariate analysis excluded radiological renal picture from the logistic regression model while other factors maintained their statistically significant effect on success rate, indicating that they were independent predictors. A regression analysis model was designed to estimate the probability of stone-free status after ESWL. The sensitivity of the model was 97.4%, the specificity 90%, and the overall accuracy 95.6%. Stone surface area, location, number, and congenital renal anomalies are prognostic predictors determining stone clearance after ESWL of renal calculi of 20-30 mm. High probability of stone clearance is obtained with single stone ESWL in such controversial groups and can define patients who would need other treatment modality.

  20. Increased risk for complications following removal of hardware in patients with liver disease, pilon or pelvic fractures: A regression analysis.

    Science.gov (United States)

    Brown, Bryan D; Steinert, Justin N; Stelzer, John W; Yoon, Richard S; Langford, Joshua R; Koval, Kenneth J

    2017-12-01

    Indications for removing orthopedic hardware on an elective basis varies widely. Although viewed as a relatively benign procedure, there is a lack of data regarding overall complication rates after fracture fixation. The purpose of this study is to determine the overall short-term complication rate for elective removal of orthopedic hardware after fracture fixation and to identify associated risk factors. Adult patients indicated for elective hardware removal after fracture fixation between July 2012 and July 2016 were screened for inclusion. Inclusion criteria included patients with hardware related pain and/or impaired cosmesis with complete medical and radiographic records and at least 3-month follow-up. Exclusion criteria were those patients indicated for hardware removal for a diagnosis of malunion, non-union, and/or infection. Data collected included patient age, gender, anatomic location of hardware removed, body mass index, ASA score, and comorbidities. Overall complications, as well as complications requiring revision surgery were recorded. Statistical analysis was performed with SPSS 20.0, and included univariate and multivariate regression analysis. 391 patients (418 procedures) were included for analysis. Overall complication rates were 8.4%, with a 3.6% revision surgery rate. Univariate regression analysis revealed that patients who had liver disease were at significant risk for complication (p=0.001) and revision surgery (p=0.036). Multivariate regression analysis showed that: 1) patients who had liver disease were at significant risk of overall complication (p=0.001) and revision surgery (p=0.039); 2) Removal of hardware following fixation for a pilon had significantly increased risk for complication (p=0.012), but not revision surgery (p=0.43); and 3) Removal of hardware for pelvic fixation had a significantly increased risk for revision surgery (p=0.017). Removal of hardware following fracture fixation is not a risk-free procedure. Patients with

  1. Advanced multivariate analysis to assess remediation of hydrocarbons in soils.

    Science.gov (United States)

    Lin, Deborah S; Taylor, Peter; Tibbett, Mark

    2014-10-01

    Accurate monitoring of degradation levels in soils is essential in order to understand and achieve complete degradation of petroleum hydrocarbons in contaminated soils. We aimed to develop the use of multivariate methods for the monitoring of biodegradation of diesel in soils and to determine if diesel contaminated soils could be remediated to a chemical composition similar to that of an uncontaminated soil. An incubation experiment was set up with three contrasting soil types. Each soil was exposed to diesel at varying stages of degradation and then analysed for key hydrocarbons throughout 161 days of incubation. Hydrocarbon distributions were analysed by Principal Coordinate Analysis and similar samples grouped by cluster analysis. Variation and differences between samples were determined using permutational multivariate analysis of variance. It was found that all soils followed trajectories approaching the chemical composition of the unpolluted soil. Some contaminated soils were no longer significantly different to that of uncontaminated soil after 161 days of incubation. The use of cluster analysis allows the assignment of a percentage chemical similarity of a diesel contaminated soil to an uncontaminated soil sample. This will aid in the monitoring of hydrocarbon contaminated sites and the establishment of potential endpoints for successful remediation.

  2. A simple ergonomic measure reduces fluoroscopy time during ERCP: A multivariate analysis.

    Science.gov (United States)

    Jowhari, Fahd; Hopman, Wilma M; Hookey, Lawrence

    2017-03-01

    Background and study aims  Endoscopic retrograde cholangiopancreatgraphy (ERCP) carries a radiation risk to patients undergoing the procedure and the team performing it. Fluoroscopy time (FT) has been shown to have a linear relationship with radiation exposure during ERCP. Recent modifications to our ERCP suite design were felt to impact fluoroscopy time and ergonomics. This multivariate analysis was therefore undertaken to investigate these effects, and to identify and validate various clinical, procedural and ergonomic factors influencing the total fluoroscopy time during ERCP. This would better assist clinicians with predicting prolonged fluoroscopic durations and to undertake relevant precautions accordingly. Patients and methods  A retrospective analysis of 299 ERCPs performed by 4 endoscopists over an 18-month period, at a single tertiary care center was conducted. All inpatients/outpatients (121 males, 178 females) undergoing ERCP for any clinical indication from January 2012 to June 2013 in the chosen ERCP suite were included in the study. Various predetermined clinical, procedural and ergonomic factors were obtained via chart review. Univariate analyses identified factors to be included in the multivariate regression model with FT as the dependent variable. Results  Bringing the endoscopy and fluoroscopy screens next to each other was associated with a significantly lesser FT than when the screens were separated further (-1.4 min, P  = 0.026). Other significant factors associated with a prolonged FT included having a prior ERCP (+ 1.4 min, P  = 0.031), and more difficult procedures (+ 4.2 min for each level of difficulty, P  < 0.001). ERCPs performed by high-volume endoscopists used lesser FT vs. low-volume endoscopists (-1.82, P = 0.015). Conclusions  Our study has identified and validated various factors that affect the total fluoroscopy time during ERCP. This is the first study to show that decreasing the distance

  3. Singular spectrum analysis, Harmonic regression and El-Nino effect ...

    Indian Academy of Sciences (India)

    42

    Keywords: Total ozone; Singular Spectrum Analysis; Spatial interpolation; Multivariate ENSO .... needed for a whole gamut of activities that contribute to the ultimate synthesis ..... −0.0009 3 + 0.0581 2 − 1.0123 + 7.3246, 2 = 0.53…

  4. On two flexible methods of 2-dimensional regression analysis

    Czech Academy of Sciences Publication Activity Database

    Volf, Petr

    2012-01-01

    Roč. 18, č. 4 (2012), s. 154-164 ISSN 1803-9782 Grant - others:GA ČR(CZ) GAP209/10/2045 Institutional support: RVO:67985556 Keywords : regression analysis * Gordon surface * prediction error * projection pursuit Subject RIV: BB - Applied Statistics, Operational Research http://library.utia.cas.cz/separaty/2013/SI/volf-on two flexible methods of 2-dimensional regression analysis.pdf

  5. Multivariate statistical analysis of precipitation chemistry in Northwestern Spain

    International Nuclear Information System (INIS)

    Prada-Sanchez, J.M.; Garcia-Jurado, I.; Gonzalez-Manteiga, W.; Fiestras-Janeiro, M.G.; Espada-Rios, M.I.; Lucas-Dominguez, T.

    1993-01-01

    149 samples of rainwater were collected in the proximity of a power station in northwestern Spain at three rainwater monitoring stations. The resulting data are analyzed using multivariate statistical techniques. Firstly, the Principal Component Analysis shows that there are three main sources of pollution in the area (a marine source, a rural source and an acid source). The impact from pollution from these sources on the immediate environment of the stations is studied using Factorial Discriminant Analysis. 8 refs., 7 figs., 11 tabs

  6. Multivariate statistical analysis of precipitation chemistry in Northwestern Spain

    Energy Technology Data Exchange (ETDEWEB)

    Prada-Sanchez, J.M.; Garcia-Jurado, I.; Gonzalez-Manteiga, W.; Fiestras-Janeiro, M.G.; Espada-Rios, M.I.; Lucas-Dominguez, T. (University of Santiago, Santiago (Spain). Faculty of Mathematics, Dept. of Statistics and Operations Research)

    1993-07-01

    149 samples of rainwater were collected in the proximity of a power station in northwestern Spain at three rainwater monitoring stations. The resulting data are analyzed using multivariate statistical techniques. Firstly, the Principal Component Analysis shows that there are three main sources of pollution in the area (a marine source, a rural source and an acid source). The impact from pollution from these sources on the immediate environment of the stations is studied using Factorial Discriminant Analysis. 8 refs., 7 figs., 11 tabs.

  7. Classification of adulterated honeys by multivariate analysis.

    Science.gov (United States)

    Amiry, Saber; Esmaiili, Mohsen; Alizadeh, Mohammad

    2017-06-01

    In this research, honey samples were adulterated with date syrup (DS) and invert sugar syrup (IS) at three concentrations (7%, 15% and 30%). 102 adulterated samples were prepared in six batches with 17 replications for each batch. For each sample, 32 parameters including color indices, rheological, physical, and chemical parameters were determined. To classify the samples, based on type and concentrations of adulterant, a multivariate analysis was applied using principal component analysis (PCA) followed by a linear discriminant analysis (LDA). Then, 21 principal components (PCs) were selected in five sets. Approximately two-thirds were identified correctly using color indices (62.75%) or rheological properties (67.65%). A power discrimination was obtained using physical properties (97.06%), and the best separations were achieved using two sets of chemical properties (set 1: lactone, diastase activity, sucrose - 100%) (set 2: free acidity, HMF, ash - 95%). Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. The Inappropriate Symmetries of Multivariate Statistical Analysis in Geometric Morphometrics.

    Science.gov (United States)

    Bookstein, Fred L

    In today's geometric morphometrics the commonest multivariate statistical procedures, such as principal component analysis or regressions of Procrustes shape coordinates on Centroid Size, embody a tacit roster of symmetries -axioms concerning the homogeneity of the multiple spatial domains or descriptor vectors involved-that do not correspond to actual biological fact. These techniques are hence inappropriate for any application regarding which we have a-priori biological knowledge to the contrary (e.g., genetic/morphogenetic processes common to multiple landmarks, the range of normal in anatomy atlases, the consequences of growth or function for form). But nearly every morphometric investigation is motivated by prior insights of this sort. We therefore need new tools that explicitly incorporate these elements of knowledge, should they be quantitative, to break the symmetries of the classic morphometric approaches. Some of these are already available in our literature but deserve to be known more widely: deflated (spatially adaptive) reference distributions of Procrustes coordinates, Sewall Wright's century-old variant of factor analysis, the geometric algebra of importing explicit biomechanical formulas into Procrustes space. Other methods, not yet fully formulated, might involve parameterized models for strain in idealized forms under load, principled approaches to the separation of functional from Brownian aspects of shape variation over time, and, in general, a better understanding of how the formalism of landmarks interacts with the many other approaches to quantification of anatomy. To more powerfully organize inferences from the high-dimensional measurements that characterize so much of today's organismal biology, tomorrow's toolkit must rely neither on principal component analysis nor on the Procrustes distance formula, but instead on sound prior biological knowledge as expressed in formulas whose coefficients are not all the same. I describe the problems

  9. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    Science.gov (United States)

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.

  10. Denial-of-service attack detection based on multivariate correlation analysis

    NARCIS (Netherlands)

    Tan, Zhiyuan; Jamdagni, Aruna; He, Xiangjian; Nanda, Priyadarsi; Liu, Ren Ping; Lu, Bao-Liang; Zhang, Liqing; Kwok, James

    2011-01-01

    The reliability and availability of network services are being threatened by the growing number of Denial-of-Service (DoS) attacks. Effective mechanisms for DoS attack detection are demanded. Therefore, we propose a multivariate correlation analysis approach to investigate and extract second-order

  11. Soft sensor design by multivariate fusion of image features and process measurements

    DEFF Research Database (Denmark)

    Lin, Bao; Jørgensen, Sten Bay

    2011-01-01

    This paper presents a multivariate data fusion procedure for design of dynamic soft sensors where suitably selected image features are combined with traditional process measurements to enhance the performance of data-driven soft sensors. A key issue of fusing multiple sensor data, i.e. to determine...... with a multivariate analysis technique from RGB pictures. The color information is also transformed to hue, saturation and intensity components. Both sets of image features are combined with traditional process measurements to obtain an inferential model by partial least squares (PLS) regression. A dynamic PLS model...... oxides (NOx) emission of cement kilns. On-site tests demonstrate improved performance over soft sensors based on conventional process measurements only....

  12. Resting-state functional magnetic resonance imaging: the impact of regression analysis.

    Science.gov (United States)

    Yeh, Chia-Jung; Tseng, Yu-Sheng; Lin, Yi-Ru; Tsai, Shang-Yueh; Huang, Teng-Yi

    2015-01-01

    To investigate the impact of regression methods on resting-state functional magnetic resonance imaging (rsfMRI). During rsfMRI preprocessing, regression analysis is considered effective for reducing the interference of physiological noise on the signal time course. However, it is unclear whether the regression method benefits rsfMRI analysis. Twenty volunteers (10 men and 10 women; aged 23.4 ± 1.5 years) participated in the experiments. We used node analysis and functional connectivity mapping to assess the brain default mode network by using five combinations of regression methods. The results show that regressing the global mean plays a major role in the preprocessing steps. When a global regression method is applied, the values of functional connectivity are significantly lower (P ≤ .01) than those calculated without a global regression. This step increases inter-subject variation and produces anticorrelated brain areas. rsfMRI data processed using regression should be interpreted carefully. The significance of the anticorrelated brain areas produced by global signal removal is unclear. Copyright © 2014 by the American Society of Neuroimaging.

  13. Development of a User Interface for a Regression Analysis Software Tool

    Science.gov (United States)

    Ulbrich, Norbert Manfred; Volden, Thomas R.

    2010-01-01

    An easy-to -use user interface was implemented in a highly automated regression analysis tool. The user interface was developed from the start to run on computers that use the Windows, Macintosh, Linux, or UNIX operating system. Many user interface features were specifically designed such that a novice or inexperienced user can apply the regression analysis tool with confidence. Therefore, the user interface s design minimizes interactive input from the user. In addition, reasonable default combinations are assigned to those analysis settings that influence the outcome of the regression analysis. These default combinations will lead to a successful regression analysis result for most experimental data sets. The user interface comes in two versions. The text user interface version is used for the ongoing development of the regression analysis tool. The official release of the regression analysis tool, on the other hand, has a graphical user interface that is more efficient to use. This graphical user interface displays all input file names, output file names, and analysis settings for a specific software application mode on a single screen which makes it easier to generate reliable analysis results and to perform input parameter studies. An object-oriented approach was used for the development of the graphical user interface. This choice keeps future software maintenance costs to a reasonable limit. Examples of both the text user interface and graphical user interface are discussed in order to illustrate the user interface s overall design approach.

  14. Method for nonlinear exponential regression analysis

    Science.gov (United States)

    Junkin, B. G.

    1972-01-01

    Two computer programs developed according to two general types of exponential models for conducting nonlinear exponential regression analysis are described. Least squares procedure is used in which the nonlinear problem is linearized by expanding in a Taylor series. Program is written in FORTRAN 5 for the Univac 1108 computer.

  15. Nonparametric Regression Estimation for Multivariate Null Recurrent Processes

    Directory of Open Access Journals (Sweden)

    Biqing Cai

    2015-04-01

    Full Text Available This paper discusses nonparametric kernel regression with the regressor being a \\(d\\-dimensional \\(\\beta\\-null recurrent process in presence of conditional heteroscedasticity. We show that the mean function estimator is consistent with convergence rate \\(\\sqrt{n(Th^{d}}\\, where \\(n(T\\ is the number of regenerations for a \\(\\beta\\-null recurrent process and the limiting distribution (with proper normalization is normal. Furthermore, we show that the two-step estimator for the volatility function is consistent. The finite sample performance of the estimate is quite reasonable when the leave-one-out cross validation method is used for bandwidth selection. We apply the proposed method to study the relationship of Federal funds rate with 3-month and 5-year T-bill rates and discover the existence of nonlinearity of the relationship. Furthermore, the in-sample and out-of-sample performance of the nonparametric model is far better than the linear model.

  16. Establishing a Mathematical Equations and Improving the Production of L-tert-Leucine by Uniform Design and Regression Analysis.

    Science.gov (United States)

    Jiang, Wei; Xu, Chao-Zhen; Jiang, Si-Zhi; Zhang, Tang-Duo; Wang, Shi-Zhen; Fang, Bai-Shan

    2017-04-01

    L-tert-Leucine (L-Tle) and its derivatives are extensively used as crucial building blocks for chiral auxiliaries, pharmaceutically active ingredients, and ligands. Combining with formate dehydrogenase (FDH) for regenerating the expensive coenzyme NADH, leucine dehydrogenase (LeuDH) is continually used for synthesizing L-Tle from α-keto acid. A multilevel factorial experimental design was executed for research of this system. In this work, an efficient optimization method for improving the productivity of L-Tle was developed. And the mathematical model between different fermentation conditions and L-Tle yield was also determined in the form of the equation by using uniform design and regression analysis. The multivariate regression equation was conveniently implemented in water, with a space time yield of 505.9 g L -1  day -1 and an enantiomeric excess value of >99 %. These results demonstrated that this method might become an ideal protocol for industrial production of chiral compounds and unnatural amino acids such as chiral drug intermediates.

  17. Regression of uveal malignant melanomas following cobalt-60 plaque. Correlates between acoustic spectrum analysis and tumor regression

    International Nuclear Information System (INIS)

    Coleman, D.J.; Lizzi, F.L.; Silverman, R.H.; Ellsworth, R.M.; Haik, B.G.; Abramson, D.H.; Smith, M.E.; Rondeau, M.J.

    1985-01-01

    Parameters derived from computer analysis of digital radio-frequency (rf) ultrasound scan data of untreated uveal malignant melanomas were examined for correlations with tumor regression following cobalt-60 plaque. Parameters included tumor height, normalized power spectrum and acoustic tissue type (ATT). Acoustic tissue type was based upon discriminant analysis of tumor power spectra, with spectra of tumors of known pathology serving as a model. Results showed ATT to be correlated with tumor regression during the first 18 months following treatment. Tumors with ATT associated with spindle cell malignant melanoma showed over twice the percentage reduction in height as those with ATT associated with mixed/epithelioid melanomas. Pre-treatment height was only weakly correlated with regression. Additionally, significant spectral changes were observed following treatment. Ultrasonic spectrum analysis thus provides a noninvasive tool for classification, prediction and monitoring of tumor response to cobalt-60 plaque

  18. Using the Logistic Regression model in supporting decisions of establishing marketing strategies

    Directory of Open Access Journals (Sweden)

    Cristinel CONSTANTIN

    2015-12-01

    Full Text Available This paper is about an instrumental research regarding the using of Logistic Regression model for data analysis in marketing research. The decision makers inside different organisation need relevant information to support their decisions regarding the marketing strategies. The data provided by marketing research could be computed in various ways but the multivariate data analysis models can enhance the utility of the information. Among these models we can find the Logistic Regression model, which is used for dichotomous variables. Our research is based on explanation the utility of this model and interpretation of the resulted information in order to help practitioners and researchers to use it in their future investigations

  19. Hierarchical multivariate covariance analysis of metabolic connectivity.

    Science.gov (United States)

    Carbonell, Felix; Charil, Arnaud; Zijdenbos, Alex P; Evans, Alan C; Bedell, Barry J

    2014-12-01

    Conventional brain connectivity analysis is typically based on the assessment of interregional correlations. Given that correlation coefficients are derived from both covariance and variance, group differences in covariance may be obscured by differences in the variance terms. To facilitate a comprehensive assessment of connectivity, we propose a unified statistical framework that interrogates the individual terms of the correlation coefficient. We have evaluated the utility of this method for metabolic connectivity analysis using [18F]2-fluoro-2-deoxyglucose (FDG) positron emission tomography (PET) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. As an illustrative example of the utility of this approach, we examined metabolic connectivity in angular gyrus and precuneus seed regions of mild cognitive impairment (MCI) subjects with low and high β-amyloid burdens. This new multivariate method allowed us to identify alterations in the metabolic connectome, which would not have been detected using classic seed-based correlation analysis. Ultimately, this novel approach should be extensible to brain network analysis and broadly applicable to other imaging modalities, such as functional magnetic resonance imaging (MRI).

  20. Extracting information from two-dimensional electrophoresis gels by partial least squares regression

    DEFF Research Database (Denmark)

    Jessen, Flemming; Lametsch, R.; Bendixen, E.

    2002-01-01

    of all proteins/spots in the gels. In the present study it is demonstrated how information can be extracted by multivariate data analysis. The strategy is based on partial least squares regression followed by variable selection to find proteins that individually or in combination with other proteins vary......Two-dimensional gel electrophoresis (2-DE) produces large amounts of data and extraction of relevant information from these data demands a cautious and time consuming process of spot pattern matching between gels. The classical approach of data analysis is to detect protein markers that appear...... or disappear depending on the experimental conditions. Such biomarkers are found by comparing the relative volumes of individual spots in the individual gels. Multivariate statistical analysis and modelling of 2-DE data for comparison and classification is an alternative approach utilising the combination...

  1. Detecting overdispersion in count data: A zero-inflated Poisson regression analysis

    Science.gov (United States)

    Afiqah Muhamad Jamil, Siti; Asrul Affendi Abdullah, M.; Kek, Sie Long; Nor, Maria Elena; Mohamed, Maryati; Ismail, Norradihah

    2017-09-01

    This study focusing on analysing count data of butterflies communities in Jasin, Melaka. In analysing count dependent variable, the Poisson regression model has been known as a benchmark model for regression analysis. Continuing from the previous literature that used Poisson regression analysis, this study comprising the used of zero-inflated Poisson (ZIP) regression analysis to gain acute precision on analysing the count data of butterfly communities in Jasin, Melaka. On the other hands, Poisson regression should be abandoned in the favour of count data models, which are capable of taking into account the extra zeros explicitly. By far, one of the most popular models include ZIP regression model. The data of butterfly communities which had been called as the number of subjects in this study had been taken in Jasin, Melaka and consisted of 131 number of subjects visits Jasin, Melaka. Since the researchers are considering the number of subjects, this data set consists of five families of butterfly and represent the five variables involve in the analysis which are the types of subjects. Besides, the analysis of ZIP used the SAS procedure of overdispersion in analysing zeros value and the main purpose of continuing the previous study is to compare which models would be better than when exists zero values for the observation of the count data. The analysis used AIC, BIC and Voung test of 5% level significance in order to achieve the objectives. The finding indicates that there is a presence of over-dispersion in analysing zero value. The ZIP regression model is better than Poisson regression model when zero values exist.

  2. An Analysis of Bank Service Satisfaction Based on Quantile Regression and Grey Relational Analysis

    Directory of Open Access Journals (Sweden)

    Wen-Tsao Pan

    2016-01-01

    Full Text Available Bank service satisfaction is vital to the success of a bank. In this paper, we propose to use the grey relational analysis to gauge the levels of service satisfaction of the banks. With the grey relational analysis, we compared the effects of different variables on service satisfaction. We gave ranks to the banks according to their levels of service satisfaction. We further used the quantile regression model to find the variables that affected the satisfaction of a customer at a specific quantile of satisfaction level. The result of the quantile regression analysis provided a bank manager with information to formulate policies to further promote satisfaction of the customers at different quantiles of satisfaction level. We also compared the prediction accuracies of the regression models at different quantiles. The experiment result showed that, among the seven quantile regression models, the median regression model has the best performance in terms of RMSE, RTIC, and CE performance measures.

  3. Multivariate statistical analysis of major and trace element data for ...

    African Journals Online (AJOL)

    Multivariate statistical analysis of major and trace element data for niobium exploration in the peralkaline granites of the anorogenic ring-complex province of Nigeria. PO Ogunleye, EC Ike, I Garba. Abstract. No Abstract Available Journal of Mining and Geology Vol.40(2) 2004: 107-117. Full Text: EMAIL FULL TEXT EMAIL ...

  4. Extracting bb Higgs Decay Signals using Multivariate Techniques

    Energy Technology Data Exchange (ETDEWEB)

    Smith, W Clarke; /George Washington U. /SLAC

    2012-08-28

    For low-mass Higgs boson production at ATLAS at {radical}s = 7 TeV, the hard subprocess gg {yields} h{sup 0} {yields} b{bar b} dominates but is in turn drowned out by background. We seek to exploit the intrinsic few-MeV mass width of the Higgs boson to observe it above the background in b{bar b}-dijet mass plots. The mass resolution of existing mass-reconstruction algorithms is insufficient for this purpose due to jet combinatorics, that is, the algorithms cannot identify every jet that results from b{bar b} Higgs decay. We combine these algorithms using the neural net (NN) and boosted regression tree (BDT) multivariate methods in attempt to improve the mass resolution. Events involving gg {yields} h{sup 0} {yields} b{bar b} are generated using Monte Carlo methods with Pythia and then the Toolkit for Multivariate Analysis (TMVA) is used to train and test NNs and BDTs. For a 120 GeV Standard Model Higgs boson, the m{sub h{sup 0}}-reconstruction width is reduced from 8.6 to 6.5 GeV. Most importantly, however, the methods used here allow for more advanced m{sub h{sup 0}}-reconstructions to be created in the future using multivariate methods.

  5. Research and analyze of physical health using multiple regression analysis

    Directory of Open Access Journals (Sweden)

    T. S. Kyi

    2014-01-01

    Full Text Available This paper represents the research which is trying to create a mathematical model of the "healthy people" using the method of regression analysis. The factors are the physical parameters of the person (such as heart rate, lung capacity, blood pressure, breath holding, weight height coefficient, flexibility of the spine, muscles of the shoulder belt, abdominal muscles, squatting, etc.., and the response variable is an indicator of physical working capacity. After performing multiple regression analysis, obtained useful multiple regression models that can predict the physical performance of boys the aged of fourteen to seventeen years. This paper represents the development of regression model for the sixteen year old boys and analyzed results.

  6. An improved multiple linear regression and data analysis computer program package

    Science.gov (United States)

    Sidik, S. M.

    1972-01-01

    NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

  7. Support vector regression and artificial neural network models for stability indicating analysis of mebeverine hydrochloride and sulpiride mixtures in pharmaceutical preparation: A comparative study

    Science.gov (United States)

    Naguib, Ibrahim A.; Darwish, Hany W.

    2012-02-01

    A comparison between support vector regression (SVR) and Artificial Neural Networks (ANNs) multivariate regression methods is established showing the underlying algorithm for each and making a comparison between them to indicate the inherent advantages and limitations. In this paper we compare SVR to ANN with and without variable selection procedure (genetic algorithm (GA)). To project the comparison in a sensible way, the methods are used for the stability indicating quantitative analysis of mixtures of mebeverine hydrochloride and sulpiride in binary mixtures as a case study in presence of their reported impurities and degradation products (summing up to 6 components) in raw materials and pharmaceutical dosage form via handling the UV spectral data. For proper analysis, a 6 factor 5 level experimental design was established resulting in a training set of 25 mixtures containing different ratios of the interfering species. An independent test set consisting of 5 mixtures was used to validate the prediction ability of the suggested models. The proposed methods (linear SVR (without GA) and linear GA-ANN) were successfully applied to the analysis of pharmaceutical tablets containing mebeverine hydrochloride and sulpiride mixtures. The results manifest the problem of nonlinearity and how models like the SVR and ANN can handle it. The methods indicate the ability of the mentioned multivariate calibration models to deconvolute the highly overlapped UV spectra of the 6 components' mixtures, yet using cheap and easy to handle instruments like the UV spectrophotometer.

  8. Comparative multivariate analyses of transient otoacoustic emissions and distorsion products in normal and impaired hearing.

    Science.gov (United States)

    Stamate, Mirela Cristina; Todor, Nicolae; Cosgarea, Marcel

    2015-01-01

    The clinical utility of otoacoustic emissions as a noninvasive objective test of cochlear function has been long studied. Both transient otoacoustic emissions and distorsion products can be used to identify hearing loss, but to what extent they can be used as predictors for hearing loss is still debated. Most studies agree that multivariate analyses have better test performances than univariate analyses. The aim of the study was to determine transient otoacoustic emissions and distorsion products performance in identifying normal and impaired hearing loss, using the pure tone audiogram as a gold standard procedure and different multivariate statistical approaches. The study included 105 adult subjects with normal hearing and hearing loss who underwent the same test battery: pure-tone audiometry, tympanometry, otoacoustic emission tests. We chose to use the logistic regression as a multivariate statistical technique. Three logistic regression models were developed to characterize the relations between different risk factors (age, sex, tinnitus, demographic features, cochlear status defined by otoacoustic emissions) and hearing status defined by pure-tone audiometry. The multivariate analyses allow the calculation of the logistic score, which is a combination of the inputs, weighted by coefficients, calculated within the analyses. The accuracy of each model was assessed using receiver operating characteristics curve analysis. We used the logistic score to generate receivers operating curves and to estimate the areas under the curves in order to compare different multivariate analyses. We compared the performance of each otoacoustic emission (transient, distorsion product) using three different multivariate analyses for each ear, when multi-frequency gold standards were used. We demonstrated that all multivariate analyses provided high values of the area under the curve proving the performance of the otoacoustic emissions. Each otoacoustic emission test presented high

  9. Multivariate analysis of the heavy metal concentrations in the vegetable and soil samples-a case study from district charsadda and district mardan

    International Nuclear Information System (INIS)

    Idrees, M.; Rahman, Z.; Bibi, S.; Shah, F.; Gulab, H.; Ali, L.

    2017-01-01

    Multivariate statistical methods like cluster analysis, principal component analysis (PCA) and regression analysis were applied on the metals concentration of both the vegetables and soil samples collected from the districts Charsada and Mardan to classify them in different groups. The concentrations of seven heavy metals including Cu, Cr, Co, Ni, Ag, Pb, Sb were investigated using atomic absorption spectrophotometer (AAS) in the leaves, fruits, extracts, and soil samples of potato, colocasia, turnip, radish, cabbage, angular loofah, cucumber, bitter gourd, round melon, and pumpkin. The concentration of copper, chromium, cobalt, nickel, silver, lead and antimony were found in the range of 6.133-72.933, 32.233-92.5, 2.25-8.083, 0.366-143.53, 0.4-4.467, 11.916-157.96, 28.75-165.367 mgKg/sup -1/ respectively. (author)

  10. Functional data analysis of generalized regression quantiles

    KAUST Repository

    Guo, Mengmeng; Zhou, Lan; Huang, Jianhua Z.; Hä rdle, Wolfgang Karl

    2013-01-01

    Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.

  11. Functional data analysis of generalized regression quantiles

    KAUST Repository

    Guo, Mengmeng

    2013-11-05

    Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.

  12. Predicting Success in Product Development: The Application of Principal Component Analysis to Categorical Data and Binomial Logistic Regression

    Directory of Open Access Journals (Sweden)

    Glauco H.S. Mendes

    2013-09-01

    Full Text Available Critical success factors in new product development (NPD in the Brazilian small and medium enterprises (SMEs are identified and analyzed. Critical success factors are best practices that can be used to improve NPD management and performance in a company. However, the traditional method for identifying these factors is survey methods. Subsequently, the collected data are reduced through traditional multivariate analysis. The objective of this work is to develop a logistic regression model for predicting the success or failure of the new product development. This model allows for an evaluation and prioritization of resource commitments. The results will be helpful for guiding management actions, as one way to improve NPD performance in those industries.

  13. Analysis of Relationship Between Personality and Favorite Places with Poisson Regression Analysis

    Directory of Open Access Journals (Sweden)

    Yoon Song Ha

    2018-01-01

    Full Text Available A relationship between human personality and preferred locations have been a long conjecture for human mobility research. In this paper, we analyzed the relationship between personality and visiting place with Poisson Regression. Poisson Regression can analyze correlation between countable dependent variable and independent variable. For this analysis, 33 volunteers provided their personality data and 49 location categories data are used. Raw location data is preprocessed to be normalized into rates of visit and outlier data is prunned. For the regression analysis, independent variables are personality data and dependent variables are preprocessed location data. Several meaningful results are found. For example, persons with high tendency of frequent visiting to university laboratory has personality with high conscientiousness and low openness. As well, other meaningful location categories are presented in this paper.

  14. A New Predictive Model Based on the ABC Optimized Multivariate Adaptive Regression Splines Approach for Predicting the Remaining Useful Life in Aircraft Engines

    Directory of Open Access Journals (Sweden)

    Paulino José García Nieto

    2016-05-01

    Full Text Available Remaining useful life (RUL estimation is considered as one of the most central points in the prognostics and health management (PHM. The present paper describes a nonlinear hybrid ABC–MARS-based model for the prediction of the remaining useful life of aircraft engines. Indeed, it is well-known that an accurate RUL estimation allows failure prevention in a more controllable way so that the effective maintenance can be carried out in appropriate time to correct impending faults. The proposed hybrid model combines multivariate adaptive regression splines (MARS, which have been successfully adopted for regression problems, with the artificial bee colony (ABC technique. This optimization technique involves parameter setting in the MARS training procedure, which significantly influences the regression accuracy. However, its use in reliability applications has not yet been widely explored. Bearing this in mind, remaining useful life values have been predicted here by using the hybrid ABC–MARS-based model from the remaining measured parameters (input variables for aircraft engines with success. A correlation coefficient equal to 0.92 was obtained when this hybrid ABC–MARS-based model was applied to experimental data. The agreement of this model with experimental data confirmed its good performance. The main advantage of this predictive model is that it does not require information about the previous operation states of the aircraft engine.

  15. Multivariate NIR studies of seed-water interaction in Scots Pine Seeds (Pinus sylvestris L.)

    OpenAIRE

    Lestander, Torbjörn

    2003-01-01

    This thesis describes seed-water interaction using near infrared (NIR) spectroscopy, multivariate regression models and Scots pine seeds. The presented research covers classification of seed viability, prediction of seed moisture content, selection of NIR wavelengths and interpretation of seed-water interaction modelled and analysed by principal component analysis, ordinary least squares (OLS), partial least squares (PLS), bi-orthogonal least squares (BPLS) and genetic algorithms. The potenti...

  16. Multiple regression analysis of Jominy hardenability data for boron treated steels

    International Nuclear Information System (INIS)

    Komenda, J.; Sandstroem, R.; Tukiainen, M.

    1997-01-01

    The relations between chemical composition and their hardenability of boron treated steels have been investigated using a multiple regression analysis method. A linear model of regression was chosen. The free boron content that is effective for the hardenability was calculated using a model proposed by Jansson. The regression analysis for 1261 steel heats provided equations that were statistically significant at the 95% level. All heats met the specification according to the nordic countries producers classification. The variation in chemical composition explained typically 80 to 90% of the variation in the hardenability. In the regression analysis elements which did not significantly contribute to the calculated hardness according to the F test were eliminated. Carbon, silicon, manganese, phosphorus and chromium were of importance at all Jominy distances, nickel, vanadium, boron and nitrogen at distances above 6 mm. After the regression analysis it was demonstrated that very few outliers were present in the data set, i.e. data points outside four times the standard deviation. The model has successfully been used in industrial practice replacing some of the necessary Jominy tests. (orig.)

  17. Multivariate Analysis of Multiple Datasets: a Practical Guide for Chemical Ecology.

    Science.gov (United States)

    Hervé, Maxime R; Nicolè, Florence; Lê Cao, Kim-Anh

    2018-03-01

    Chemical ecology has strong links with metabolomics, the large-scale study of all metabolites detectable in a biological sample. Consequently, chemical ecologists are often challenged by the statistical analyses of such large datasets. This holds especially true when the purpose is to integrate multiple datasets to obtain a holistic view and a better understanding of a biological system under study. The present article provides a comprehensive resource to analyze such complex datasets using multivariate methods. It starts from the necessary pre-treatment of data including data transformations and distance calculations, to the application of both gold standard and novel multivariate methods for the integration of different omics data. We illustrate the process of analysis along with detailed results interpretations for six issues representative of the different types of biological questions encountered by chemical ecologists. We provide the necessary knowledge and tools with reproducible R codes and chemical-ecological datasets to practice and teach multivariate methods.

  18. Skeletal height estimation from regression analysis of sternal lengths in a Northwest Indian population of Chandigarh region: a postmortem study.

    Science.gov (United States)

    Singh, Jagmahender; Pathak, R K; Chavali, Krishnadutt H

    2011-03-20

    Skeletal height estimation from regression analysis of eight sternal lengths in the subjects of Chandigarh zone of Northwest India is the topic of discussion in this study. Analysis of eight sternal lengths (length of manubrium, length of mesosternum, combined length of manubrium and mesosternum, total sternal length and first four intercostals lengths of mesosternum) measured from 252 male and 91 female sternums obtained at postmortems revealed that mean cadaver stature and sternal lengths were more in North Indians and males than the South Indians and females. Except intercostal lengths, all the sternal lengths were positively correlated with stature of the deceased in both sexes (P regression analysis of sternal lengths was found more useful than the linear regression for stature estimation. Using multivariate regression analysis, the combined length of manubrium and mesosternum in both sexes and the length of manubrium along with 2nd and 3rd intercostal lengths of mesosternum in males were selected as best estimators of stature. Nonetheless, the stature of males can be predicted with SEE of 6.66 (R(2) = 0.16, r = 0.318) from combination of MBL+BL_3+LM+BL_2, and in females from MBL only, it can be estimated with SEE of 6.65 (R(2) = 0.10, r = 0.318), whereas from the multiple regression analysis of pooled data, stature can be known with SEE of 6.97 (R(2) = 0.387, r = 575) from the combination of MBL+LM+BL_2+TSL+BL_3. The R(2) and F-ratio were found to be statistically significant for almost all the variables in both the sexes, except 4th intercostal length in males and 2nd to 4th intercostal lengths in females. The 'major' sternal lengths were more useful than the 'minor' ones for stature estimation The universal regression analysis used by Kanchan et al. [39] when applied to sternal lengths, gave satisfactory estimates of stature for males only but female stature was comparatively better estimated from simple linear regressions. But they are not proposed for the

  19. Prediction of diffuse solar irradiance using machine learning and multivariable regression

    International Nuclear Information System (INIS)

    Lou, Siwei; Li, Danny H.W.; Lam, Joseph C.; Chan, Wilco W.H.

    2016-01-01

    Highlights: • 54.9% of the annual global irradiance is composed by its diffuse part in HK. • Hourly diffuse irradiance was predicted by accessible variables. • The importance of variable in prediction was assessed by machine learning. • Simple prediction equations were developed with the knowledge of variable importance. - Abstract: The paper studies the horizontal global, direct-beam and sky-diffuse solar irradiance data measured in Hong Kong from 2008 to 2013. A machine learning algorithm was employed to predict the horizontal sky-diffuse irradiance and conduct sensitivity analysis for the meteorological variables. Apart from the clearness index (horizontal global/extra atmospheric solar irradiance), we found that predictors including solar altitude, air temperature, cloud cover and visibility are also important in predicting the diffuse component. The mean absolute error (MAE) of the logistic regression using the aforementioned predictors was less than 21.5 W/m"2 and 30 W/m"2 for Hong Kong and Denver, USA, respectively. With the systematic recording of the five variables for more than 35 years, the proposed model would be appropriate to estimate of long-term diffuse solar radiation, study climate change and develope typical meteorological year in Hong Kong and places with similar climates.

  20. Interpretability of Multivariate Brain Maps in Linear Brain Decoding: Definition, and Heuristic Quantification in Multivariate Analysis of MEG Time-Locked Effects.

    Science.gov (United States)

    Kia, Seyed Mostafa; Vega Pons, Sandro; Weisz, Nathan; Passerini, Andrea

    2016-01-01

    Brain decoding is a popular multivariate approach for hypothesis testing in neuroimaging. Linear classifiers are widely employed in the brain decoding paradigm to discriminate among experimental conditions. Then, the derived linear weights are visualized in the form of multivariate brain maps to further study spatio-temporal patterns of underlying neural activities. It is well known that the brain maps derived from weights of linear classifiers are hard to interpret because of high correlations between predictors, low signal to noise ratios, and the high dimensionality of neuroimaging data. Therefore, improving the interpretability of brain decoding approaches is of primary interest in many neuroimaging studies. Despite extensive studies of this type, at present, there is no formal definition for interpretability of multivariate brain maps. As a consequence, there is no quantitative measure for evaluating the interpretability of different brain decoding methods. In this paper, first, we present a theoretical definition of interpretability in brain decoding; we show that the interpretability of multivariate brain maps can be decomposed into their reproducibility and representativeness. Second, as an application of the proposed definition, we exemplify a heuristic for approximating the interpretability in multivariate analysis of evoked magnetoencephalography (MEG) responses. Third, we propose to combine the approximated interpretability and the generalization performance of the brain decoding into a new multi-objective criterion for model selection. Our results, for the simulated and real MEG data, show that optimizing the hyper-parameters of the regularized linear classifier based on the proposed criterion results in more informative multivariate brain maps. More importantly, the presented definition provides the theoretical background for quantitative evaluation of interpretability, and hence, facilitates the development of more effective brain decoding algorithms

  1. Cardiovascular reactivity patterns and pathways to hypertension: a multivariate cluster analysis

    NARCIS (Netherlands)

    Brindle, R. C.; Ginty, A. T.; Jones, A.; Phillips, A. C.; Roseboom, T. J.; Carroll, D.; Painter, R. C.; de Rooij, S. R.

    2016-01-01

    Substantial evidence links exaggerated mental stress induced blood pressure reactivity to future hypertension, but the results for heart rate reactivity are less clear. For this reason multivariate cluster analysis was carried out to examine the relationship between heart rate and blood pressure

  2. Using Multivariate Adaptive Regression Spline and Artificial Neural Network to Simulate Urbanization in Mumbai, India

    Science.gov (United States)

    Ahmadlou, M.; Delavar, M. R.; Tayyebi, A.; Shafizadeh-Moghadam, H.

    2015-12-01

    Land use change (LUC) models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure or model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the modelling process and they are model driven. Since few studies have compared local non-parametric models with global parametric models, this study compares a local non-parametric model called multivariate adaptive regression spline (MARS), and a global parametric model called artificial neural network (ANN) to simulate urbanization in Mumbai, India. Both models determine the relationship between a dependent variable and multiple independent variables. We used receiver operating characteristic (ROC) to compare the power of the both models for simulating urbanization. Landsat images of 1991 (TM) and 2010 (ETM+) were used for modelling the urbanization process. The drivers considered for urbanization in this area were distance to urban areas, urban density, distance to roads, distance to water, distance to forest, distance to railway, distance to central business district, number of agricultural cells in a 7 by 7 neighbourhoods, and slope in 1991. The results showed that the area under the ROC curve for MARS and ANN was 94.77% and 95.36%, respectively. Thus, ANN performed slightly better than MARS to simulate urban areas in Mumbai, India.

  3. Using Apparent Density of Paper from Hardwood Kraft Pulps to Predict Sheet Properties, based on Unsupervised Classification and Multivariable Regression Techniques

    Directory of Open Access Journals (Sweden)

    Ofélia Anjos

    2015-07-01

    Full Text Available Paper properties determine the product application potential and depend on the raw material, pulping conditions, and pulp refining. The aim of this study was to construct mathematical models that predict quantitative relations between the paper density and various mechanical and optical properties of the paper. A dataset of properties of paper handsheets produced with pulps of Acacia dealbata, Acacia melanoxylon, and Eucalyptus globulus beaten at 500, 2500, and 4500 revolutions was used. Unsupervised classification techniques were combined to assess the need to perform separated prediction models for each species, and multivariable regression techniques were used to establish such prediction models. It was possible to develop models with a high goodness of fit using paper density as the independent variable (or predictor for all variables except tear index and zero-span tensile strength, both dry and wet.

  4. On directional multiple-output quantile regression

    Czech Academy of Sciences Publication Activity Database

    Paindaveine, D.; Šiman, Miroslav

    2011-01-01

    Roč. 102, č. 2 (2011), s. 193-212 ISSN 0047-259X R&D Projects: GA MŠk(CZ) 1M06047 Grant - others:Commision EC(BE) Fonds National de la Recherche Scientifique Institutional research plan: CEZ:AV0Z10750506 Keywords : multivariate quantile * quantile regression * multiple-output regression * halfspace depth * portfolio optimization * value-at risk Subject RIV: BA - General Mathematics Impact factor: 0.879, year: 2011 http://library.utia.cas.cz/separaty/2011/SI/siman-0364128.pdf

  5. Understanding logistic regression analysis

    OpenAIRE

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...

  6. A Poisson-lognormal conditional-autoregressive model for multivariate spatial analysis of pedestrian crash counts across neighborhoods.

    Science.gov (United States)

    Wang, Yiyi; Kockelman, Kara M

    2013-11-01

    This work examines the relationship between 3-year pedestrian crash counts across Census tracts in Austin, Texas, and various land use, network, and demographic attributes, such as land use balance, residents' access to commercial land uses, sidewalk density, lane-mile densities (by roadway class), and population and employment densities (by type). The model specification allows for region-specific heterogeneity, correlation across response types, and spatial autocorrelation via a Poisson-based multivariate conditional auto-regressive (CAR) framework and is estimated using Bayesian Markov chain Monte Carlo methods. Least-squares regression estimates of walk-miles traveled per zone serve as the exposure measure. Here, the Poisson-lognormal multivariate CAR model outperforms an aspatial Poisson-lognormal multivariate model and a spatial model (without cross-severity correlation), both in terms of fit and inference. Positive spatial autocorrelation emerges across neighborhoods, as expected (due to latent heterogeneity or missing variables that trend in space, resulting in spatial clustering of crash counts). In comparison, the positive aspatial, bivariate cross correlation of severe (fatal or incapacitating) and non-severe crash rates reflects latent covariates that have impacts across severity levels but are more local in nature (such as lighting conditions and local sight obstructions), along with spatially lagged cross correlation. Results also suggest greater mixing of residences and commercial land uses is associated with higher pedestrian crash risk across different severity levels, ceteris paribus, presumably since such access produces more potential conflicts between pedestrian and vehicle movements. Interestingly, network densities show variable effects, and sidewalk provision is associated with lower severe-crash rates. Copyright © 2013 Elsevier Ltd. All rights reserved.

  7. Elliptical multiple-output quantile regression and convex optimization

    Czech Academy of Sciences Publication Activity Database

    Hallin, M.; Šiman, Miroslav

    2016-01-01

    Roč. 109, č. 1 (2016), s. 232-237 ISSN 0167-7152 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords : quantile regression * elliptical quantile * multivariate quantile * multiple-output regression Subject RIV: BA - General Mathematics Impact factor: 0.540, year: 2016 http://library.utia.cas.cz/separaty/2016/SI/siman-0458243.pdf

  8. Multivariate analysis between air pollutants and meteorological variables in Seoul

    International Nuclear Information System (INIS)

    Kim, J.; Lim, J.

    2005-01-01

    Multivariate analysis was conducted to analyze the relationship between air pollutants and meteorological variables measured in Seoul from January 1 to December 31, 1999. The first principal component showed the contrast effect between O 3 and the other pollutants. The second principal component showed the contrast effect between CO, SO 2 , NO 2 , and O 3 , PM 10 , TSP. Based on the cluster analysis, three clusters represented different air pollution levels, seasonal characteristics of air pollutants, and meteorological conditions. Discriminant analysis with air environment index (AEI) was carried out to develop an air pollution index function. (orig.)

  9. Statistical analysis of medical data using SAS

    CERN Document Server

    Der, Geoff

    2005-01-01

    An Introduction to SASDescribing and Summarizing DataBasic InferenceScatterplots Correlation: Simple Regression and SmoothingAnalysis of Variance and CovarianceMultiple RegressionLogistic RegressionThe Generalized Linear ModelGeneralized Additive ModelsNonlinear Regression ModelsThe Analysis of Longitudinal Data IThe Analysis of Longitudinal Data II: Models for Normal Response VariablesThe Analysis of Longitudinal Data III: Non-Normal ResponseSurvival AnalysisAnalysis Multivariate Date: Principal Components and Cluster AnalysisReferences

  10. Enhancing e-waste estimates: Improving data quality by multivariate Input–Output Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Feng, E-mail: fwang@unu.edu [Institute for Sustainability and Peace, United Nations University, Hermann-Ehler-Str. 10, 53113 Bonn (Germany); Design for Sustainability Lab, Faculty of Industrial Design Engineering, Delft University of Technology, Landbergstraat 15, 2628CE Delft (Netherlands); Huisman, Jaco [Institute for Sustainability and Peace, United Nations University, Hermann-Ehler-Str. 10, 53113 Bonn (Germany); Design for Sustainability Lab, Faculty of Industrial Design Engineering, Delft University of Technology, Landbergstraat 15, 2628CE Delft (Netherlands); Stevels, Ab [Design for Sustainability Lab, Faculty of Industrial Design Engineering, Delft University of Technology, Landbergstraat 15, 2628CE Delft (Netherlands); Baldé, Cornelis Peter [Institute for Sustainability and Peace, United Nations University, Hermann-Ehler-Str. 10, 53113 Bonn (Germany); Statistics Netherlands, Henri Faasdreef 312, 2492 JP Den Haag (Netherlands)

    2013-11-15

    Highlights: • A multivariate Input–Output Analysis method for e-waste estimates is proposed. • Applying multivariate analysis to consolidate data can enhance e-waste estimates. • We examine the influence of model selection and data quality on e-waste estimates. • Datasets of all e-waste related variables in a Dutch case study have been provided. • Accurate modeling of time-variant lifespan distributions is critical for estimate. - Abstract: Waste electrical and electronic equipment (or e-waste) is one of the fastest growing waste streams, which encompasses a wide and increasing spectrum of products. Accurate estimation of e-waste generation is difficult, mainly due to lack of high quality data referred to market and socio-economic dynamics. This paper addresses how to enhance e-waste estimates by providing techniques to increase data quality. An advanced, flexible and multivariate Input–Output Analysis (IOA) method is proposed. It links all three pillars in IOA (product sales, stock and lifespan profiles) to construct mathematical relationships between various data points. By applying this method, the data consolidation steps can generate more accurate time-series datasets from available data pool. This can consequently increase the reliability of e-waste estimates compared to the approach without data processing. A case study in the Netherlands is used to apply the advanced IOA model. As a result, for the first time ever, complete datasets of all three variables for estimating all types of e-waste have been obtained. The result of this study also demonstrates significant disparity between various estimation models, arising from the use of data under different conditions. It shows the importance of applying multivariate approach and multiple sources to improve data quality for modelling, specifically using appropriate time-varying lifespan parameters. Following the case study, a roadmap with a procedural guideline is provided to enhance e

  11. Enhancing e-waste estimates: Improving data quality by multivariate Input–Output Analysis

    International Nuclear Information System (INIS)

    Wang, Feng; Huisman, Jaco; Stevels, Ab; Baldé, Cornelis Peter

    2013-01-01

    Highlights: • A multivariate Input–Output Analysis method for e-waste estimates is proposed. • Applying multivariate analysis to consolidate data can enhance e-waste estimates. • We examine the influence of model selection and data quality on e-waste estimates. • Datasets of all e-waste related variables in a Dutch case study have been provided. • Accurate modeling of time-variant lifespan distributions is critical for estimate. - Abstract: Waste electrical and electronic equipment (or e-waste) is one of the fastest growing waste streams, which encompasses a wide and increasing spectrum of products. Accurate estimation of e-waste generation is difficult, mainly due to lack of high quality data referred to market and socio-economic dynamics. This paper addresses how to enhance e-waste estimates by providing techniques to increase data quality. An advanced, flexible and multivariate Input–Output Analysis (IOA) method is proposed. It links all three pillars in IOA (product sales, stock and lifespan profiles) to construct mathematical relationships between various data points. By applying this method, the data consolidation steps can generate more accurate time-series datasets from available data pool. This can consequently increase the reliability of e-waste estimates compared to the approach without data processing. A case study in the Netherlands is used to apply the advanced IOA model. As a result, for the first time ever, complete datasets of all three variables for estimating all types of e-waste have been obtained. The result of this study also demonstrates significant disparity between various estimation models, arising from the use of data under different conditions. It shows the importance of applying multivariate approach and multiple sources to improve data quality for modelling, specifically using appropriate time-varying lifespan parameters. Following the case study, a roadmap with a procedural guideline is provided to enhance e

  12. Modelling daily dissolved oxygen concentration using least square support vector machine, multivariate adaptive regression splines and M5 model tree

    Science.gov (United States)

    Heddam, Salim; Kisi, Ozgur

    2018-04-01

    In the present study, three types of artificial intelligence techniques, least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5T) are applied for modeling daily dissolved oxygen (DO) concentration using several water quality variables as inputs. The DO concentration and water quality variables data from three stations operated by the United States Geological Survey (USGS) were used for developing the three models. The water quality data selected consisted of daily measured of water temperature (TE, °C), pH (std. unit), specific conductance (SC, μS/cm) and discharge (DI cfs), are used as inputs to the LSSVM, MARS and M5T models. The three models were applied for each station separately and compared to each other. According to the results obtained, it was found that: (i) the DO concentration could be successfully estimated using the three models and (ii) the best model among all others differs from one station to another.

  13. [Logistic regression model of noninvasive prediction for portal hypertensive gastropathy in patients with hepatitis B associated cirrhosis].

    Science.gov (United States)

    Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo

    2015-05-12

    To explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and establish a Logistic regression model of noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG while the independent variables were screened by binary Logistic analysis. Multivariate Logistic regression was used for further analysis of significant noninvasive independent variables. Logistic regression model was established and odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of model were evaluated by the curve of receiver operating characteristic (ROC). According to univariate Logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major occurring factors for PHG. The established regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4. The accuracy of model for PHG was 79.1% with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG and the noninvasive predicted Logistic regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4.

  14. Multivariate analysis of data in sensory science

    CERN Document Server

    Naes, T; Risvik, E

    1996-01-01

    The state-of-the-art of multivariate analysis in sensory science is described in this volume. Both methods for aggregated and individual sensory profiles are discussed. Processes and results are presented in such a way that they can be understood not only by statisticians but also by experienced sensory panel leaders and users of sensory analysis. The techniques presented are focused on examples and interpretation rather than on the technical aspects, with an emphasis on new and important methods which are possibly not so well known to scientists in the field. Important features of the book are discussions on the relationship among the methods with a strong accent on the connection between problems and methods. All procedures presented are described in relation to sensory data and not as completely general statistical techniques. Sensory scientists, applied statisticians, chemometricians, those working in consumer science, food scientists and agronomers will find this book of value.

  15. Relationship between the curve of Spee and craniofacial variables: A regression analysis.

    Science.gov (United States)

    Halimi, Abdelali; Benyahia, Hicham; Azeroual, Mohamed-Faouzi; Bahije, Loubna; Zaoui, Fatima

    2018-06-01

    observed in the hyperdivergent population included in this study. For the multivariate analysis, the overbite and the sellion-articulare distance remained independently related to the curve of Spee according to the breathing type, Angle's classification, and overjet. This regression model explains 21.4% of the changes in the curve of Spee. Copyright © 2018. Published by Elsevier Masson SAS.

  16. Clinical patch test data evaluated by multivariate analysis. Danish Contact Dermatitis Group

    DEFF Research Database (Denmark)

    Christophersen, J; Menné, T; Tanghøj, P

    1989-01-01

    The aim of the present study was to evaluate the influence of individual explanatory factors, such as sex, age, atopy, test time and presence of diseased skin, on clinical patch test results, by application of multivariate statistical analysis. The study population was 2166 consecutive patients...... patch tested with the standard series of the International Contact Dermatitis Research Group (ICDRG) by members of the Danish Contact Dermatitis Group (DCDG) over a period of 6 months. For the 8 test allergens most often found positive (nickel, fragrance-mix, cobalt, chromate, balsam of Peru, carba......-mix, colophony, and formaldehyde), one or more individual factors were of significance for the risk of being sensitized, except for chromate and formaldehyde. It is concluded that patch test results can be compared only after stratification of the material or by multivariate analysis....

  17. Simultaneous determination of rifampicin, isoniazid and pyrazinamide in tablet preparations by multivariate spectrophotometric calibration.

    Science.gov (United States)

    Goicoechea, H C; Olivieri, A C

    1999-08-01

    The use of multivariate spectrophotometric calibration is presented for the simultaneous determination of the active components of tablets used in the treatment of pulmonary tuberculosis. The resolution of ternary mixtures of rifampicin, isoniazid and pyrazinamide has been accomplished by using partial least squares (PLS-1) regression analysis. Although the components show an important degree of spectral overlap, they have been simultaneously determined with high accuracy and precision, rapidly and with no need of nonaqueous solvents for dissolving the samples. No interference has been observed from the tablet excipients. A comparison is presented with the related multivariate method of classical least squares (CLS) analysis, which is shown to yield less reliable results due to the severe spectral overlap among the studied compounds. This is highlighted in the case of isoniazid, due to the small absorbances measured for this component.

  18. Decomposition of multivariate phenotypic means in multigroup genetic covarinace structure analysis

    NARCIS (Netherlands)

    Dolan, C.V.; Molenaar, P.C.M.; Boomsma, D.I.

    1992-01-01

    Uses D. Sorbom's (1974) method to study differences in latent means in multivariate twin data. By restricting the analysis to a comparison between groups, the results pertain only to the additive contributions of common genetic and environmental factors to the deviation of the group means from what

  19. Background stratified Poisson regression analysis of cohort data.

    Science.gov (United States)

    Richardson, David B; Langholz, Bryan

    2012-03-01

    Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.

  20. Genetic Parameters for Body condition score, Body weigth, Milk yield and Fertility estimated using random regression models

    NARCIS (Netherlands)

    Berry, D.P.; Buckley, F.; Dillon, P.; Evans, R.D.; Rath, M.; Veerkamp, R.F.

    2003-01-01

    Genetic (co)variances between body condition score (BCS), body weight (BW), milk yield, and fertility were estimated using a random regression animal model extended to multivariate analysis. The data analyzed included 81,313 BCS observations, 91,937 BW observations, and 100,458 milk test-day yields

  1. Determinação de misturas de sulfametoxazol e trimetoprima por espectroscopia eletrônica multivariada Determination of sulfamethoxazole and trimethoprim mixtures by multivariate electronic spectroscopy

    Directory of Open Access Journals (Sweden)

    Gilcélia A. Cordeiro

    2008-01-01

    Full Text Available In this work a multivariate spectroscopic methodology is proposed for quantitative determination of sulfamethoxazole and trimethoprim in pharmaceutical associations. The multivariate model was developed by partial least-squares regression, using twenty synthetic mixtures and the spectral region between 190 and 350 nm. In the validation stage, which involved the analysis of five synthetic mixtures, prediction errors lower that 3% were observed. The predictive capacity of the multivariate models is seriously affected by spectral changes induced by pH variations, a fact that acquires a great significance in the analysis of real samples (pharmaceuticals that contain chemical additives.

  2. Simple and Multivariate Relationships Between Spiritual Intelligence with General Health and Happiness.

    Science.gov (United States)

    Amirian, Mohammad-Elyas; Fazilat-Pour, Masoud

    2016-08-01

    The present study examined simple and multivariate relationships of spiritual intelligence with general health and happiness. The employed method was descriptive and correlational. King's Spiritual Quotient scales, GHQ-28 and Oxford Happiness Inventory, are filled out by a sample consisted of 384 students, which were selected using stratified random sampling from the students of Shahid Bahonar University of Kerman. Data are subjected to descriptive and inferential statistics including correlations and multivariate regressions. Bivariate correlations support positive and significant predictive value of spiritual intelligence toward general health and happiness. Further analysis showed that among the Spiritual Intelligence' subscales, Existential Critical Thinking Predicted General Health and Happiness, reversely. In addition, happiness was positively predicted by generation of personal meaning and transcendental awareness. The findings are discussed in line with the previous studies and the relevant theoretical background.

  3. Semiparametric Allelic Tests for Mapping Multiple Phenotypes: Binomial Regression and Mahalanobis Distance.

    Science.gov (United States)

    Majumdar, Arunabha; Witte, John S; Ghosh, Saurabh

    2015-12-01

    Binary phenotypes commonly arise due to multiple underlying quantitative precursors and genetic variants may impact multiple traits in a pleiotropic manner. Hence, simultaneously analyzing such correlated traits may be more powerful than analyzing individual traits. Various genotype-level methods, e.g., MultiPhen (O'Reilly et al. []), have been developed to identify genetic factors underlying a multivariate phenotype. For univariate phenotypes, the usefulness and applicability of allele-level tests have been investigated. The test of allele frequency difference among cases and controls is commonly used for mapping case-control association. However, allelic methods for multivariate association mapping have not been studied much. In this article, we explore two allelic tests of multivariate association: one using a Binomial regression model based on inverted regression of genotype on phenotype (Binomial regression-based Association of Multivariate Phenotypes [BAMP]), and the other employing the Mahalanobis distance between two sample means of the multivariate phenotype vector for two alleles at a single-nucleotide polymorphism (Distance-based Association of Multivariate Phenotypes [DAMP]). These methods can incorporate both discrete and continuous phenotypes. Some theoretical properties for BAMP are studied. Using simulations, the power of the methods for detecting multivariate association is compared with the genotype-level test MultiPhen's. The allelic tests yield marginally higher power than MultiPhen for multivariate phenotypes. For one/two binary traits under recessive mode of inheritance, allelic tests are found to be substantially more powerful. All three tests are applied to two different real data and the results offer some support for the simulation study. We propose a hybrid approach for testing multivariate association that implements MultiPhen when Hardy-Weinberg Equilibrium (HWE) is violated and BAMP otherwise, because the allelic approaches assume HWE

  4. On the Optimality of Multivariate S-Estimators

    NARCIS (Netherlands)

    Croux, C.; Dehon, C.; Yadine, A.

    2010-01-01

    In this paper we maximize the efficiency of a multivariate S-estimator under a constraint on the breakdown point. In the linear regression model, it is known that the highest possible efficiency of a maximum breakdown S-estimator is bounded above by 33% for Gaussian errors. We prove the surprising

  5. Precision Index in the Multivariate Context

    Czech Academy of Sciences Publication Activity Database

    Šiman, Miroslav

    2014-01-01

    Roč. 43, č. 2 (2014), s. 377-387 ISSN 0361-0926 R&D Projects: GA MŠk(CZ) 1M06047 Institutional support: RVO:67985556 Keywords : data depth * multivariate quantile * process capability index * precision index * regression quantile Subject RIV: BA - General Mathematics Impact factor: 0.274, year: 2014 http://library.utia.cas.cz/separaty/2014/SI/siman-0425059.pdf

  6. Multivariate data analysis approach to understand magnetic properties of perovskite manganese oxides

    International Nuclear Information System (INIS)

    Imamura, N.; Mizoguchi, T.; Yamauchi, H.; Karppinen, M.

    2008-01-01

    Here we apply statistical multivariate data analysis techniques to obtain some insights into the complex structure-property relations in antiferromagnetic (AFM) and ferromagnetic (FM) manganese perovskite systems, AMnO 3 . The 131 samples included in the present analyses are described by 21 crystal-structure or crystal-chemical (CS/CC) parameters. Principal component analysis (PCA), carried out separately for the AFM and FM compounds, is used to model and evaluate the various relationships among the magnetic properties and the various CS/CC parameters. Moreover, for the AFM compounds, PLS (partial least squares projections to latent structures) analysis is performed so as to predict the magnitude of the Neel temperature on the bases of the CS/CC parameters. Finally, so-called PLS-DA (PLS discriminant analysis) method is employed to find out the most influential/characteristic CS/CC parameters that differentiate the two classes of compounds from each other. - Graphical abstract: Statistical multivariate data analysis techniques are applied to detect structure-property relations in antiferromagnetic (AFM) and ferromagnetic (FM) manganese perovskites. For AFM compounds, partial least squares projections to latent structures analysis predict the magnitude of the Neel temperature on the bases of structural parameters only. Moreover, AFM and FM compounds are well separated by means of so-called partial least squares discriminant analysis method

  7. Poisson Regression Analysis of Illness and Injury Surveillance Data

    Energy Technology Data Exchange (ETDEWEB)

    Frome E.L., Watkins J.P., Ellis E.D.

    2012-12-12

    The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra

  8. Identification of multivariate models for noise analysis of nuclear plant

    International Nuclear Information System (INIS)

    Zwingelstein, G.C.; Upadhyaya, B.R.

    1979-01-01

    During the normal operation of a pressurized water reactor, neutron noise analysis with multivariate autoregressive procedures in a valuable diagnostic tool to extract dynamic characteristics for incipient failure detection. The first part of the paper will describe in details the equations for estimating the multivariate autoregressive model matrices and the structure of various matrices. The matrices are estimated by solving a set of matrix operations, called Yule-Walker equations. The selection of optimal model order will also be discussed. Once the optimal parameter set is obtained, simple and fast calculations are used to determine the auto power spectral density, cross spectra, coherence function, phase. In addition the spectra may be decomposed into components being contributed from different noise sources. An application using neutron flux data collected on a nuclear plant will illustrate the efficiency of the method

  9. Multivariate analysis of factors predicting prostate dose in intensity-modulated radiotherapy

    Energy Technology Data Exchange (ETDEWEB)

    Tomita, Tsuneyuki [Division of Radiology, Osaka Red Cross Hospital, Osaka (Japan); Nakamura, Mitsuhiro, E-mail: m_nkmr@kuhp.kyoto-u.ac.jp [Department of Radiation Oncology and Image-applied Therapy, Graduate School of Medicine, Kyoto University, Kyoto (Japan); Hirose, Yoshinori; Kitsuda, Kenji; Notogawa, Takuya; Miki, Katsuhito [Division of Radiology, Osaka Red Cross Hospital, Osaka (Japan); Nakamura, Kiyonao; Ishigaki, Takashi [Department of Radiation Oncology, Osaka Red Cross Hospital, Osaka (Japan)

    2014-01-01

    We conducted a multivariate analysis to determine relationships between prostate radiation dose and the state of surrounding organs, including organ volumes and the internal angle of the levator ani muscle (LAM), based on cone-beam computed tomography (CBCT) images after bone matching. We analyzed 270 CBCT data sets from 30 consecutive patients receiving intensity-modulated radiation therapy for prostate cancer. With patients in the supine position on a couch with the HipFix system, data for center of mass (COM) displacement of the prostate and the state of individual organs were acquired and compared between planning CT and CBCT scans. Dose distributions were then recalculated based on CBCT images. The relative effects of factors on the variance in COM, dose covering 95% of the prostate volume (D{sub 95%}), and percentage of prostate volume covered by the 100% isodose line (V{sub 100%}) were evaluated by a backward stepwise multiple regression analysis. COM displacement in the anterior-posterior direction (COM{sub AP}) correlated significantly with the rectum volume (δVr) and the internal LAM angle (δθ; R = 0.63). Weak correlations were seen for COM in the left-right (R = 0.18) and superior-inferior directions (R = 0.31). Strong correlations between COM{sub AP} and prostate D{sub 95%} and V{sub 100%} were observed (R ≥ 0.69). Additionally, the change ratios in δVr and δθ remained as predictors of prostate D{sub 95%} and V{sub 100%}. This study shows statistically that maintaining the same rectum volume and LAM state for both the planning CT simulation and treatment is important to ensure the correct prostate dose in the supine position with bone matching.

  10. Function approximation with polynomial regression slines

    International Nuclear Information System (INIS)

    Urbanski, P.

    1996-01-01

    Principles of the polynomial regression splines as well as algorithms and programs for their computation are presented. The programs prepared using software package MATLAB are generally intended for approximation of the X-ray spectra and can be applied in the multivariate calibration of radiometric gauges. (author)

  11. A kernel version of multivariate alteration detection

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg; Vestergaard, Jacob Schack

    2013-01-01

    Based on the established methods kernel canonical correlation analysis and multivariate alteration detection we introduce a kernel version of multivariate alteration detection. A case study with SPOT HRV data shows that the kMAD variates focus on extreme change observations.......Based on the established methods kernel canonical correlation analysis and multivariate alteration detection we introduce a kernel version of multivariate alteration detection. A case study with SPOT HRV data shows that the kMAD variates focus on extreme change observations....

  12. A Quality Assessment Tool for Non-Specialist Users of Regression Analysis

    Science.gov (United States)

    Argyrous, George

    2015-01-01

    This paper illustrates the use of a quality assessment tool for regression analysis. It is designed for non-specialist "consumers" of evidence, such as policy makers. The tool provides a series of questions such consumers of evidence can ask to interrogate regression analysis, and is illustrated with reference to a recent study published…

  13. Multivariate return periods in hydrology: a critical and practical review focusing on synthetic design hydrograph estimation

    Directory of Open Access Journals (Sweden)

    B. Gräler

    2013-04-01

    Full Text Available Most of the hydrological and hydraulic studies refer to the notion of a return period to quantify design variables. When dealing with multiple design variables, the well-known univariate statistical analysis is no longer satisfactory, and several issues challenge the practitioner. How should one incorporate the dependence between variables? How should a multivariate return period be defined and applied in order to yield a proper design event? In this study an overview of the state of the art for estimating multivariate design events is given and the different approaches are compared. The construction of multivariate distribution functions is done through the use of copulas, given their practicality in multivariate frequency analyses and their ability to model numerous types of dependence structures in a flexible way. A synthetic case study is used to generate a large data set of simulated discharges that is used for illustrating the effect of different modelling choices on the design events. Based on different uni- and multivariate approaches, the design hydrograph characteristics of a 3-D phenomenon composed of annual maximum peak discharge, its volume, and duration are derived. These approaches are based on regression analysis, bivariate conditional distributions, bivariate joint distributions and Kendall distribution functions, highlighting theoretical and practical issues of multivariate frequency analysis. Also an ensemble-based approach is presented. For a given design return period, the approach chosen clearly affects the calculated design event, and much attention should be given to the choice of the approach used as this depends on the real-world problem at hand.

  14. Integrated GIS and multivariate statistical analysis for regional scale assessment of heavy metal soil contamination: A critical review

    International Nuclear Information System (INIS)

    Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan

    2017-01-01

    Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0–0.10 m, or 0–0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km 2 , with a median of 0.4 samples per km 2 . The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component

  15. MULTIVARIATE ANALYSIS OF RISK FACTORS FOR PREMATURITY IN SOUTHERN BRAZIL

    Directory of Open Access Journals (Sweden)

    Willian Augusto de Melo

    2014-05-01

    Full Text Available This study assessed the risk factors associated with preterm birth through a cross-sectional study in 4,440 newborns. Examined the factors associated between maternal sociodemographic variables (age, marital status, education and occupation, obstetric (pregnancy and delivery type and number of prenatal visits and neonatal (sex, race/color, birth weight and Apgar. Data were analyzed by multivariate logistic regression technique. Among the 480 (10.8% preterm risk factors were prevalent type of pregnancy (OR=6.48, number of prenatal visits (OR=2.09, Apgar score at first (OR=2.00 and fifth minute (OR=2.14 and birth weight (OR=31.8 indicating that these variables are directly associated with the occurrence of prematurity. The identification of risk factors should be the object of attention of health professionals and services to support effective measures to promote health to the general population; especially for women in fertile included some criteria of gestational risk.

  16. Bayesian nonlinear regression for large small problems

    KAUST Repository

    Chakraborty, Sounak; Ghosh, Malay; Mallick, Bani K.

    2012-01-01

    Statistical modeling and inference problems with sample sizes substantially smaller than the number of available covariates are challenging. This is known as large p small n problem. Furthermore, the problem is more complicated when we have multiple correlated responses. We develop multivariate nonlinear regression models in this setup for accurate prediction. In this paper, we introduce a full Bayesian support vector regression model with Vapnik's ε-insensitive loss function, based on reproducing kernel Hilbert spaces (RKHS) under the multivariate correlated response setup. This provides a full probabilistic description of support vector machine (SVM) rather than an algorithm for fitting purposes. We have also introduced a multivariate version of the relevance vector machine (RVM). Instead of the original treatment of the RVM relying on the use of type II maximum likelihood estimates of the hyper-parameters, we put a prior on the hyper-parameters and use Markov chain Monte Carlo technique for computation. We have also proposed an empirical Bayes method for our RVM and SVM. Our methods are illustrated with a prediction problem in the near-infrared (NIR) spectroscopy. A simulation study is also undertaken to check the prediction accuracy of our models. © 2012 Elsevier Inc.

  17. Bayesian nonlinear regression for large small problems

    KAUST Repository

    Chakraborty, Sounak

    2012-07-01

    Statistical modeling and inference problems with sample sizes substantially smaller than the number of available covariates are challenging. This is known as large p small n problem. Furthermore, the problem is more complicated when we have multiple correlated responses. We develop multivariate nonlinear regression models in this setup for accurate prediction. In this paper, we introduce a full Bayesian support vector regression model with Vapnik\\'s ε-insensitive loss function, based on reproducing kernel Hilbert spaces (RKHS) under the multivariate correlated response setup. This provides a full probabilistic description of support vector machine (SVM) rather than an algorithm for fitting purposes. We have also introduced a multivariate version of the relevance vector machine (RVM). Instead of the original treatment of the RVM relying on the use of type II maximum likelihood estimates of the hyper-parameters, we put a prior on the hyper-parameters and use Markov chain Monte Carlo technique for computation. We have also proposed an empirical Bayes method for our RVM and SVM. Our methods are illustrated with a prediction problem in the near-infrared (NIR) spectroscopy. A simulation study is also undertaken to check the prediction accuracy of our models. © 2012 Elsevier Inc.

  18. Background stratified Poisson regression analysis of cohort data

    International Nuclear Information System (INIS)

    Richardson, David B.; Langholz, Bryan

    2012-01-01

    Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models. (orig.)

  19. Understanding logistic regression analysis.

    Science.gov (United States)

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.

  20. Factors that impact the outcome of endoscopic correction of vesicoureteral reflux: a multivariate analysis.

    Science.gov (United States)

    Kajbafzadeh, Abdol-Mohammad; Tourchi, Ali; Aryan, Zahra

    2013-02-01

    To identify independent factors that may predict vesicoureteral reflux (VUR) resolution after endoscopic treatment using dextranomer/hyaluronic acid copolymer (Deflux) in children free of anatomical anomalies. A retrospective study was conducted in our pediatric referral center from 1998 to 2011 on children with primary VUR who underwent endoscopic injection of Deflux with or without concomitant autologous blood injection (called HABIT or HIT, respectively). Children with secondary VUR or incomplete records were excluded from the study. Potential factors were divided into three categories including preoperative, intraoperative and postoperative. Success was defined as no sign of VUR on postoperative voiding cystourethrogram. Univariate and multivariate logistic regression models were constructed to identify independent factors that may predict success. Odds ratio (OR) and 95 % confidence interval (95 % CI) for prediction of success were estimated for each factor. From 485 children received Deflux injection, a total of 372 with a mean age of 3.10 years (ranged from 6 months to 12 years) were included in the study and endoscopic management was successful in 322 (86.6 %) of them. Of the patients, 185 (49.7 %) underwent HIT and 187 (50.3 %) underwent HABIT technique. On univariate analysis, VUR grade from preoperative category (OR = 4.79, 95 % CI = 2.22-10.30, p = 0.000), operation technique (OR = 0.33, 95 % CI = 0.17-0.64, p = 0.001) and presence of mound on postoperative sonography (OR = 0.06, 95 % CI = 0.02-0.16, p = 0.000) were associated with success. On multivariate analysis, preoperative VUR grade (OR = 4.85, 95 % CI = 2.49-8.96, p = 0.000) and identification of mound on postoperative sonography (OR = 0.07, 95 % CI = 0.01-0.18, p = 0.000) remained as independent success predictors. Based on this study, successful VUR correction after the endoscopic injection of Deflux can be predicted with respect to preoperative VUR grade and presence of mound after operation.

  1. Multivariate analysis on unilateral cleft lip and palate treatment outcome by EUROCRAN index: A retrospective study.

    Science.gov (United States)

    Yew, Ching Ching; Alam, Mohammad Khursheed; Rahman, Shaifulizan Abdul

    2016-10-01

    This study is to evaluate the dental arch relationship and palatal morphology of unilateral cleft lip and palate patients by using EUROCRAN index, and to assess the factors that affect them using multivariate statistical analysis. A total of one hundred and seven patients from age five to twelve years old with non-syndromic unilateral cleft lip and palate were included in the study. These patients have received cheiloplasty and one stage palatoplasty surgery but yet to receive alveolar bone grafting procedure. Five assessors trained in the use of the EUROCRAN index underwent calibration exercise and ranked the dental arch relationships and palatal morphology of the patients' study models. For intra-rater agreement, the examiners scored the models twice, with two weeks interval in between sessions. Variable factors of the patients were collected and they included gender, site, type and, family history of unilateral cleft lip and palate; absence of lateral incisor on cleft side, cheiloplasty and palatoplasty technique used. Associations between various factors and dental arch relationships were assessed using logistic regression analysis. Dental arch relationship among unilateral cleft lip and palate in local population had relatively worse scoring than other parts of the world. Crude logistics regression analysis did not demonstrate any significant associations among the various socio-demographic factors, cheiloplasty and palatoplasty techniques used with the dental arch relationship outcome. This study has limitations that might have affected the results, example: having multiple operators performing the surgeries and the inability to access the influence of underlying genetic predisposed cranio-facial variability. These may have substantial influence on the treatment outcome. The factors that can affect unilateral cleft lip and palate treatment outcome is multifactorial in nature and remained controversial in general. Copyright © 2016 Elsevier Ireland Ltd. All

  2. Multivariate statistical methods a first course

    CERN Document Server

    Marcoulides, George A

    2014-01-01

    Multivariate statistics refer to an assortment of statistical methods that have been developed to handle situations in which multiple variables or measures are involved. Any analysis of more than two variables or measures can loosely be considered a multivariate statistical analysis. An introductory text for students learning multivariate statistical methods for the first time, this book keeps mathematical details to a minimum while conveying the basic principles. One of the principal strategies used throughout the book--in addition to the presentation of actual data analyses--is poin

  3. Economic viability in concrete dams by multivariable regression tool for implantation of small hydroelectric plants

    International Nuclear Information System (INIS)

    Lima, Reginaldo Agapito de; Ribeiro Junior, Leopoldo Uberto

    2010-01-01

    For implantation of a SHP, the barrage is the main structure where its sizing represents from 30% - 50% of general cost of civil works. Considering this it is very important to have a fast, didactic and accurate tool for elaborating a budget, also allowing a quantitative analysis of inherent cost for civil building of barrages concrete made for small hydropower plants. In face of this, the multi changing regression tool is very important as it allows a fast and correct establishing of preliminary costs, even approximate, for estimates of barrages in concrete cost, enabling to ease the budget, guiding feasibility decisions for selecting or neglecting new alternatives of fall. (author)

  4. An Introduction to Applied Multivariate Analysis

    CERN Document Server

    Raykov, Tenko

    2008-01-01

    Focuses on the core multivariate statistics topics which are of fundamental relevance for its understanding. This book emphasis on the topics that are critical to those in the behavioral, social, and educational sciences.

  5. Multivariate time series analysis with R and financial applications

    CERN Document Server

    Tsay, Ruey S

    2013-01-01

    Since the publication of his first book, Analysis of Financial Time Series, Ruey Tsay has become one of the most influential and prominent experts on the topic of time series. Different from the traditional and oftentimes complex approach to multivariate (MV) time series, this sequel book emphasizes structural specification, which results in simplified parsimonious VARMA modeling and, hence, eases comprehension. Through a fundamental balance between theory and applications, the book supplies readers with an accessible approach to financial econometric models and their applications to real-worl

  6. Testing Mean Differences among Groups: Multivariate and Repeated Measures Analysis with Minimal Assumptions.

    Science.gov (United States)

    Bathke, Arne C; Friedrich, Sarah; Pauly, Markus; Konietschke, Frank; Staffen, Wolfgang; Strobl, Nicolas; Höller, Yvonne

    2018-03-22

    To date, there is a lack of satisfactory inferential techniques for the analysis of multivariate data in factorial designs, when only minimal assumptions on the data can be made. Presently available methods are limited to very particular study designs or assume either multivariate normality or equal covariance matrices across groups, or they do not allow for an assessment of the interaction effects across within-subjects and between-subjects variables. We propose and methodologically validate a parametric bootstrap approach that does not suffer from any of the above limitations, and thus provides a rather general and comprehensive methodological route to inference for multivariate and repeated measures data. As an example application, we consider data from two different Alzheimer's disease (AD) examination modalities that may be used for precise and early diagnosis, namely, single-photon emission computed tomography (SPECT) and electroencephalogram (EEG). These data violate the assumptions of classical multivariate methods, and indeed classical methods would not have yielded the same conclusions with regards to some of the factors involved.

  7. Directional quantile regression in R

    Czech Academy of Sciences Publication Activity Database

    Boček, Pavel; Šiman, Miroslav

    2017-01-01

    Roč. 53, č. 3 (2017), s. 480-492 ISSN 0023-5954 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords : multivariate quantile * regression quantile * halfspace depth * depth contour Subject RIV: BD - Theory of Information OBOR OECD: Applied mathematics Impact factor: 0.379, year: 2016 http://library.utia.cas.cz/separaty/2017/SI/bocek-0476587.pdf

  8. Management of Industrial Performance Indicators: Regression Analysis and Simulation

    Directory of Open Access Journals (Sweden)

    Walter Roberto Hernandez Vergara

    2017-11-01

    Full Text Available Stochastic methods can be used in problem solving and explanation of natural phenomena through the application of statistical procedures. The article aims to associate the regression analysis and systems simulation, in order to facilitate the practical understanding of data analysis. The algorithms were developed in Microsoft Office Excel software, using statistical techniques such as regression theory, ANOVA and Cholesky Factorization, which made it possible to create models of single and multiple systems with up to five independent variables. For the analysis of these models, the Monte Carlo simulation and analysis of industrial performance indicators were used, resulting in numerical indices that aim to improve the goals’ management for compliance indicators, by identifying systems’ instability, correlation and anomalies. The analytical models presented in the survey indicated satisfactory results with numerous possibilities for industrial and academic applications, as well as the potential for deployment in new analytical techniques.

  9. Least-Squares Linear Regression and Schrodinger's Cat: Perspectives on the Analysis of Regression Residuals.

    Science.gov (United States)

    Hecht, Jeffrey B.

    The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…

  10. Estimation of failure criteria in multivariate sensory shelf life testing using survival analysis.

    Science.gov (United States)

    Giménez, Ana; Gagliardi, Andrés; Ares, Gastón

    2017-09-01

    For most food products, shelf life is determined by changes in their sensory characteristics. A predetermined increase or decrease in the intensity of a sensory characteristic has frequently been used to signal that a product has reached the end of its shelf life. Considering all attributes change simultaneously, the concept of multivariate shelf life allows a single measurement of deterioration that takes into account all these sensory changes at a certain storage time. The aim of the present work was to apply survival analysis to estimate failure criteria in multivariate sensory shelf life testing using two case studies, hamburger buns and orange juice, by modelling the relationship between consumers' rejection of the product and the deterioration index estimated using PCA. In both studies, a panel of 13 trained assessors evaluated the samples using descriptive analysis whereas a panel of 100 consumers answered a "yes" or "no" question regarding intention to buy or consume the product. PC1 explained the great majority of the variance, indicating all sensory characteristics evolved similarly with storage time. Thus, PC1 could be regarded as index of sensory deterioration and a single failure criterion could be estimated through survival analysis for 25 and 50% consumers' rejection. The proposed approach based on multivariate shelf life testing may increase the accuracy of shelf life estimations. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Resent state and multivariate analysis of a few juniper forests of baluchistan, pakistan

    International Nuclear Information System (INIS)

    Ahmed, M.; Siddiqui, M.F.

    2015-01-01

    Quantitative multivariate investigations were carried out to explore various forms of Juniper trees resulting human disturbances and natural phenomenon. Thirty stands were sampled by point centered quarter method and data were analysed using Wards cluster analysis and Bray-Curtis ordination. On the basis of multivariate analysis eight various forms i.e. healthy, unhealthy, over mature, disturbed, dieback, standing dead, logs and cut stem were recognized. Structural attributes were computed. Highest numbers (130-133 stem ha-1) of logs were recorded from Cautair and Khunk forests. Highest density ha-1 (229 ha-1) of healthy plants was estimated from Tangi Top area while lowest number (24 ha-1) of healthy plants was found from Saraghara area. Multivariate analysis showed five groups in cluster and ordination diagrams. These groups are characterized on the basis of healthy, over mature, disturbed and logged trees of Juniper. Higher number (115, 96, 84, 80 ha-1) of disturbed trees were distributed at Speena Sukher, Srag Kazi, Prang Shella and Tangi Top respectively. Overall density does not show any significant relation with basal area m2 ha-1, degree of slopes and the elevation of the sampling stands. Present study show that each and every Juniper stands are highly disturbed mostly due to human influence, therefore prompt conservational steps should be taken to safe these forests. (author)

  12. Safety and effectiveness of olanzapine in monotherapy: a multivariate analysis of a naturalistic study.

    Science.gov (United States)

    Ciudad, Antonio; Gutiérrez, Miguel; Cañas, Fernando; Gibert, Juan; Gascón, Josep; Carrasco, José-Luis; Bobes, Julio; Gómez, Juan-Carlos; Alvarez, Enrique

    2005-07-01

    This study investigated safety and effectiveness of olanzapine in monotherapy compared with conventional antipsychotics in treatment of acute inpatients with schizophrenia. This was a prospective, comparative, nonrandomized, open-label, multisite, observational study of Spanish inpatients with an acute episode of schizophrenia. Data included safety assessments with an extrapyramidal symptoms (EPS) questionnaire and the report of spontaneous adverse events, plus clinical assessments with the Brief Psychiatric Rating Scale (BPRS) and the Clinical Global Impressions-Severity of Illness (CGI-S). A multivariate methodology was used to more adequately determine which factors can influence safety and effectiveness of olanzapine in monotherapy. 339 patients treated with olanzapine in monotherapy (OGm) and 385 patients treated with conventional antipsychotics (CG) were included in the analysis. Treatment-emergent EPS were significantly higher in the CG (pOGm (p=0.005). Logistic regression analyses revealed that the only variable significantly correlated with treatment-emergent EPS and clinical response was treatment strategy, with patients in OGm having 1.5 times the probability of obtaining a clinical response and patients in CG having 5 times the risk of developing EPS. In this naturalistic study olanzapine in monotherapy was better-tolerated and at least as effective as conventional antipsychotics.

  13. OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

    Science.gov (United States)

    Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

    2012-01-01

    The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.

  14. Field applications of stand-off sensing using visible/NIR multivariate optical computing

    Science.gov (United States)

    Eastwood, DeLyle; Soyemi, Olusola O.; Karunamuni, Jeevanandra; Zhang, Lixia; Li, Hongli; Myrick, Michael L.

    2001-02-01

    12 A novel multivariate visible/NIR optical computing approach applicable to standoff sensing will be demonstrated with porphyrin mixtures as examples. The ultimate goal is to develop environmental or counter-terrorism sensors for chemicals such as organophosphorus (OP) pesticides or chemical warfare simulants in the near infrared spectral region. The mathematical operation that characterizes prediction of properties via regression from optical spectra is a calculation of inner products between the spectrum and the pre-determined regression vector. The result is scaled appropriately and offset to correspond to the basis from which the regression vector is derived. The process involves collecting spectroscopic data and synthesizing a multivariate vector using a pattern recognition method. Then, an interference coating is designed that reproduces the pattern of the multivariate vector in its transmission or reflection spectrum, and appropriate interference filters are fabricated. High and low refractive index materials such as Nb2O5 and SiO2 are excellent choices for the visible and near infrared regions. The proof of concept has now been established for this system in the visible and will later be extended to chemicals such as OP compounds in the near and mid-infrared.

  15. Enforcing Co-expression Within a Brain-Imaging Genomics Regression Framework.

    Science.gov (United States)

    Zille, Pascal; Calhoun, Vince D; Wang, Yu-Ping

    2017-06-28

    Among the challenges arising in brain imaging genetic studies, estimating the potential links between neurological and genetic variability within a population is key. In this work, we propose a multivariate, multimodal formulation for variable selection that leverages co-expression patterns across various data modalities. Our approach is based on an intuitive combination of two widely used statistical models: sparse regression and canonical correlation analysis (CCA). While the former seeks multivariate linear relationships between a given phenotype and associated observations, the latter searches to extract co-expression patterns between sets of variables belonging to different modalities. In the following, we propose to rely on a 'CCA-type' formulation in order to regularize the classical multimodal sparse regression problem (essentially incorporating both CCA and regression models within a unified formulation). The underlying motivation is to extract discriminative variables that are also co-expressed across modalities. We first show that the simplest formulation of such model can be expressed as a special case of collaborative learning methods. After discussing its limitation, we propose an extended, more flexible formulation, and introduce a simple and efficient alternating minimization algorithm to solve the associated optimization problem.We explore the parameter space and provide some guidelines regarding parameter selection. Both the original and extended versions are then compared on a simple toy dataset and a more advanced simulated imaging genomics dataset in order to illustrate the benefits of the latter. Finally, we validate the proposed formulation using single nucleotide polymorphisms (SNP) data and functional magnetic resonance imaging (fMRI) data from a population of adolescents (n = 362 subjects, age 16.9 ± 1.9 years from the Philadelphia Neurodevelopmental Cohort) for the study of learning ability. Furthermore, we carry out a significance

  16. Predicting Dropouts of University Freshmen: A Logit Regression Analysis.

    Science.gov (United States)

    Lam, Y. L. Jack

    1984-01-01

    Stepwise discriminant analysis coupled with logit regression analysis of freshmen data from Brandon University (Manitoba) indicated that six tested variables drawn from research on university dropouts were useful in predicting attrition: student status, residence, financial sources, distance from home town, goal fulfillment, and satisfaction with…

  17. Simulation Experiments in Practice : Statistical Design and Regression Analysis

    NARCIS (Netherlands)

    Kleijnen, J.P.C.

    2007-01-01

    In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. Statistical theory proves that more information is obtained when applying Design Of Experiments (DOE) and linear regression analysis. Unfortunately, classic

  18. Multivariate covariance generalized linear models

    DEFF Research Database (Denmark)

    Bonat, W. H.; Jørgensen, Bent

    2016-01-01

    are fitted by using an efficient Newton scoring algorithm based on quasi-likelihood and Pearson estimating functions, using only second-moment assumptions. This provides a unified approach to a wide variety of types of response variables and covariance structures, including multivariate extensions......We propose a general framework for non-normal multivariate data analysis called multivariate covariance generalized linear models, designed to handle multivariate response variables, along with a wide range of temporal and spatial correlation structures defined in terms of a covariance link...... function combined with a matrix linear predictor involving known matrices. The method is motivated by three data examples that are not easily handled by existing methods. The first example concerns multivariate count data, the second involves response variables of mixed types, combined with repeated...

  19. Multivariate data analysis of enzyme production for hydrolysis purposes

    DEFF Research Database (Denmark)

    Schmidt, A.S.; Suhr, K.I.

    1999-01-01

    of the structure in the data - possibly combined with analysis of variance (ANOVA). Partial least squares regression (PLSR) showed a clear connection between the two differentdata matrices (the fermentation variables and the hydrolysis variables). Hence, PLSR was suitable for prediction purposes. The hydrolysis...

  20. C-reactive protein gene polymorphisms and myocardial infarction risk: a meta-analysis and meta-regression.

    Science.gov (United States)

    Zhu, Yanbin; Liu, Tongku; He, Haitao; Sun, Yuqing; Zhuo, Fengling

    2013-12-01

    C-reactive protein (CRP), the classic acute-phase protein, plays an important role in the etiology of myocardial infarction (MI). Emerging evidence has shown that the common polymorphisms in the CRP gene may influence an individual's susceptibility to MI; but individually published studies showed inconclusive results. This meta-analysis aimed to derive a more precise estimation of the associations between CRP gene polymorphisms and MI risk. A literature search of PubMed, Embase, Web of Science, and China BioMedicine (CBM) databases was conducted on articles published before June 1st, 2013. Crude odds ratio (OR) with 95% confidence interval (CI) were calculated. Nine case-control studies were included with a total of 2992 MI patients and 4711 healthy controls. The meta-analysis results indicated that CRP rs3093059 (T>C) polymorphism was associated with decreased risk of MI, especially among Asian populations. However, similar associations were not observed in CRP rs1800947 (G>C) and rs2794521 (G>A) polymorphisms (all p>0.05) among both Asian and Caucasian populations. Univariate and multivariate meta-regression analyses showed that ethnicity may be a major source of heterogeneity. No publication bias was detected in this meta-analysis. In conclusion, the current meta-analysis indicates that CRP rs3093059 (T>C) polymorphism may be associated with decreased risk of MI, especially among Asian populations.

  1. Quality of life in breast cancer patients--a quantile regression analysis.

    Science.gov (United States)

    Pourhoseingholi, Mohamad Amin; Safaee, Azadeh; Moghimi-Dehkordi, Bijan; Zeighami, Bahram; Faghihzadeh, Soghrat; Tabatabaee, Hamid Reza; Pourhoseingholi, Asma

    2008-01-01

    Quality of life study has an important role in health care especially in chronic diseases, in clinical judgment and in medical resources supplying. Statistical tools like linear regression are widely used to assess the predictors of quality of life. But when the response is not normal the results are misleading. The aim of this study is to determine the predictors of quality of life in breast cancer patients, using quantile regression model and compare to linear regression. A cross-sectional study conducted on 119 breast cancer patients that admitted and treated in chemotherapy ward of Namazi hospital in Shiraz. We used QLQ-C30 questionnaire to assessment quality of life in these patients. A quantile regression was employed to assess the assocciated factors and the results were compared to linear regression. All analysis carried out using SAS. The mean score for the global health status for breast cancer patients was 64.92+/-11.42. Linear regression showed that only grade of tumor, occupational status, menopausal status, financial difficulties and dyspnea were statistically significant. In spite of linear regression, financial difficulties were not significant in quantile regression analysis and dyspnea was only significant for first quartile. Also emotion functioning and duration of disease statistically predicted the QOL score in the third quartile. The results have demonstrated that using quantile regression leads to better interpretation and richer inference about predictors of the breast cancer patient quality of life.

  2. Multivariate analysis of eigenvalues and eigenvectors in tensor based morphometry

    Science.gov (United States)

    Rajagopalan, Vidya; Schwartzman, Armin; Hua, Xue; Leow, Alex; Thompson, Paul; Lepore, Natasha

    2015-01-01

    We develop a new algorithm to compute voxel-wise shape differences in tensor-based morphometry (TBM). As in standard TBM, we non-linearly register brain T1-weighed MRI data from a patient and control group to a template, and compute the Jacobian of the deformation fields. In standard TBM, the determinants of the Jacobian matrix at each voxel are statistically compared between the two groups. More recently, a multivariate extension of the statistical analysis involving the deformation tensors derived from the Jacobian matrices has been shown to improve statistical detection power.7 However, multivariate methods comprising large numbers of variables are computationally intensive and may be subject to noise. In addition, the anatomical interpretation of results is sometimes difficult. Here instead, we analyze the eigenvalues and the eigenvectors of the Jacobian matrices. Our method is validated on brain MRI data from Alzheimer's patients and healthy elderly controls from the Alzheimer's Disease Neuro Imaging Database.

  3. Multivariate Survival Mixed Models for Genetic Analysis of Longevity Traits

    DEFF Research Database (Denmark)

    Pimentel Maia, Rafael; Madsen, Per; Labouriau, Rodrigo

    2014-01-01

    A class of multivariate mixed survival models for continuous and discrete time with a complex covariance structure is introduced in a context of quantitative genetic applications. The methods introduced can be used in many applications in quantitative genetics although the discussion presented co...... applications. The methods presented are implemented in such a way that large and complex quantitative genetic data can be analyzed......A class of multivariate mixed survival models for continuous and discrete time with a complex covariance structure is introduced in a context of quantitative genetic applications. The methods introduced can be used in many applications in quantitative genetics although the discussion presented...... concentrates on longevity studies. The framework presented allows to combine models based on continuous time with models based on discrete time in a joint analysis. The continuous time models are approximations of the frailty model in which the hazard function will be assumed to be piece-wise constant...

  4. Multivariate Survival Mixed Models for Genetic Analysis of Longevity Traits

    DEFF Research Database (Denmark)

    Pimentel Maia, Rafael; Madsen, Per; Labouriau, Rodrigo

    2013-01-01

    A class of multivariate mixed survival models for continuous and discrete time with a complex covariance structure is introduced in a context of quantitative genetic applications. The methods introduced can be used in many applications in quantitative genetics although the discussion presented co...... applications. The methods presented are implemented in such a way that large and complex quantitative genetic data can be analyzed......A class of multivariate mixed survival models for continuous and discrete time with a complex covariance structure is introduced in a context of quantitative genetic applications. The methods introduced can be used in many applications in quantitative genetics although the discussion presented...... concentrates on longevity studies. The framework presented allows to combine models based on continuous time with models based on discrete time in a joint analysis. The continuous time models are approximations of the frailty model in which the hazard function will be assumed to be piece-wise constant...

  5. A robust multivariate long run analysis of European electricity prices

    OpenAIRE

    Bruno Bosco; Lucia Parisio; Matteo Pelagatti; Fabio Baldi

    2007-01-01

    This paper analyses the interdependencies existing in wholesale electricity prices in six major European countries. The results of our robust multivariate long run dynamic analysis reveal the presence of four highly integrated central European markets (France, Germany, the Netherlands and Austria). The trend shared by these four electricity markets appears to be common also to gas prices, but not to oil prices. The existence of long term dynamics among electricity prices and between electrici...

  6. Oxidative stability of frozen mackerel batches ― A multivariate data analysis approach

    DEFF Research Database (Denmark)

    Helbo Ekgreen, M.; Frosch, Stina; Baron, Caroline Pascale

    2011-01-01

    deterioration and texture changes. The aim was to investigate the correlation between the raw material history and the quality loss observed during frozen storage using relevant multivariate data analysis such as Principal Component Analysis (PCA) and Partial Least Square Analysis (PLS). Preliminary results...... showed that it was possible to differentiate between the different batches depending on their history and that some batches were more oxidised than others. Furthermore, based on the results from the data analysis, critical control points in the entire production chain will be identified and strategies...

  7. On generalized elliptical quantiles in the nonlinear quantile regression setup

    Czech Academy of Sciences Publication Activity Database

    Hlubinka, D.; Šiman, Miroslav

    2015-01-01

    Roč. 24, č. 2 (2015), s. 249-264 ISSN 1133-0686 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords : multivariate quantile * elliptical quantile * quantile regression * multivariate statistical inference * portfolio optimization Subject RIV: BA - General Mathematics Impact factor: 1.207, year: 2015 http://library.utia.cas.cz/separaty/2014/SI/siman-0434510.pdf

  8. The role of chemometrics in single and sequential extraction assays: a review. Part II. Cluster analysis, multiple linear regression, mixture resolution, experimental design and other techniques.

    Science.gov (United States)

    Giacomino, Agnese; Abollino, Ornella; Malandrino, Mery; Mentasti, Edoardo

    2011-03-04

    Single and sequential extraction procedures are used for studying element mobility and availability in solid matrices, like soils, sediments, sludge, and airborne particulate matter. In the first part of this review we reported an overview on these procedures and described the applications of chemometric uni- and bivariate techniques and of multivariate pattern recognition techniques based on variable reduction to the experimental results obtained. The second part of the review deals with the use of chemometrics not only for the visualization and interpretation of data, but also for the investigation of the effects of experimental conditions on the response, the optimization of their values and the calculation of element fractionation. We will describe the principles of the multivariate chemometric techniques considered, the aims for which they were applied and the key findings obtained. The following topics will be critically addressed: pattern recognition by cluster analysis (CA), linear discriminant analysis (LDA) and other less common techniques; modelling by multiple linear regression (MLR); investigation of spatial distribution of variables by geostatistics; calculation of fractionation patterns by a mixture resolution method (Chemometric Identification of Substrates and Element Distributions, CISED); optimization and characterization of extraction procedures by experimental design; other multivariate techniques less commonly applied. Copyright © 2010 Elsevier B.V. All rights reserved.

  9. Authentication of Trappist beers by LC-MS fingerprints and multivariate data analysis.

    Science.gov (United States)

    Mattarucchi, Elia; Stocchero, Matteo; Moreno-Rojas, José Manuel; Giordano, Giuseppe; Reniero, Fabiano; Guillou, Claude

    2010-12-08

    The aim of this study was to asses the applicability of LC-MS profiling to authenticate a selected Trappist beer as part of a program on traceability funded by the European Commission. A total of 232 beers were fingerprinted and classified through multivariate data analysis. The selected beer was clearly distinguished from beers of different brands, while only 3 samples (3.5% of the test set) were wrongly classified when compared with other types of beer of the same Trappist brewery. The fingerprints were further analyzed to extract the most discriminating variables, which proved to be sufficient for classification, even using a simplified unsupervised model. This reduced fingerprint allowed us to study the influence of batch-to-batch variability on the classification model. Our results can easily be applied to different matrices and they confirmed the effectiveness of LC-MS profiling in combination with multivariate data analysis for the characterization of food products.

  10. Utilização de regressão multivariada para avaliação espectrofotométrica da demanda química de oxigênio em amostras de relevância ambiental Use of multivariate regression in spectrophotometric evaluation of chemical oxigen demand in samples of environmental relevance

    Directory of Open Access Journals (Sweden)

    Patricio Peralta-Zamora

    2005-10-01

    Full Text Available In this work, a partial least squares regression routine was used to develop a multivariate calibration model to predict the chemical oxygen demand (COD in substrates of environmental relevance (paper effluents and landfill leachates from UV-Vis spectral data. The calibration models permit the fast determination of the COD with typical relative errors lower by 10% with respect to the conventional methodology.

  11. Multivariable nonlinear analysis of foreign exchange rates

    Science.gov (United States)

    Suzuki, Tomoya; Ikeguchi, Tohru; Suzuki, Masuo

    2003-05-01

    We analyze the multivariable time series of foreign exchange rates. These are price movements that have often been analyzed, and dealing time intervals and spreads between bid and ask prices. Considering dealing time intervals as event timing such as neurons’ firings, we use raster plots (RPs) and peri-stimulus time histograms (PSTHs) which are popular methods in the field of neurophysiology. Introducing special processings to obtaining RPs and PSTHs time histograms for analyzing exchange rates time series, we discover that there exists dynamical interaction among three variables. We also find that adopting multivariables leads to improvements of prediction accuracy.

  12. Multivariate Formation Pressure Prediction with Seismic-derived Petrophysical Properties from Prestack AVO inversion and Poststack Seismic Motion Inversion

    Science.gov (United States)

    Yu, H.; Gu, H.

    2017-12-01

    A novel multivariate seismic formation pressure prediction methodology is presented, which incorporates high-resolution seismic velocity data from prestack AVO inversion, and petrophysical data (porosity and shale volume) derived from poststack seismic motion inversion. In contrast to traditional seismic formation prediction methods, the proposed methodology is based on a multivariate pressure prediction model and utilizes a trace-by-trace multivariate regression analysis on seismic-derived petrophysical properties to calibrate model parameters in order to make accurate predictions with higher resolution in both vertical and lateral directions. With prestack time migration velocity as initial velocity model, an AVO inversion was first applied to prestack dataset to obtain high-resolution seismic velocity with higher frequency that is to be used as the velocity input for seismic pressure prediction, and the density dataset to calculate accurate Overburden Pressure (OBP). Seismic Motion Inversion (SMI) is an inversion technique based on Markov Chain Monte Carlo simulation. Both structural variability and similarity of seismic waveform are used to incorporate well log data to characterize the variability of the property to be obtained. In this research, porosity and shale volume are first interpreted on well logs, and then combined with poststack seismic data using SMI to build porosity and shale volume datasets for seismic pressure prediction. A multivariate effective stress model is used to convert velocity, porosity and shale volume datasets to effective stress. After a thorough study of the regional stratigraphic and sedimentary characteristics, a regional normally compacted interval model is built, and then the coefficients in the multivariate prediction model are determined in a trace-by-trace multivariate regression analysis on the petrophysical data. The coefficients are used to convert velocity, porosity and shale volume datasets to effective stress and then

  13. A multivariate analysis of the energy intensity of sprawl versus compact living in the U.S. for 2003

    Energy Technology Data Exchange (ETDEWEB)

    Shammin, Md. R.; Herendeen, Robert A.; Hanson, Michelle J.; Wilson, Eric J.H.

    2010-10-15

    We explore the energy intensity of sprawl versus compact living by analyzing the total energy requirements of U.S. households for the year 2003. The methods used are based on previous studies on energy cost of living. Total energy requirement is calculated as a function of individual energy intensities of goods and services derived from economic input-output analysis and expenditures for those goods and services. We use multivariate regression analysis to estimate patterns in household energy intensities. We define sprawl in terms of location in rural areas or in areas with low population size. We find that even though sprawl-related factors account for about 83% of the average household energy consumption, sprawl is only 17-19% more energy intensive than compact living based on how people actually lived. We observe that some of the advantages of reduced direct energy use by people living in high density urban centers are offset by their consumption of other non-energy products. A more detailed analysis reveals that lifestyle choices (household type, number of vehicles, and family size) that could be independent of location play a significant role in determining household energy intensity. We develop two models that offer opportunities for further analysis. (author)

  14. Visual grading characteristics and ordinal regression analysis during optimisation of CT head examinations.

    Science.gov (United States)

    Zarb, Francis; McEntee, Mark F; Rainford, Louise

    2015-06-01

    To evaluate visual grading characteristics (VGC) and ordinal regression analysis during head CT optimisation as a potential alternative to visual grading assessment (VGA), traditionally employed to score anatomical visualisation. Patient images (n = 66) were obtained using current and optimised imaging protocols from two CT suites: a 16-slice scanner at the national Maltese centre for trauma and a 64-slice scanner in a private centre. Local resident radiologists (n = 6) performed VGA followed by VGC and ordinal regression analysis. VGC alone indicated that optimised protocols had similar image quality as current protocols. Ordinal logistic regression analysis provided an in-depth evaluation, criterion by criterion allowing the selective implementation of the protocols. The local radiology review panel supported the implementation of optimised protocols for brain CT examinations (including trauma) in one centre, achieving radiation dose reductions ranging from 24 % to 36 %. In the second centre a 29 % reduction in radiation dose was achieved for follow-up cases. The combined use of VGC and ordinal logistic regression analysis led to clinical decisions being taken on the implementation of the optimised protocols. This improved method of image quality analysis provided the evidence to support imaging protocol optimisation, resulting in significant radiation dose savings. • There is need for scientifically based image quality evaluation during CT optimisation. • VGC and ordinal regression analysis in combination led to better informed clinical decisions. • VGC and ordinal regression analysis led to dose reductions without compromising diagnostic efficacy.

  15. Comparison of cranial sex determination by discriminant analysis and logistic regression.

    Science.gov (United States)

    Amores-Ampuero, Anabel; Alemán, Inmaculada

    2016-04-05

    Various methods have been proposed for estimating dimorphism. The objective of this study was to compare sex determination results from cranial measurements using discriminant analysis or logistic regression. The study sample comprised 130 individuals (70 males) of known sex, age, and cause of death from San José cemetery in Granada (Spain). Measurements of 19 neurocranial dimensions and 11 splanchnocranial dimensions were subjected to discriminant analysis and logistic regression, and the percentages of correct classification were compared between the sex functions obtained with each method. The discriminant capacity of the selected variables was evaluated with a cross-validation procedure. The percentage accuracy with discriminant analysis was 78.2% for the neurocranium (82.4% in females and 74.6% in males) and 73.7% for the splanchnocranium (79.6% in females and 68.8% in males). These percentages were higher with logistic regression analysis: 85.7% for the neurocranium (in both sexes) and 94.1% for the splanchnocranium (100% in females and 91.7% in males).

  16. Poisson regression analysis of the mortality among a cohort of World War II nuclear industry workers

    International Nuclear Information System (INIS)

    Frome, E.L.; Cragle, D.L.; McLain, R.W.

    1990-01-01

    A historical cohort mortality study was conducted among 28,008 white male employees who had worked for at least 1 month in Oak Ridge, Tennessee, during World War II. The workers were employed at two plants that were producing enriched uranium and a research and development laboratory. Vital status was ascertained through 1980 for 98.1% of the cohort members and death certificates were obtained for 96.8% of the 11,671 decedents. A modified version of the traditional standardized mortality ratio (SMR) analysis was used to compare the cause-specific mortality experience of the World War II workers with the U.S. white male population. An SMR and a trend statistic were computed for each cause-of-death category for the 30-year interval from 1950 to 1980. The SMR for all causes was 1.11, and there was a significant upward trend of 0.74% per year. The excess mortality was primarily due to lung cancer and diseases of the respiratory system. Poisson regression methods were used to evaluate the influence of duration of employment, facility of employment, socioeconomic status, birth year, period of follow-up, and radiation exposure on cause-specific mortality. Maximum likelihood estimates of the parameters in a main-effects model were obtained to describe the joint effects of these six factors on cause-specific mortality of the World War II workers. We show that these multivariate regression techniques provide a useful extension of conventional SMR analysis and illustrate their effective use in a large occupational cohort study

  17. Regression Analysis: Instructional Resource for Cost/Managerial Accounting

    Science.gov (United States)

    Stout, David E.

    2015-01-01

    This paper describes a classroom-tested instructional resource, grounded in principles of active learning and a constructivism, that embraces two primary objectives: "demystify" for accounting students technical material from statistics regarding ordinary least-squares (OLS) regression analysis--material that students may find obscure or…

  18. MANCOVA for one way classification with homogeneity of regression coefficient vectors

    Science.gov (United States)

    Mokesh Rayalu, G.; Ravisankar, J.; Mythili, G. Y.

    2017-11-01

    The MANOVA and MANCOVA are the extensions of the univariate ANOVA and ANCOVA techniques to multidimensional or vector valued observations. The assumption of a Gaussian distribution has been replaced with the Multivariate Gaussian distribution for the vectors data and residual term variables in the statistical models of these techniques. The objective of MANCOVA is to determine if there are statistically reliable mean differences that can be demonstrated between groups later modifying the newly created variable. When randomization assignment of samples or subjects to groups is not possible, multivariate analysis of covariance (MANCOVA) provides statistical matching of groups by adjusting dependent variables as if all subjects scored the same on the covariates. In this research article, an extension has been made to the MANCOVA technique with more number of covariates and homogeneity of regression coefficient vectors is also tested.

  19. Spatial compression algorithm for the analysis of very large multivariate images

    Science.gov (United States)

    Keenan, Michael R [Albuquerque, NM

    2008-07-15

    A method for spatially compressing data sets enables the efficient analysis of very large multivariate images. The spatial compression algorithms use a wavelet transformation to map an image into a compressed image containing a smaller number of pixels that retain the original image's information content. Image analysis can then be performed on a compressed data matrix consisting of a reduced number of significant wavelet coefficients. Furthermore, a block algorithm can be used for performing common operations more efficiently. The spatial compression algorithms can be combined with spectral compression algorithms to provide further computational efficiencies.

  20. Near and mid infrared spectroscopy and multivariate data analysis in studies of oxidation of edible oils.

    Science.gov (United States)

    Wójcicki, Krzysztof; Khmelinskii, Igor; Sikorski, Marek; Sikorska, Ewa

    2015-11-15

    Infrared spectroscopic techniques and chemometric methods were used to study oxidation of olive, sunflower and rapeseed oils. Accelerated oxidative degradation of oils at 60°C was monitored using peroxide values and FT-MIR ATR and FT-NIR transmittance spectroscopy. Principal component analysis (PCA) facilitated visualization and interpretation of spectral changes occurring during oxidation. Multivariate curve resolution (MCR) method found three spectral components in the NIR and MIR spectral matrix, corresponding to the oxidation products, and saturated and unsaturated structures. Good quantitative relation was found between peroxide value and contribution of oxidation products evaluated using MCR--based on NIR (R(2) = 0.890), MIR (R(2) = 0.707) and combined NIR and MIR (R(2) = 0.747) data. Calibration models for prediction peroxide value established using partial least squares (PLS) regression were characterized for MIR (R(2) = 0.701, RPD = 1.7), NIR (R(2) = 0.970, RPD = 5.3), and combined NIR and MIR data (R(2) = 0.954, RPD = 3.1). Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Robust Mediation Analysis Based on Median Regression

    Science.gov (United States)

    Yuan, Ying; MacKinnon, David P.

    2014-01-01

    Mediation analysis has many applications in psychology and the social sciences. The most prevalent methods typically assume that the error distribution is normal and homoscedastic. However, this assumption may rarely be met in practice, which can affect the validity of the mediation analysis. To address this problem, we propose robust mediation analysis based on median regression. Our approach is robust to various departures from the assumption of homoscedasticity and normality, including heavy-tailed, skewed, contaminated, and heteroscedastic distributions. Simulation studies show that under these circumstances, the proposed method is more efficient and powerful than standard mediation analysis. We further extend the proposed robust method to multilevel mediation analysis, and demonstrate through simulation studies that the new approach outperforms the standard multilevel mediation analysis. We illustrate the proposed method using data from a program designed to increase reemployment and enhance mental health of job seekers. PMID:24079925

  2. Multivariate stochastic analysis for Monthly hydrological time series at Cuyahoga River Basin

    Science.gov (United States)

    zhang, L.

    2011-12-01

    Copula has become a very powerful statistic and stochastic methodology in case of the multivariate analysis in Environmental and Water resources Engineering. In recent years, the popular one-parameter Archimedean copulas, e.g. Gumbel-Houggard copula, Cook-Johnson copula, Frank copula, the meta-elliptical copula, e.g. Gaussian Copula, Student-T copula, etc. have been applied in multivariate hydrological analyses, e.g. multivariate rainfall (rainfall intensity, duration and depth), flood (peak discharge, duration and volume), and drought analyses (drought length, mean and minimum SPI values, and drought mean areal extent). Copula has also been applied in the flood frequency analysis at the confluences of river systems by taking into account the dependence among upstream gauge stations rather than by using the hydrological routing technique. In most of the studies above, the annual time series have been considered as stationary signal which the time series have been assumed as independent identically distributed (i.i.d.) random variables. But in reality, hydrological time series, especially the daily and monthly hydrological time series, cannot be considered as i.i.d. random variables due to the periodicity existed in the data structure. Also, the stationary assumption is also under question due to the Climate Change and Land Use and Land Cover (LULC) change in the fast years. To this end, it is necessary to revaluate the classic approach for the study of hydrological time series by relaxing the stationary assumption by the use of nonstationary approach. Also as to the study of the dependence structure for the hydrological time series, the assumption of same type of univariate distribution also needs to be relaxed by adopting the copula theory. In this paper, the univariate monthly hydrological time series will be studied through the nonstationary time series analysis approach. The dependence structure of the multivariate monthly hydrological time series will be

  3. Comparison of Prediction Model for Cardiovascular Autonomic Dysfunction Using Artificial Neural Network and Logistic Regression Analysis

    Science.gov (United States)

    Zeng, Fangfang; Li, Zhongtao; Yu, Xiaoling; Zhou, Linuo

    2013-01-01

    Background This study aimed to develop the artificial neural network (ANN) and multivariable logistic regression (LR) analyses for prediction modeling of cardiovascular autonomic (CA) dysfunction in the general population, and compare the prediction models using the two approaches. Methods and Materials We analyzed a previous dataset based on a Chinese population sample consisting of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN and LR analysis, and were tested in the validation set. Performances of these prediction models were then compared. Results Univariate analysis indicated that 14 risk factors showed statistically significant association with the prevalence of CA dysfunction (P<0.05). The mean area under the receiver-operating curve was 0.758 (95% CI 0.724–0.793) for LR and 0.762 (95% CI 0.732–0.793) for ANN analysis, but noninferiority result was found (P<0.001). The similar results were found in comparisons of sensitivity, specificity, and predictive values in the prediction models between the LR and ANN analyses. Conclusion The prediction models for CA dysfunction were developed using ANN and LR. ANN and LR are two effective tools for developing prediction models based on our dataset. PMID:23940593

  4. Multiplication factor versus regression analysis in stature estimation from hand and foot dimensions.

    Science.gov (United States)

    Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha

    2012-05-01

    Estimation of stature is an important parameter in identification of human remains in forensic examinations. The present study is aimed to compare the reliability and accuracy of stature estimation and to demonstrate the variability in estimated stature and actual stature using multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements; hand length, hand breadth, foot length and foot breadth taken on the left side in each subject were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for estimation of stature from hand and foot dimensions. Derived multiplication factors and regression formula were applied to the hand and foot measurements in the study sample. The estimated stature from the multiplication factors and regression analysis was compared with the actual stature to find the error in estimated stature. The results indicate that the range of error in estimation of stature from regression analysis method is less than that of multiplication factor method thus, confirming that the regression analysis method is better than multiplication factor analysis in stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  5. Multivariate statistical analysis for x-ray photoelectron spectroscopy spectral imaging: Effect of image acquisition time

    International Nuclear Information System (INIS)

    Peebles, D.E.; Ohlhausen, J.A.; Kotula, P.G.; Hutton, S.; Blomfield, C.

    2004-01-01

    The acquisition of spectral images for x-ray photoelectron spectroscopy (XPS) is a relatively new approach, although it has been used with other analytical spectroscopy tools for some time. This technique provides full spectral information at every pixel of an image, in order to provide a complete chemical mapping of the imaged surface area. Multivariate statistical analysis techniques applied to the spectral image data allow the determination of chemical component species, and their distribution and concentrations, with minimal data acquisition and processing times. Some of these statistical techniques have proven to be very robust and efficient methods for deriving physically realistic chemical components without input by the user other than the spectral matrix itself. The benefits of multivariate analysis of the spectral image data include significantly improved signal to noise, improved image contrast and intensity uniformity, and improved spatial resolution - which are achieved due to the effective statistical aggregation of the large number of often noisy data points in the image. This work demonstrates the improvements in chemical component determination and contrast, signal-to-noise level, and spatial resolution that can be obtained by the application of multivariate statistical analysis to XPS spectral images

  6. Multivariate statistical modelling based on generalized linear models

    CERN Document Server

    Fahrmeir, Ludwig

    1994-01-01

    This book is concerned with the use of generalized linear models for univariate and multivariate regression analysis. Its emphasis is to provide a detailed introductory survey of the subject based on the analysis of real data drawn from a variety of subjects including the biological sciences, economics, and the social sciences. Where possible, technical details and proofs are deferred to an appendix in order to provide an accessible account for non-experts. Topics covered include: models for multi-categorical responses, model checking, time series and longitudinal data, random effects models, and state-space models. Throughout, the authors have taken great pains to discuss the underlying theoretical ideas in ways that relate well to the data at hand. As a result, numerous researchers whose work relies on the use of these models will find this an invaluable account to have on their desks. "The basic aim of the authors is to bring together and review a large part of recent advances in statistical modelling of m...

  7. A comparative study of multiple regression analysis and back ...

    Indian Academy of Sciences (India)

    Abhijit Sarkar

    artificial neural network (ANN) models to predict weld bead geometry and HAZ width in submerged arc welding ... Keywords. Submerged arc welding (SAW); multi-regression analysis (MRA); artificial neural network ..... Degree of freedom.

  8. Multivariate Analysis Techniques for Optimal Vision System Design

    DEFF Research Database (Denmark)

    Sharifzadeh, Sara

    The present thesis considers optimization of the spectral vision systems used for quality inspection of food items. The relationship between food quality, vision based techniques and spectral signature are described. The vision instruments for food analysis as well as datasets of the food items...... used in this thesis are described. The methodological strategies are outlined including sparse regression and pre-processing based on feature selection and extraction methods, supervised versus unsupervised analysis and linear versus non-linear approaches. One supervised feature selection algorithm...... (SSPCA) and DCT based characterization of the spectral diffused reflectance images for wavelength selection and discrimination. These methods together with some other state-of-the-art statistical and mathematical analysis techniques are applied on datasets of different food items; meat, diaries, fruits...

  9. A primer of multivariate statistics

    CERN Document Server

    Harris, Richard J

    2014-01-01

    Drawing upon more than 30 years of experience in working with statistics, Dr. Richard J. Harris has updated A Primer of Multivariate Statistics to provide a model of balance between how-to and why. This classic text covers multivariate techniques with a taste of latent variable approaches. Throughout the book there is a focus on the importance of describing and testing one's interpretations of the emergent variables that are produced by multivariate analysis. This edition retains its conversational writing style while focusing on classical techniques. The book gives the reader a feel for why

  10. Multivariate Bias Correction Procedures for Improving Water Quality Predictions from the SWAT Model

    Science.gov (United States)

    Arumugam, S.; Libera, D.

    2017-12-01

    Water quality observations are usually not available on a continuous basis for longer than 1-2 years at a time over a decadal period given the labor requirements making calibrating and validating mechanistic models difficult. Further, any physical model predictions inherently have bias (i.e., under/over estimation) and require post-simulation techniques to preserve the long-term mean monthly attributes. This study suggests a multivariate bias-correction technique and compares to a common technique in improving the performance of the SWAT model in predicting daily streamflow and TN loads across the southeast based on split-sample validation. The approach is a dimension reduction technique, canonical correlation analysis (CCA) that regresses the observed multivariate attributes with the SWAT model simulated values. The common approach is a regression based technique that uses an ordinary least squares regression to adjust model values. The observed cross-correlation between loadings and streamflow is better preserved when using canonical correlation while simultaneously reducing individual biases. Additionally, canonical correlation analysis does a better job in preserving the observed joint likelihood of observed streamflow and loadings. These procedures were applied to 3 watersheds chosen from the Water Quality Network in the Southeast Region; specifically, watersheds with sufficiently large drainage areas and number of observed data points. The performance of these two approaches are compared for the observed period and over a multi-decadal period using loading estimates from the USGS LOADEST model. Lastly, the CCA technique is applied in a forecasting sense by using 1-month ahead forecasts of P & T from ECHAM4.5 as forcings in the SWAT model. Skill in using the SWAT model for forecasting loadings and streamflow at the monthly and seasonal timescale is also discussed.

  11. Using Logistic Regression To Predict the Probability of Debris Flows Occurring in Areas Recently Burned By Wildland Fires

    Science.gov (United States)

    Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.

    2003-01-01

    Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity

  12. Multivariate analysis of the cleaning efficacy of different final irrigation techniques in the canal and isthmus of mandibular posterior teeth

    Directory of Open Access Journals (Sweden)

    Yeon-Jee Yoo

    2013-08-01

    Full Text Available Objectives The aim of this study was to compare the cleaning efficacy of different final irrigation regimens in canal and isthmus of mandibular molars, and to evaluate the influence of related variables on cleaning efficacy of the irrigation systems. Materials and Methods Mesial root canals from 60 mandibular molars were prepared and divided into 4 experimental groups according to the final irrigation technique: Group C, syringe irrigation; Group U, ultrasonics activation; Group SC, VPro StreamClean irrigation; Group EV, EndoVac irrigation. Cross-sections at 1, 3 and 5 mm levels from the apex were examined to calculate remaining debris area in the canal and isthmus spaces. Statistical analysis was completed by using Kruskal-Wallis test and Mann-Whitney U test for comparison among groups, and multivariate linear analysis to identify the significant variables (regular replenishment of irrigant, vapor lock management, and ultrasonic activation of irrigant affecting the cleaning efficacy of the experimental groups. Results Group SC and EV showed significantly higher canal cleanliness values than group C and U at 1 mm level (p < 0.05, and higher isthmus cleanliness values than group U at 3 mm and all levels of group C (p < 0.05. Multivariate linear regression analysis demonstrated that all variables had independent positive correlation at 1 mm level of canal and at all levels of isthmus with statistical significances. Conclusions Both VPro StreamClean and EndoVac system showed favorable result as final irrigation regimens for cleaning debris in the complicated root canal system having curved canal and/or isthmus. The debridement of the isthmi significantly depends on the variables rather than the canals.

  13. Thawed chilled Barents Sea cod fillets in modified atmosphere packaging-application of multivariate data analysis to select key parameters in good manufacturing practice

    DEFF Research Database (Denmark)

    Bøknæs, Niels; Jensen, K.N.; Guldager, H.S.

    2002-01-01

    The purpose of the present study was to select key parameters in good manufacturing practice for production of thawed chilled modified atmosphere packed (MAP) cod (Gadus morhua) fillets. The effect of frozen storage temperature (-20 and -30 C), frozen storage period (3, 6, 9 and 12 mo) and chill...... storage periods up to 21 d at 2 C were evaluated for thawed MAP Barents Sea cod fillets. Sensory, chemical, microbiological and physical quality attributes were evaluated and multivariate data analysis (principal component analysis and partial least- squares regression) applied for identification of key...... storage was low for thawed MAP Barents Sea cod and this fish raw material seemed the more appropriate for production of thawed chilled MAP products. Frozen storage inactivation of the spoilage bacteria of Photobacterium phosphorcum was modest in Barnets Sea cod, possibly due to high trimethylamine oxide...

  14. Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.

    Science.gov (United States)

    Hu, Yi-Chung

    2014-01-01

    On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.

  15. An overview of multivariate gamma distributions as seen from a (multivariate) matrix exponential perspective

    DEFF Research Database (Denmark)

    Bladt, Mogens; Nielsen, Bo Friis

    2012-01-01

    Laplace transform. In a longer perspective stochastic and statistical analysis for MVME will in particular apply to any of the previously defined distributions. Multivariate gamma distributions have been used in a variety of fields like hydrology, [11], [10], [6], space (wind modeling) [9] reliability [3......Numerous definitions of multivariate exponential and gamma distributions can be retrieved from the literature [4]. These distribtuions belong to the class of Multivariate Matrix-- Exponetial Distributions (MVME) whenever their joint Laplace transform is a rational function. The majority...... of these distributions further belongs to an important subclass of MVME distributions [5, 1] where the multivariate random vector can be interpreted as a number of simultaneously collected rewards during sojourns in a the states of a Markov chain with one absorbing state, the rest of the states being transient. We...

  16. Exploratory regression analysis: a tool for selecting models and determining predictor importance.

    Science.gov (United States)

    Braun, Michael T; Oswald, Frederick L

    2011-06-01

    Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1) . The program investigates all 2(p) - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits.

  17. Multivariate statistical analysis of wildfires in Portugal

    Science.gov (United States)

    Costa, Ricardo; Caramelo, Liliana; Pereira, Mário

    2013-04-01

    Several studies demonstrate that wildfires in Portugal present high temporal and spatial variability as well as cluster behavior (Pereira et al., 2005, 2011). This study aims to contribute to the characterization of the fire regime in Portugal with the multivariate statistical analysis of the time series of number of fires and area burned in Portugal during the 1980 - 2009 period. The data used in the analysis is an extended version of the Rural Fire Portuguese Database (PRFD) (Pereira et al, 2011), provided by the National Forest Authority (Autoridade Florestal Nacional, AFN), the Portuguese Forest Service, which includes information for more than 500,000 fire records. There are many multiple advanced techniques for examining the relationships among multiple time series at the same time (e.g., canonical correlation analysis, principal components analysis, factor analysis, path analysis, multiple analyses of variance, clustering systems). This study compares and discusses the results obtained with these different techniques. Pereira, M.G., Trigo, R.M., DaCamara, C.C., Pereira, J.M.C., Leite, S.M., 2005: "Synoptic patterns associated with large summer forest fires in Portugal". Agricultural and Forest Meteorology. 129, 11-25. Pereira, M. G., Malamud, B. D., Trigo, R. M., and Alves, P. I.: The history and characteristics of the 1980-2005 Portuguese rural fire database, Nat. Hazards Earth Syst. Sci., 11, 3343-3358, doi:10.5194/nhess-11-3343-2011, 2011 This work is supported by European Union Funds (FEDER/COMPETE - Operational Competitiveness Programme) and by national funds (FCT - Portuguese Foundation for Science and Technology) under the project FCOMP-01-0124-FEDER-022692, the project FLAIR (PTDC/AAC-AMB/104702/2008) and the EU 7th Framework Program through FUME (contract number 243888).

  18. Distributed Monitoring of the R2 Statistic for Linear Regression

    Data.gov (United States)

    National Aeronautics and Space Administration — The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and...

  19. Evaluation of the efficiency of continuous wavelet transform as processing and preprocessing algorithm for resolution of overlapped signals in univariate and multivariate regression analyses; an application to ternary and quaternary mixtures

    Science.gov (United States)

    Hegazy, Maha A.; Lotfy, Hayam M.; Mowaka, Shereen; Mohamed, Ekram Hany

    2016-07-01

    Wavelets have been adapted for a vast number of signal-processing applications due to the amount of information that can be extracted from a signal. In this work, a comparative study on the efficiency of continuous wavelet transform (CWT) as a signal processing tool in univariate regression and a pre-processing tool in multivariate analysis using partial least square (CWT-PLS) was conducted. These were applied to complex spectral signals of ternary and quaternary mixtures. CWT-PLS method succeeded in the simultaneous determination of a quaternary mixture of drotaverine (DRO), caffeine (CAF), paracetamol (PAR) and p-aminophenol (PAP, the major impurity of paracetamol). While, the univariate CWT failed to simultaneously determine the quaternary mixture components and was able to determine only PAR and PAP, the ternary mixtures of DRO, CAF, and PAR and CAF, PAR, and PAP. During the calculations of CWT, different wavelet families were tested. The univariate CWT method was validated according to the ICH guidelines. While for the development of the CWT-PLS model a calibration set was prepared by means of an orthogonal experimental design and their absorption spectra were recorded and processed by CWT. The CWT-PLS model was constructed by regression between the wavelet coefficients and concentration matrices and validation was performed by both cross validation and external validation sets. Both methods were successfully applied for determination of the studied drugs in pharmaceutical formulations.

  20. Evaluation of syngas production unit cost of bio-gasification facility using regression analysis techniques

    Energy Technology Data Exchange (ETDEWEB)

    Deng, Yangyang; Parajuli, Prem B.

    2011-08-10

    Evaluation of economic feasibility of a bio-gasification facility needs understanding of its unit cost under different production capacities. The objective of this study was to evaluate the unit cost of syngas production at capacities from 60 through 1800Nm 3/h using an economic model with three regression analysis techniques (simple regression, reciprocal regression, and log-log regression). The preliminary result of this study showed that reciprocal regression analysis technique had the best fit curve between per unit cost and production capacity, with sum of error squares (SES) lower than 0.001 and coefficient of determination of (R 2) 0.996. The regression analysis techniques determined the minimum unit cost of syngas production for micro-scale bio-gasification facilities of $0.052/Nm 3, under the capacity of 2,880 Nm 3/h. The results of this study suggest that to reduce cost, facilities should run at a high production capacity. In addition, the contribution of this technique could be the new categorical criterion to evaluate micro-scale bio-gasification facility from the perspective of economic analysis.

  1. Structural analysis and design of multivariable control systems: An algebraic approach

    Science.gov (United States)

    Tsay, Yih Tsong; Shieh, Leang-San; Barnett, Stephen

    1988-01-01

    The application of algebraic system theory to the design of controllers for multivariable (MV) systems is explored analytically using an approach based on state-space representations and matrix-fraction descriptions. Chapters are devoted to characteristic lambda matrices and canonical descriptions of MIMO systems; spectral analysis, divisors, and spectral factors of nonsingular lambda matrices; feedback control of MV systems; and structural decomposition theories and their application to MV control systems.

  2. Characterization of Land Transitions Patterns from Multivariate Time Series Using Seasonal Trend Analysis and Principal Component Analysis

    Directory of Open Access Journals (Sweden)

    Benoit Parmentier

    2014-12-01

    Full Text Available Characterizing biophysical changes in land change areas over large regions with short and noisy multivariate time series and multiple temporal parameters remains a challenging task. Most studies focus on detection rather than the characterization, i.e., the manner by which surface state variables are altered by the process of changes. In this study, a procedure is presented to extract and characterize simultaneous temporal changes in MODIS multivariate times series from three surface state variables the Normalized Difference Vegetation Index (NDVI, land surface temperature (LST and albedo (ALB. The analysis involves conducting a seasonal trend analysis (STA to extract three seasonal shape parameters (Amplitude 0, Amplitude 1 and Amplitude 2 and using principal component analysis (PCA to contrast trends in change and no-change areas. We illustrate the method by characterizing trends in burned and unburned pixels in Alaska over the 2001–2009 time period. Findings show consistent and meaningful extraction of temporal patterns related to fire disturbances. The first principal component (PC1 is characterized by a decrease in mean NDVI (Amplitude 0 with a concurrent increase in albedo (the mean and the annual amplitude and an increase in LST annual variability (Amplitude 1. These results provide systematic empirical evidence of surface changes associated with one type of land change, fire disturbances, and suggest that STA with PCA may be used to characterize many other types of land transitions over large landscape areas using multivariate Earth observation time series.

  3. Utilização de estratificação e modelo de regressão logística na análise de dados de estudos caso-controle Using of stratification and the logistic regression model in the analysis of data of case-control studies

    Directory of Open Access Journals (Sweden)

    Suely Godoy Agostinho Gimeno

    1995-08-01

    Full Text Available Exemplifica-se a aplicação de análise multivariada, por estratificação e com regressão logística, utilizando dados de um estudo caso-controle sobre câncer de esôfago. Oitenta e cinco casos e 292 controles foram classificados segundo sexo, idade e os hábitos de beber e de fumar. As estimativas por ponto dos odds ratios foram semelhantes, sendo as duas técnicas consideradas complementares.Data of a case-control study of esophageal cancer were used as an example of the use of multivariate analysis with stratification and logistic regression. Eighty-five cases and 292 controls were classified according to sex, age and smoking and drinking habits. The point estimates of the odds ratios were similar, and the techniques were considered complementary.

  4. Adaptive metric kernel regression

    DEFF Research Database (Denmark)

    Goutte, Cyril; Larsen, Jan

    2000-01-01

    Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate...... regression by minimising a cross-validation estimate of the generalisation error. This allows to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms...

  5. Spatially resolved regression analysis of pre-treatment FDG, FLT and Cu-ATSM PET from post-treatment FDG PET: an exploratory study

    Science.gov (United States)

    Bowen, Stephen R; Chappell, Richard J; Bentzen, Søren M; Deveau, Michael A; Forrest, Lisa J; Jeraj, Robert

    2012-01-01

    Purpose To quantify associations between pre-radiotherapy and post-radiotherapy PET parameters via spatially resolved regression. Materials and methods Ten canine sinonasal cancer patients underwent PET/CT scans of [18F]FDG (FDGpre), [18F]FLT (FLTpre), and [61Cu]Cu-ATSM (Cu-ATSMpre). Following radiotherapy regimens of 50 Gy in 10 fractions, veterinary patients underwent FDG PET/CT scans at three months (FDGpost). Regression of standardized uptake values in baseline FDGpre, FLTpre and Cu-ATSMpre tumour voxels to those in FDGpost images was performed for linear, log-linear, generalized-linear and mixed-fit linear models. Goodness-of-fit in regression coefficients was assessed by R2. Hypothesis testing of coefficients over the patient population was performed. Results Multivariate linear model fits of FDGpre to FDGpost were significantly positive over the population (FDGpost~0.17 FDGpre, p=0.03), and classified slopes of RECIST non-responders and responders to be different (0.37 vs. 0.07, p=0.01). Generalized-linear model fits related FDGpre to FDGpost by a linear power law (FDGpost~FDGpre0.93, pregression analysis indicates that pre-treatment FDG PET uptake is most strongly associated with three-month post-treatment FDG PET uptake in this patient population, though associations are histopathology-dependent. PMID:22682748

  6. A primer for biomedical scientists on how to execute model II linear regression analysis.

    Science.gov (United States)

    Ludbrook, John

    2012-04-01

    1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.

  7. Multivariate analysis of prognostic factors for idiopathic sudden sensorineural hearing loss in children.

    Science.gov (United States)

    Chung, Jae Ho; Cho, Seok Hyun; Jeong, Jin Hyeok; Park, Chul Won; Lee, Seung Hwan

    2015-09-01

    To evaluate clinical characteristics and possible associated factors of idiopathic sudden sensorineural hearing loss (ISSNHL) in children using univariate and multivariate analyses. A retrospective case series with comparisons. From January 2007 to December 2013, medical records of 37 pediatric ISSNHL patients were reviewed to assess hearing recovery rate and examine factors associated with prognosis (gender; side of hearing loss; opposite side hearing loss; treatment onset; presence of vertigo, tinnitus, and ear fullness; initial hearing threshold), using univariate and multivariate analysis, and compare them with 276 adult ISSNHL patients. Pediatric patients comprised only 6.6% of pediatric/adult cases of ISSNHL, and those below 10 years old were only 0.7%. The overall recovery rates (complete and partial) of the pediatric and adult patients were 57.4% and 47.2%, respectively. The complete recovery rate of the pediatric group (46.6%) was higher than that of the adult group (30.8%, P = .040). According to multivariate analysis, absence of tinnitus, later onset of treatment, and higher hearing threshold at initial presentation were associated with a poor prognosis in pediatric ISSNHL. The recovery rate of ISSNHL in pediatric patients is higher than in adults, and the presence of tinnitus and earlier treatment onset is associated with favorable outcomes. 4. © 2015 The American Laryngological, Rhinological and Otological Society, Inc.

  8. External Tank Liquid Hydrogen (LH2) Prepress Regression Analysis Independent Review Technical Consultation Report

    Science.gov (United States)

    Parsons, Vickie s.

    2009-01-01

    The request to conduct an independent review of regression models, developed for determining the expected Launch Commit Criteria (LCC) External Tank (ET)-04 cycle count for the Space Shuttle ET tanking process, was submitted to the NASA Engineering and Safety Center NESC on September 20, 2005. The NESC team performed an independent review of regression models documented in Prepress Regression Analysis, Tom Clark and Angela Krenn, 10/27/05. This consultation consisted of a peer review by statistical experts of the proposed regression models provided in the Prepress Regression Analysis. This document is the consultation's final report.

  9. Multivariate calibration in Laser-Induced Breakdown Spectroscopy quantitative analysis: The dangers of a 'black box' approach and how to avoid them

    Science.gov (United States)

    Safi, A.; Campanella, B.; Grifoni, E.; Legnaioli, S.; Lorenzetti, G.; Pagnotta, S.; Poggialini, F.; Ripoll-Seguer, L.; Hidalgo, M.; Palleschi, V.

    2018-06-01

    The introduction of multivariate calibration curve approach in Laser-Induced Breakdown Spectroscopy (LIBS) quantitative analysis has led to a general improvement of the LIBS analytical performances, since a multivariate approach allows to exploit the redundancy of elemental information that are typically present in a LIBS spectrum. Software packages implementing multivariate methods are available in the most diffused commercial and open source analytical programs; in most of the cases, the multivariate algorithms are robust against noise and operate in unsupervised mode. The reverse of the coin of the availability and ease of use of such packages is the (perceived) difficulty in assessing the reliability of the results obtained which often leads to the consideration of the multivariate algorithms as 'black boxes' whose inner mechanism is supposed to remain hidden to the user. In this paper, we will discuss the dangers of a 'black box' approach in LIBS multivariate analysis, and will discuss how to overcome them using the chemical-physical knowledge that is at the base of any LIBS quantitative analysis.

  10. Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data.

    Science.gov (United States)

    Ying, Gui-Shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard

    2017-04-01

    To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field in the elderly. When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI -0.03 to 0.32D, p = 0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, p = 0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller p-values, while analysis of the worse eye provided larger p-values than mixed effects models and marginal models. In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision.

  11. Power analysis for multivariate and repeated measures designs: a flexible approach using the SPSS MANOVA procedure.

    Science.gov (United States)

    D'Amico, E J; Neilands, T B; Zambarano, R

    2001-11-01

    Although power analysis is an important component in the planning and implementation of research designs, it is often ignored. Computer programs for performing power analysis are available, but most have limitations, particularly for complex multivariate designs. An SPSS procedure is presented that can be used for calculating power for univariate, multivariate, and repeated measures models with and without time-varying and time-constant covariates. Three examples provide a framework for calculating power via this method: an ANCOVA, a MANOVA, and a repeated measures ANOVA with two or more groups. The benefits and limitations of this procedure are discussed.

  12. Non-stationary hydrologic frequency analysis using B-spline quantile regression

    Science.gov (United States)

    Nasri, B.; Bouezmarni, T.; St-Hilaire, A.; Ouarda, T. B. M. J.

    2017-11-01

    Hydrologic frequency analysis is commonly used by engineers and hydrologists to provide the basic information on planning, design and management of hydraulic and water resources systems under the assumption of stationarity. However, with increasing evidence of climate change, it is possible that the assumption of stationarity, which is prerequisite for traditional frequency analysis and hence, the results of conventional analysis would become questionable. In this study, we consider a framework for frequency analysis of extremes based on B-Spline quantile regression which allows to model data in the presence of non-stationarity and/or dependence on covariates with linear and non-linear dependence. A Markov Chain Monte Carlo (MCMC) algorithm was used to estimate quantiles and their posterior distributions. A coefficient of determination and Bayesian information criterion (BIC) for quantile regression are used in order to select the best model, i.e. for each quantile, we choose the degree and number of knots of the adequate B-spline quantile regression model. The method is applied to annual maximum and minimum streamflow records in Ontario, Canada. Climate indices are considered to describe the non-stationarity in the variable of interest and to estimate the quantiles in this case. The results show large differences between the non-stationary quantiles and their stationary equivalents for an annual maximum and minimum discharge with high annual non-exceedance probabilities.

  13. Batch-to-batch quality consistency evaluation of botanical drug products using multivariate statistical analysis of the chromatographic fingerprint.

    Science.gov (United States)

    Xiong, Haoshu; Yu, Lawrence X; Qu, Haibin

    2013-06-01

    Botanical drug products have batch-to-batch quality variability due to botanical raw materials and the current manufacturing process. The rational evaluation and control of product quality consistency are essential to ensure the efficacy and safety. Chromatographic fingerprinting is an important and widely used tool to characterize the chemical composition of botanical drug products. Multivariate statistical analysis has showed its efficacy and applicability in the quality evaluation of many kinds of industrial products. In this paper, the combined use of multivariate statistical analysis and chromatographic fingerprinting is presented here to evaluate batch-to-batch quality consistency of botanical drug products. A typical botanical drug product in China, Shenmai injection, was selected as the example to demonstrate the feasibility of this approach. The high-performance liquid chromatographic fingerprint data of historical batches were collected from a traditional Chinese medicine manufacturing factory. Characteristic peaks were weighted by their variability among production batches. A principal component analysis model was established after outliers were modified or removed. Multivariate (Hotelling T(2) and DModX) control charts were finally successfully applied to evaluate the quality consistency. The results suggest useful applications for a combination of multivariate statistical analysis with chromatographic fingerprinting in batch-to-batch quality consistency evaluation for the manufacture of botanical drug products.

  14. What makes a pattern? Matching decoding methods to data in multivariate pattern analysis

    Directory of Open Access Journals (Sweden)

    Philip A Kragel

    2012-11-01

    Full Text Available Research in neuroscience faces the challenge of integrating information across different spatial scales of brain function. A promising technique for harnessing information at a range of spatial scales is multivariate pattern analysis (MVPA of functional magnetic resonance imaging (fMRI data. While the prevalence of MVPA has increased dramatically in recent years, its typical implementations for classification of mental states utilize only a subset of the information encoded in local fMRI signals. We review published studies employing multivariate pattern classification since the technique’s introduction, which reveal an extensive focus on the improved detection power that linear classifiers provide over traditional analysis techniques. We demonstrate using simulations and a searchlight approach, however, that nonlinear classifiers are capable of extracting distinct information about interactions within a local region. We conclude that for spatially localized analyses, such as searchlight and region of interest, multiple classification approaches should be compared in order to match fMRI analyses to the properties of local circuits.

  15. Alternative Methods of Regression

    CERN Document Server

    Birkes, David

    2011-01-01

    Of related interest. Nonlinear Regression Analysis and its Applications Douglas M. Bates and Donald G. Watts ".an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models.highly recommend[ed].for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." --Technometrics This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis--with all data s

  16. Multivariate Regression Analysis and Statistical Modeling for Summer Extreme Precipitation over the Yangtze River Basin, China

    Directory of Open Access Journals (Sweden)

    Tao Gao

    2014-01-01

    Full Text Available Extreme precipitation is likely to be one of the most severe meteorological disasters in China; however, studies on the physical factors affecting precipitation extremes and corresponding prediction models are not accurately available. From a new point of view, the sensible heat flux (SHF and latent heat flux (LHF, which have significant impacts on summer extreme rainfall in Yangtze River basin (YRB, have been quantified and then selections of the impact factors are conducted. Firstly, a regional extreme precipitation index was applied to determine Regions of Significant Correlation (RSC by analyzing spatial distribution of correlation coefficients between this index and SHF, LHF, and sea surface temperature (SST on global ocean scale; then the time series of SHF, LHF, and SST in RSCs during 1967–2010 were selected. Furthermore, other factors that significantly affect variations in precipitation extremes over YRB were also selected. The methods of multiple stepwise regression and leave-one-out cross-validation (LOOCV were utilized to analyze and test influencing factors and statistical prediction model. The correlation coefficient between observed regional extreme index and model simulation result is 0.85, with significant level at 99%. This suggested that the forecast skill was acceptable although many aspects of the prediction model should be improved.

  17. Integrated GIS and multivariate statistical analysis for regional scale assessment of heavy metal soil contamination: A critical review.

    Science.gov (United States)

    Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan

    2017-12-01

    Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km 2 , with a median of 0.4 samples per km 2 . The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis

  18. Early prediction of wheat quality: analysis during grain development using mass spectrometry and multivariate data analysis

    DEFF Research Database (Denmark)

    Ghirardo, A.; Sørensen, Helle Aagaard; Petersen, M.

    2005-01-01

    Matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry and multivariate data analysis have been used for the determination of wheat quality at different stages of grain development. Wheat varieties with one of two different end-use qualities (i.e. suitable or not suitable...... data analysis, offers a method that can replace the traditional rather time-consuming ones such as gel electrophoresis. This study focused on the determination of wheat quality at 15 dpa, when the grain is due for harvest 1 month later....

  19. PyMVPA: A python toolbox for multivariate pattern analysis of fMRI data.

    Science.gov (United States)

    Hanke, Michael; Halchenko, Yaroslav O; Sederberg, Per B; Hanson, Stephen José; Haxby, James V; Pollmann, Stefan

    2009-01-01

    Decoding patterns of neural activity onto cognitive states is one of the central goals of functional brain imaging. Standard univariate fMRI analysis methods, which correlate cognitive and perceptual function with the blood oxygenation-level dependent (BOLD) signal, have proven successful in identifying anatomical regions based on signal increases during cognitive and perceptual tasks. Recently, researchers have begun to explore new multivariate techniques that have proven to be more flexible, more reliable, and more sensitive than standard univariate analysis. Drawing on the field of statistical learning theory, these new classifier-based analysis techniques possess explanatory power that could provide new insights into the functional properties of the brain. However, unlike the wealth of software packages for univariate analyses, there are few packages that facilitate multivariate pattern classification analyses of fMRI data. Here we introduce a Python-based, cross-platform, and open-source software toolbox, called PyMVPA, for the application of classifier-based analysis techniques to fMRI datasets. PyMVPA makes use of Python's ability to access libraries written in a large variety of programming languages and computing environments to interface with the wealth of existing machine learning packages. We present the framework in this paper and provide illustrative examples on its usage, features, and programmability.

  20. Multivariate missing data in hydrology - Review and applications

    Science.gov (United States)

    Ben Aissia, Mohamed-Aymen; Chebana, Fateh; Ouarda, Taha B. M. J.

    2017-12-01

    Water resources planning and management require complete data sets of a number of hydrological variables, such as flood peaks and volumes. However, hydrologists are often faced with the problem of missing data (MD) in hydrological databases. Several methods are used to deal with the imputation of MD. During the last decade, multivariate approaches have gained popularity in the field of hydrology, especially in hydrological frequency analysis (HFA). However, treating the MD remains neglected in the multivariate HFA literature whereas the focus has been mainly on the modeling component. For a complete analysis and in order to optimize the use of data, MD should also be treated in the multivariate setting prior to modeling and inference. Imputation of MD in the multivariate hydrological framework can have direct implications on the quality of the estimation. Indeed, the dependence between the series represents important additional information that can be included in the imputation process. The objective of the present paper is to highlight the importance of treating MD in multivariate hydrological frequency analysis by reviewing and applying multivariate imputation methods and by comparing univariate and multivariate imputation methods. An application is carried out for multiple flood attributes on three sites in order to evaluate the performance of the different methods based on the leave-one-out procedure. The results indicate that, the performance of imputation methods can be improved by adopting the multivariate setting, compared to mean substitution and interpolation methods, especially when using the copula-based approach.

  1. Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool.

    Science.gov (United States)

    Clark, Neil R; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D; Jones, Matthew R; Ma'ayan, Avi

    2015-11-01

    Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community.

  2. On macroeconomic values investigation using fuzzy linear regression analysis

    Directory of Open Access Journals (Sweden)

    Richard Pospíšil

    2017-06-01

    Full Text Available The theoretical background for abstract formalization of the vague phenomenon of complex systems is the fuzzy set theory. In the paper, vague data is defined as specialized fuzzy sets - fuzzy numbers and there is described a fuzzy linear regression model as a fuzzy function with fuzzy numbers as vague parameters. To identify the fuzzy coefficients of the model, the genetic algorithm is used. The linear approximation of the vague function together with its possibility area is analytically and graphically expressed. A suitable application is performed in the tasks of the time series fuzzy regression analysis. The time-trend and seasonal cycles including their possibility areas are calculated and expressed. The examples are presented from the economy field, namely the time-development of unemployment, agricultural production and construction respectively between 2009 and 2011 in the Czech Republic. The results are shown in the form of the fuzzy regression models of variables of time series. For the period 2009-2011, the analysis assumptions about seasonal behaviour of variables and the relationship between them were confirmed; in 2010, the system behaved fuzzier and the relationships between the variables were vaguer, that has a lot of causes, from the different elasticity of demand, through state interventions to globalization and transnational impacts.

  3. Adaptive Metric Kernel Regression

    DEFF Research Database (Denmark)

    Goutte, Cyril; Larsen, Jan

    1998-01-01

    Kernel smoothing is a widely used nonparametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this paper, we propose an algorithm that adapts the input metric used in multivariate regression...... by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard...

  4. Directional quantile regression in Octave (and MATLAB)

    Czech Academy of Sciences Publication Activity Database

    Boček, Pavel; Šiman, Miroslav

    2016-01-01

    Roč. 52, č. 1 (2016), s. 28-51 ISSN 0023-5954 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords : quantile regression * multivariate quantile * depth contour * Matlab Subject RIV: IN - Informatics, Computer Science Impact factor: 0.379, year: 2016 http://library.utia.cas.cz/separaty/2016/SI/bocek-0458380.pdf

  5. Reduction of the dimensionality and comparative analysis of multivariate radiological data

    International Nuclear Information System (INIS)

    Seddeek, M.K.; Kozae, A.M.; Sharshar, T.; Badran, H.M.

    2009-01-01

    Computational methods were used to reduce the dimensionality and to find clusters of multivariate data. The variables were the natural radioactivity contents and the texture characteristics of sand samples. The application of discriminate analysis revealed that samples with high negative values of the former score have the highest contamination with black sand. Principal component analysis (PCA) revealed that radioactivity concentrations alone are sufficient for the classification. Rough set analysis (RSA) showed that the concentration of 238 U, 226 Ra or 232 Th, combined with the concentration of 40 K, can specify the clusters and characteristics of the sand. Both PCA and RSA show that 238 U, 226 Ra and 232 Th behave similarly. RSA revealed that one or two of them can be omitted without degrading predictions.

  6. Assessment of susceptibility to earth-flow landslide using logistic regression and multivariate adaptive regression splines: A case of the Belice River basin (western Sicily, Italy)

    Science.gov (United States)

    Conoscenti, Christian; Ciaccio, Marilena; Caraballo-Arias, Nathalie Almaru; Gómez-Gutiérrez, Álvaro; Rotigliano, Edoardo; Agnesi, Valerio

    2015-08-01

    In this paper, terrain susceptibility to earth-flow occurrence was evaluated by using geographic information systems (GIS) and two statistical methods: Logistic regression (LR) and multivariate adaptive regression splines (MARS). LR has been already demonstrated to provide reliable predictions of earth-flow occurrence, whereas MARS, as far as we know, has never been used to generate earth-flow susceptibility models. The experiment was carried out in a basin of western Sicily (Italy), which extends for 51 km2 and is severely affected by earth-flows. In total, we mapped 1376 earth-flows, covering an area of 4.59 km2. To explore the effect of pre-failure topography on earth-flow spatial distribution, we performed a reconstruction of topography before the landslide occurrence. This was achieved by preparing a digital terrain model (DTM) where altitude of areas hosting landslides was interpolated from the adjacent undisturbed land surface by using the algorithm topo-to-raster. This DTM was exploited to extract 15 morphological and hydrological variables that, in addition to outcropping lithology, were employed as explanatory variables of earth-flow spatial distribution. The predictive skill of the earth-flow susceptibility models and the robustness of the procedure were tested by preparing five datasets, each including a different subset of landslides and stable areas. The accuracy of the predictive models was evaluated by drawing receiver operating characteristic (ROC) curves and by calculating the area under the ROC curve (AUC). The results demonstrate that the overall accuracy of LR and MARS earth-flow susceptibility models is from excellent to outstanding. However, AUC values of the validation datasets attest to a higher predictive power of MARS-models (AUC between 0.881 and 0.912) with respect to LR-models (AUC between 0.823 and 0.870). The adopted procedure proved to be resistant to overfitting and stable when changes of the learning and validation samples are

  7. REGRESSION ANALYSIS OF SEA-SURFACE-TEMPERATURE PATTERNS FOR THE NORTH PACIFIC OCEAN.

    Science.gov (United States)

    SEA WATER, *SURFACE TEMPERATURE, *OCEANOGRAPHIC DATA, PACIFIC OCEAN, REGRESSION ANALYSIS , STATISTICAL ANALYSIS, UNDERWATER EQUIPMENT, DETECTION, UNDERWATER COMMUNICATIONS, DISTRIBUTION, THERMAL PROPERTIES, COMPUTERS.

  8. On Bayesian shared component disease mapping and ecological regression with errors in covariates.

    Science.gov (United States)

    MacNab, Ying C

    2010-05-20

    Recent literature on Bayesian disease mapping presents shared component models (SCMs) for joint spatial modeling of two or more diseases with common risk factors. In this study, Bayesian hierarchical formulations of shared component disease mapping and ecological models are explored and developed in the context of ecological regression, taking into consideration errors in covariates. A review of multivariate disease mapping models (MultiVMs) such as the multivariate conditional autoregressive models that are also part of the more recent Bayesian disease mapping literature is presented. Some insights into the connections and distinctions between the SCM and MultiVM procedures are communicated. Important issues surrounding (appropriate) formulation of shared- and disease-specific components, consideration/choice of spatial or non-spatial random effects priors, and identification of model parameters in SCMs are explored and discussed in the context of spatial and ecological analysis of small area multivariate disease or health outcome rates and associated ecological risk factors. The methods are illustrated through an in-depth analysis of four-variate road traffic accident injury (RTAI) data: gender-specific fatal and non-fatal RTAI rates in 84 local health areas in British Columbia (Canada). Fully Bayesian inference via Markov chain Monte Carlo simulations is presented. Copyright 2010 John Wiley & Sons, Ltd.

  9. Multivariate factor analysis of Girgentana goat milk composition

    Directory of Open Access Journals (Sweden)

    Pietro Giaccone

    2010-01-01

    Full Text Available The interpretation of the several variables that contribute to defining milk quality is difficult due to the high degree of  correlation among them. In this case, one of the best methods of statistical processing is factor analysis, which belongs  to the multivariate groups; for our study this particular statistical approach was employed.  A total of 1485 individual goat milk samples from 117 Girgentana goats, were collected fortnightly from January to July,  and analysed for physical and chemical composition, and clotting properties. Milk pH and tritable acidity were within the  normal range for fresh goat milk. Morning milk yield resulted 704 ± 323 g with 3.93 ± 1.23% and 3.48±0.38% for fat  and protein percentages, respectively. The milk urea content was 43.70 ± 8.28 mg/dl. The clotting ability of Girgentana  milk was quite good, with a renneting time equal to 16.96 ± 3.08 minutes, a rate of curd formation of 2.01 ± 1.63 min-  utes and a curd firmness of 25.08 ± 7.67 millimetres.  Factor analysis was performed by applying axis orthogonal rotation (rotation type VARIMAX; the analysis grouped the  milk components into three latent or common factors. The first, which explained 51.2% of the total covariance, was  defined as “slow milks”, because it was linked to r and pH. The second latent factor, which explained 36.2% of the total  covariance, was defined as “milk yield”, because it is positively correlated to the morning milk yield and to the urea con-  tent, whilst negatively correlated to the fat percentage. The third latent factor, which explained 12.6% of the total covari-  ance, was defined as “curd firmness,” because it is linked to protein percentage, a30 and titatrable acidity. With the aim  of evaluating the influence of environmental effects (stage of kidding, parity and type of kidding, factor scores were anal-  ysed with the mixed linear model. Results showed significant effects of the season of

  10. Regression analysis understanding and building business and economic models using Excel

    CERN Document Server

    Wilson, J Holton

    2012-01-01

    The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe

  11. A multivariate analysis of Antarctic sea ice since 1979

    Energy Technology Data Exchange (ETDEWEB)

    Magalhaes Neto, Newton de; Evangelista, Heitor [Universidade do Estado do Rio de Janeiro (Uerj), LARAMG - Laboratorio de Radioecologia e Mudancas Globais, Maracana, Rio de Janeiro, RJ (Brazil); Tanizaki-Fonseca, Kenny [Universidade do Estado do Rio de Janeiro (Uerj), LARAMG - Laboratorio de Radioecologia e Mudancas Globais, Maracana, Rio de Janeiro, RJ (Brazil); Universidade Federal Fluminense (UFF), Dept. Analise Geoambiental, Inst. de Geociencias, Niteroi, RJ (Brazil); Penello Meirelles, Margareth Simoes [Universidade do Estado do Rio de Janeiro (UERJ)/Geomatica, Maracana, Rio de Janeiro, RJ (Brazil); Garcia, Carlos Eiras [Universidade Federal do Rio Grande (FURG), Laboratorio de Oceanografia Fisica, Rio Grande, RS (Brazil)

    2012-03-15

    Recent satellite observations have shown an increase in the total extent of Antarctic sea ice, during periods when the atmosphere and oceans tend to be warmer surrounding a significant part of the continent. Despite an increase in total sea ice, regional analyses depict negative trends in the Bellingshausen-Amundsen Sea and positive trends in the Ross Sea. Although several climate parameters are believed to drive the formation of Antarctic sea ice and the local atmosphere, a descriptive mechanism that could trigger such differences in trends are still unknown. In this study we employed a multivariate analysis in order to identify the response of the Antarctic sea ice with respect to commonly utilized climate forcings/parameters, as follows: (1) The global air surface temperature, (2) The global sea surface temperature, (3) The atmospheric CO{sub 2} concentration, (4) The South Annular Mode, (5) The Nino 3, (6) The Nino (3 + 4, 7) The Nino 4, (8) The Southern Oscillation Index, (9) The Multivariate ENSO Index, (10) the Total Solar Irradiance, (11) The maximum O{sub 3} depletion area, and (12) The minimum O{sub 3} concentration over Antarctica. Our results indicate that western Antarctic sea ice is simultaneously impacted by several parameters; and that the minimum, mean, and maximum sea ice extent may respond to a separate set of climatic/geochemical parameters. (orig.)

  12. Experimental variability and data pre-processing as factors affecting the discrimination power of some chemometric approaches (PCA, CA and a new algorithm based on linear regression) applied to (+/-)ESI/MS and RPLC/UV data: Application on green tea extracts.

    Science.gov (United States)

    Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A

    2016-08-01

    The influence of the experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis -PCA- and Cluster Analysis -CA-) as well as a new algorithm based on linear regression was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed phase liquid chromatography/UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extractions in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were directly applied to mass spectra and chromatograms, involving strictly a holistic comparison of shapes, without assignment of any structural identity to compounds. An alternative data interpretation based on linear regression analysis mutually applied to data series is also discussed. Slopes, intercepts and correlation coefficients produced by the linear regression analysis applied on pairs of very large experimental data series successfully retain information resulting from high frequency instrumental acquisition rates, obviously better defining the profiles being compared. Consequently, each type of sample or comparison between samples produces in the Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrates (dis)similarities between compared data. The instrumental intermediate precision had the major effect on the discrimination power of the multivariate data analysis methods. Mass spectra produced through ionization from liquid state in atmospheric pressure conditions of bulk complex mixtures resulting from extracted materials of natural origins provided an excellent data

  13. Nonlinear regression analysis for evaluating tracer binding parameters using the programmable K1003 desk computer

    International Nuclear Information System (INIS)

    Sarrach, D.; Strohner, P.

    1986-01-01

    The Gauss-Newton algorithm has been used to evaluate tracer binding parameters of RIA by nonlinear regression analysis. The calculations were carried out on the K1003 desk computer. Equations for simple binding models and its derivatives are presented. The advantages of nonlinear regression analysis over linear regression are demonstrated

  14. A comparison of multivariate genome-wide association methods

    DEFF Research Database (Denmark)

    Galesloot, Tessel E; Van Steen, Kristel; Kiemeney, Lambertus A L M

    2014-01-01

    Joint association analysis of multiple traits in a genome-wide association study (GWAS), i.e. a multivariate GWAS, offers several advantages over analyzing each trait in a separate GWAS. In this study we directly compared a number of multivariate GWAS methods using simulated data. We focused on six...... methods that are implemented in the software packages PLINK, SNPTEST, MultiPhen, BIMBAM, PCHAT and TATES, and also compared them to standard univariate GWAS, analysis of the first principal component of the traits, and meta-analysis of univariate results. We simulated data (N = 1000) for three...... for scenarios with an opposite sign of genetic and residual correlation. All multivariate analyses resulted in a higher power than univariate analyses, even when only one of the traits was associated with the QTL. Hence, use of multivariate GWAS methods can be recommended, even when genetic correlations between...

  15. Regression analysis for the social sciences

    CERN Document Server

    Gordon, Rachel A

    2010-01-01

    The book provides graduate students in the social sciences with the basic skills that they need to estimate, interpret, present, and publish basic regression models using contemporary standards. Key features of the book include: interweaving the teaching of statistical concepts with examples developed for the course from publicly-available social science data or drawn from the literature. thorough integration of teaching statistical theory with teaching data processing and analysis. teaching of both SAS and Stata "side-by-side" and use of chapter exercises in which students practice programming and interpretation on the same data set and course exercises in which students can choose their own research questions and data set.

  16. Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens.

    Science.gov (United States)

    Taylor, Sandra L; Ruhaak, L Renee; Weiss, Robert H; Kelly, Karen; Kim, Kyoungmi

    2017-01-01

    High through-put mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen. We provide R functions to implement and illustrate our method as supplementary information CONTACT: sltaylor@ucdavis.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. Study of risk factors affecting both hypertension and obesity outcome by using multivariate multilevel logistic regression models

    Directory of Open Access Journals (Sweden)

    Sepedeh Gholizadeh

    2016-07-01

    Full Text Available Background:Obesity and hypertension are the most important non-communicable diseases thatin many studies, the prevalence and their risk factors have been performedin each geographic region univariately.Study of factors affecting both obesity and hypertension may have an important role which to be adrressed in this study. Materials &Methods:This cross-sectional study was conducted on 1000 men aged 20-70 living in Bushehr province. Blood pressure was measured three times and the average of them was considered as one of the response variables. Hypertension was defined as systolic blood pressure ≥140 (and-or diastolic blood pressure ≥90 and obesity was defined as body mass index ≥25. Data was analyzed by using multilevel, multivariate logistic regression model by MlwiNsoftware. Results:Intra class correlations in cluster level obtained 33% for high blood pressure and 37% for obesity, so two level model was fitted to data. The prevalence of obesity and hypertension obtained 43.6% (0.95%CI; 40.6-46.5, 29.4% (0.95%CI; 26.6-32.1 respectively. Age, gender, smoking, hyperlipidemia, diabetes, fruit and vegetable consumption and physical activity were the factors affecting blood pressure (p≤0.05. Age, gender, hyperlipidemia, diabetes, fruit and vegetable consumption, physical activity and place of residence are effective on obesity (p≤0.05. Conclusion: The multilevel models with considering levels distribution provide more precise estimates. As regards obesity and hypertension are the major risk factors for cardiovascular disease, by knowing the high-risk groups we can d careful planning to prevention of non-communicable diseases and promotion of society health.

  18. Multivariate statistical pattern recognition system for reactor noise analysis

    International Nuclear Information System (INIS)

    Gonzalez, R.C.; Howington, L.C.; Sides, W.H. Jr.; Kryter, R.C.

    1976-01-01

    A multivariate statistical pattern recognition system for reactor noise analysis was developed. The basis of the system is a transformation for decoupling correlated variables and algorithms for inferring probability density functions. The system is adaptable to a variety of statistical properties of the data, and it has learning, tracking, and updating capabilities. System design emphasizes control of the false-alarm rate. The ability of the system to learn normal patterns of reactor behavior and to recognize deviations from these patterns was evaluated by experiments at the ORNL High-Flux Isotope Reactor (HFIR). Power perturbations of less than 0.1 percent of the mean value in selected frequency ranges were detected by the system

  19. Multivariate statistical pattern recognition system for reactor noise analysis

    International Nuclear Information System (INIS)

    Gonzalez, R.C.; Howington, L.C.; Sides, W.H. Jr.; Kryter, R.C.

    1975-01-01

    A multivariate statistical pattern recognition system for reactor noise analysis was developed. The basis of the system is a transformation for decoupling correlated variables and algorithms for inferring probability density functions. The system is adaptable to a variety of statistical properties of the data, and it has learning, tracking, and updating capabilities. System design emphasizes control of the false-alarm rate. The ability of the system to learn normal patterns of reactor behavior and to recognize deviations from these patterns was evaluated by experiments at the ORNL High-Flux Isotope Reactor (HFIR). Power perturbations of less than 0.1 percent of the mean value in selected frequency ranges were detected by the system. 19 references

  20. Composite marginal quantile regression analysis for longitudinal adolescent body mass index data.

    Science.gov (United States)

    Yang, Chi-Chuan; Chen, Yi-Hau; Chang, Hsing-Yi

    2017-09-20

    Childhood and adolescenthood overweight or obesity, which may be quantified through the body mass index (BMI), is strongly associated with adult obesity and other health problems. Motivated by the child and adolescent behaviors in long-term evolution (CABLE) study, we are interested in individual, family, and school factors associated with marginal quantiles of longitudinal adolescent BMI values. We propose a new method for composite marginal quantile regression analysis for longitudinal outcome data, which performs marginal quantile regressions at multiple quantile levels simultaneously. The proposed method extends the quantile regression coefficient modeling method introduced by Frumento and Bottai (Biometrics 2016; 72:74-84) to longitudinal data accounting suitably for the correlation structure in longitudinal observations. A goodness-of-fit test for the proposed modeling is also developed. Simulation results show that the proposed method can be much more efficient than the analysis without taking correlation into account and the analysis performing separate quantile regressions at different quantile levels. The application to the longitudinal adolescent BMI data from the CABLE study demonstrates the practical utility of our proposal. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  1. Multivariate survivorship analysis using two cross-sectional samples.

    Science.gov (United States)

    Hill, M E

    1999-11-01

    As an alternative to survival analysis with longitudinal data, I introduce a method that can be applied when one observes the same cohort in two cross-sectional samples collected at different points in time. The method allows for the estimation of log-probability survivorship models that estimate the influence of multiple time-invariant factors on survival over a time interval separating two samples. This approach can be used whenever the survival process can be adequately conceptualized as an irreversible single-decrement process (e.g., mortality, the transition to first marriage among a cohort of never-married individuals). Using data from the Integrated Public Use Microdata Series (Ruggles and Sobek 1997), I illustrate the multivariate method through an investigation of the effects of race, parity, and educational attainment on the survival of older women in the United States.

  2. Research on Influence and Prediction Model of Urban Traffic Link Tunnel curvature on Fire Temperature Based on Pyrosim--SPSS Multiple Regression Analysis

    Science.gov (United States)

    Li, Xiao Ju; Yao, Kun; Dai, Jun Yu; Song, Yun Long

    2018-05-01

    The underground space, also known as the “fourth dimension” of the city, reflects the efficient use of urban development intensive. Urban traffic link tunnel is a typical underground limited-length space. Due to the geographical location, the special structure of space and the curvature of the tunnel, high-temperature smoke can easily form the phenomenon of “smoke turning” and the fire risk is extremely high. This paper takes an urban traffic link tunnel as an example to focus on the relationship between curvature and the temperature near the fire source, and use the pyrosim built different curvature fire model to analyze the influence of curvature on the temperature of the fire, then using SPSS Multivariate regression analysis simulate curvature of the tunnel and fire temperature data. Finally, a prediction model of urban traffic link tunnel curvature on fire temperature was proposed. The regression model analysis and test show that the curvature is negatively correlated with the tunnel temperature. This model is feasible and can provide a theoretical reference for the urban traffic link tunnel fire protection design and the preparation of the evacuation plan. And also, it provides some reference for other related curved tunnel curvature design and smoke control measures.

  3. Application of least square support vector machine and multivariate adaptive regression spline models in long term prediction of river water pollution

    Science.gov (United States)

    Kisi, Ozgur; Parmar, Kulwinder Singh

    2016-03-01

    This study investigates the accuracy of least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5Tree) in modeling river water pollution. Various combinations of water quality parameters, Free Ammonia (AMM), Total Kjeldahl Nitrogen (TKN), Water Temperature (WT), Total Coliform (TC), Fecal Coliform (FC) and Potential of Hydrogen (pH) monitored at Nizamuddin, Delhi Yamuna River in India were used as inputs to the applied models. Results indicated that the LSSVM and MARS models had almost same accuracy and they performed better than the M5Tree model in modeling monthly chemical oxygen demand (COD). The average root mean square error (RMSE) of the LSSVM and M5Tree models was decreased by 1.47% and 19.1% using MARS model, respectively. Adding TC input to the models did not increase their accuracy in modeling COD while adding FC and pH inputs to the models generally decreased the accuracy. The overall results indicated that the MARS and LSSVM models could be successfully used in estimating monthly river water pollution level by using AMM, TKN and WT parameters as inputs.

  4. Treating experimental data of inverse kinetic method by unitary linear regression analysis

    International Nuclear Information System (INIS)

    Zhao Yusen; Chen Xiaoliang

    2009-01-01

    The theory of treating experimental data of inverse kinetic method by unitary linear regression analysis was described. Not only the reactivity, but also the effective neutron source intensity could be calculated by this method. Computer code was compiled base on the inverse kinetic method and unitary linear regression analysis. The data of zero power facility BFS-1 in Russia were processed and the results were compared. The results show that the reactivity and the effective neutron source intensity can be obtained correctly by treating experimental data of inverse kinetic method using unitary linear regression analysis and the precision of reactivity measurement is improved. The central element efficiency can be calculated by using the reactivity. The result also shows that the effect to reactivity measurement caused by external neutron source should be considered when the reactor power is low and the intensity of external neutron source is strong. (authors)

  5. Regression analysis of informative current status data with the additive hazards model.

    Science.gov (United States)

    Zhao, Shishun; Hu, Tao; Ma, Ling; Wang, Peijie; Sun, Jianguo

    2015-04-01

    This paper discusses regression analysis of current status failure time data arising from the additive hazards model in the presence of informative censoring. Many methods have been developed for regression analysis of current status data under various regression models if the censoring is noninformative, and also there exists a large literature on parametric analysis of informative current status data in the context of tumorgenicity experiments. In this paper, a semiparametric maximum likelihood estimation procedure is presented and in the method, the copula model is employed to describe the relationship between the failure time of interest and the censoring time. Furthermore, I-splines are used to approximate the nonparametric functions involved and the asymptotic consistency and normality of the proposed estimators are established. A simulation study is conducted and indicates that the proposed approach works well for practical situations. An illustrative example is also provided.

  6. Application of instrumental neutron activation analysis and multivariate statistical methods to archaeological Syrian ceramics

    International Nuclear Information System (INIS)

    Bakraji, E. H.; Othman, I.; Sarhil, A.; Al-Somel, N.

    2002-01-01

    Instrumental neutron activation analysis (INAA) has been utilized in the analysis of thirty-seven archaeological ceramics fragment samples collected from Tal AI-Wardiate site, Missiaf town, Hamma city, Syria. 36 chemical elements were determined. These elemental concentrations have been processed using two multivariate statistical methods, cluster and factor analysis in order to determine similarities and correlation between the various samples. Factor analysis confirms that samples were correctly classified by cluster analysis. The results showed that samples can be considered to be manufactured using three different sources of raw material. (author)

  7. A Skew-t space-varying regression model for the spectral analysis of resting state brain activity.

    Science.gov (United States)

    Ismail, Salimah; Sun, Wenqi; Nathoo, Farouk S; Babul, Arif; Moiseev, Alexader; Beg, Mirza Faisal; Virji-Babul, Naznin

    2013-08-01

    It is known that in many neurological disorders such as Down syndrome, main brain rhythms shift their frequencies slightly, and characterizing the spatial distribution of these shifts is of interest. This article reports on the development of a Skew-t mixed model for the spatial analysis of resting state brain activity in healthy controls and individuals with Down syndrome. Time series of oscillatory brain activity are recorded using magnetoencephalography, and spectral summaries are examined at multiple sensor locations across the scalp. We focus on the mean frequency of the power spectral density, and use space-varying regression to examine associations with age, gender and Down syndrome across several scalp regions. Spatial smoothing priors are incorporated based on a multivariate Markov random field, and the markedly non-Gaussian nature of the spectral response variable is accommodated by the use of a Skew-t distribution. A range of models representing different assumptions on the association structure and response distribution are examined, and we conduct model selection using the deviance information criterion. (1) Our analysis suggests region-specific differences between healthy controls and individuals with Down syndrome, particularly in the left and right temporal regions, and produces smoothed maps indicating the scalp topography of the estimated differences.

  8. Multivariate analysis of heavy metal contamination using river sediment cores of Nankan River, northern Taiwan

    Science.gov (United States)

    Lee, An-Sheng; Lu, Wei-Li; Huang, Jyh-Jaan; Chang, Queenie; Wei, Kuo-Yen; Lin, Chin-Jung; Liou, Sofia Ya Hsuan

    2016-04-01

    Through the geology and climate characteristic in Taiwan, generally rivers carry a lot of suspended particles. After these particles settled, they become sediments which are good sorbent for heavy metals in river system. Consequently, sediments can be found recording contamination footprint at low flow energy region, such as estuary. Seven sediment cores were collected along Nankan River, northern Taiwan, which is seriously contaminated by factory, household and agriculture input. Physico-chemical properties of these cores were derived from Itrax-XRF Core Scanner and grain size analysis. In order to interpret these complex data matrices, the multivariate statistical techniques (cluster analysis, factor analysis and discriminant analysis) were introduced to this study. Through the statistical determination, the result indicates four types of sediment. One of them represents contamination event which shows high concentration of Cu, Zn, Pb, Ni and Fe, and low concentration of Si and Zr. Furthermore, three possible contamination sources of this type of sediment were revealed by Factor Analysis. The combination of sediment analysis and multivariate statistical techniques used provides new insights into the contamination depositional history of Nankan River and could be similarly applied to other river systems to determine the scale of anthropogenic contamination.

  9. Credit Scoring Problem Based on Regression Analysis

    OpenAIRE

    Khassawneh, Bashar Suhil Jad Allah

    2014-01-01

    ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....

  10. MULGRES: a computer program for stepwise multiple regression analysis

    Science.gov (United States)

    A. Jeff Martin

    1971-01-01

    MULGRES is a computer program source deck that is designed for multiple regression analysis employing the technique of stepwise deletion in the search for most significant variables. The features of the program, along with inputs and outputs, are briefly described, with a note on machine compatibility.

  11. Real-time regression analysis with deep convolutional neural networks

    OpenAIRE

    Huerta, E. A.; George, Daniel; Zhao, Zhizhen; Allen, Gabrielle

    2018-01-01

    We discuss the development of novel deep learning algorithms to enable real-time regression analysis for time series data. We showcase the application of this new method with a timely case study, and then discuss the applicability of this approach to tackle similar challenges across science domains.

  12. [Comparison of application of Cochran-Armitage trend test and linear regression analysis for rate trend analysis in epidemiology study].

    Science.gov (United States)

    Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H

    2017-05-10

    We described the time trend of acute myocardial infarction (AMI) from 1999 to 2013 in Tianjin incidence rate with Cochran-Armitage trend (CAT) test and linear regression analysis, and the results were compared. Based on actual population, CAT test had much stronger statistical power than linear regression analysis for both overall incidence trend and age specific incidence trend (Cochran-Armitage trend P valuelinear regression P value). The statistical power of CAT test decreased, while the result of linear regression analysis remained the same when population size was reduced by 100 times and AMI incidence rate remained unchanged. The two statistical methods have their advantages and disadvantages. It is necessary to choose statistical method according the fitting degree of data, or comprehensively analyze the results of two methods.

  13. Predicting biomaterial property-dendritic cell phenotype relationships from the multivariate analysis of responses to polymethacrylates

    Science.gov (United States)

    Kou, Peng Meng; Pallassana, Narayanan; Bowden, Rebeca; Cunningham, Barry; Joy, Abraham; Kohn, Joachim; Babensee, Julia E.

    2011-01-01

    Dendritic cells (DCs) play a critical role in orchestrating the host responses to a wide variety of foreign antigens and are essential in maintaining immune tolerance. Distinct biomaterials have been shown to differentially affect the phenotype of DCs, which suggested that biomaterials may be used to modulate immune response towards the biologic component in combination products. The elucidation of biomaterial property-DC phenotype relationships is expected to inform rational design of immuno-modulatory biomaterials. In this study, DC response to a set of 12 polymethacrylates (pMAs) was assessed in terms of surface marker expression and cytokine profile. Principal component analysis (PCA) determined that surface carbon correlated with enhanced DC maturation, while surface oxygen was associated with an immature DC phenotype. Partial square linear regression, a multivariate modeling approach, was implemented and successfully predicted biomaterial-induced DC phenotype in terms of surface marker expression from biomaterial properties with R2prediction = 0.76. Furthermore, prediction of DC phenotype was effective based on only theoretical chemical composition of the bulk polymers with R2prediction = 0.80. These results demonstrated that immune cell response can be predicted from biomaterial properties, and computational models will expedite future biomaterial design and selection. PMID:22136715

  14. Copula Regression Analysis of Simultaneously Recorded Frontal Eye Field and Inferotemporal Spiking Activity during Object-Based Working Memory

    Science.gov (United States)

    Hu, Meng; Clark, Kelsey L.; Gong, Xiajing; Noudoost, Behrad; Li, Mingyao; Moore, Tirin

    2015-01-01

    Inferotemporal (IT) neurons are known to exhibit persistent, stimulus-selective activity during the delay period of object-based working memory tasks. Frontal eye field (FEF) neurons show robust, spatially selective delay period activity during memory-guided saccade tasks. We present a copula regression paradigm to examine neural interaction of these two types of signals between areas IT and FEF of the monkey during a working memory task. This paradigm is based on copula models that can account for both marginal distribution over spiking activity of individual neurons within each area and joint distribution over ensemble activity of neurons between areas. Considering the popular GLMs as marginal models, we developed a general and flexible likelihood framework that uses the copula to integrate separate GLMs into a joint regression analysis. Such joint analysis essentially leads to a multivariate analog of the marginal GLM theory and hence efficient model estimation. In addition, we show that Granger causality between spike trains can be readily assessed via the likelihood ratio statistic. The performance of this method is validated by extensive simulations, and compared favorably to the widely used GLMs. When applied to spiking activity of simultaneously recorded FEF and IT neurons during working memory task, we observed significant Granger causality influence from FEF to IT, but not in the opposite direction, suggesting the role of the FEF in the selection and retention of visual information during working memory. The copula model has the potential to provide unique neurophysiological insights about network properties of the brain. PMID:26063909

  15. A method of signal transmission path analysis for multivariate random processes

    International Nuclear Information System (INIS)

    Oguma, Ritsuo

    1984-04-01

    A method for noise analysis called ''STP (signal transmission path) analysis'' is presentd as a tool to identify noise sources and their propagation paths in multivariate random proceses. Basic idea of the analysis is to identify, via time series analysis, effective network for the signal power transmission among variables in the system and to make use of its information to the noise analysis. In the present paper, we accomplish this through two steps of signal processings; first, we estimate, using noise power contribution analysis, variables which have large contribution to the power spectrum of interest, and then evaluate the STPs for each pair of variables to identify STPs which play significant role for the generated noise to transmit to the variable under evaluation. The latter part of the analysis is executed through comparison of partial coherence function and newly introduced partial noise power contribution function. This paper presents the procedure of the STP analysis and demonstrates, using simulation data as well as Borssele PWR noise data, its effectiveness for investigation of noise generation and propagation mechanisms. (author)

  16. Predictors of success after laparoscopic gastric bypass: a multivariate analysis of socioeconomic factors.

    Science.gov (United States)

    Lutfi, R; Torquati, A; Sekhar, N; Richards, W O

    2006-06-01

    Laparoscopic gastric bypass (LGB) has proven efficacy in causing significant and durable weight loss. However, the degree of postoperative weight loss and metabolic improvement varies greatly among individuals. Our study is aimed to identify independent predictors of successful weight loss after LGB. Socioeconomic demographics were prospectively collected on patients undergoing LGB. Primary endpoint was percent of excess weight loss (EWL) at 1-year follow-up. Insufficient weight loss was defined as EWL or=52.8%. According to this definition, 147 patients (81.7%) achieved successful weight loss 1 year after LGB. On univariate analysis, preoperative BMI had a significant effect on EWL, with patients with BMI vs 61.6%; p = 0.001). Marriage status was also a significant predictor of successful outcome, with single patients achieving a higher percentage of EWL than married patients (89.8% vs 77.7%; p = 0.04). Race had a noticeable but not statistically significant effect, with Caucasian patients achieving a higher percentage of EWL than African Americans (82.9% vs 60%; p = 0.06). Marital status remained an independent predictor of success in the multivariate logistic regression model after adjusting for covariates. Married patients were at more than two times the risk of failure compared to those who were unmarried (OR 2.6; 95% CI: 1.1-6.5, p = 0.04). Weight loss achieved at 1 year after LGB is suboptimal in superobese patients. Single patients with BMI < 50 had the best chance of achieving greater weight loss.

  17. Biodegradable blends of starch/polyvinyl alcohol/glycerol: multivariate analysis of the mechanical properties

    Directory of Open Access Journals (Sweden)

    Juliano Zanela

    Full Text Available Abstract The aim of the work was to study the mechanical properties of extruded starch/polyvinyl alcohol (PVA/glycerol biodegradable blends using multivariate analysis. The blends were produced as cylindrical strands by extrusion using PVAs with different hydrolysis degrees and viscosities, at two extrusion temperature profiles (90/170/170/170/170 °C and 90/170/200/200/200 °C and three conditioning relative humidities of the samples (33, 53, and 75%. The mechanical properties showed a great variability according to PVA type, as well as the extrusion temperature profile and the conditioning relative humidity; the tensile strength ranged from 0.42 to 5.40 MPa, elongation at break ranged from 10 to 404% and Young’s modulus ranged from 0.93 to 13.81 MPa. The multivariate analysis was a useful methodology to study the mechanical properties behavior of starch/PVA/glycerol blends, and it can be used as an exploratory technique to select of the more suitable PVA type and extrusion temperature to produce biodegradable materials.

  18. The evolution of GDP in USA using cyclic regression analysis

    OpenAIRE

    Catalin Angelo IOAN; Gina IOAN

    2013-01-01

    Based on the four major types of economic cycles (Kondratieff, Juglar, Kitchin, Kuznet), the paper aims to determine their actual length (for the U.S. economy) using cyclic regressions based on Fourier analysis.

  19. Dynamic factor analysis in the frequency domain: causal modeling of multivariate psychophysiological time series

    NARCIS (Netherlands)

    Molenaar, P.C.M.

    1987-01-01

    Outlines a frequency domain analysis of the dynamic factor model and proposes a solution to the problem of constructing a causal filter of lagged factor loadings. The method is illustrated with applications to simulated and real multivariate time series. The latter applications involve topographic

  20. Principal response curves: analysis of time-dependent multivariate responses of biological community to stress

    NARCIS (Netherlands)

    Brink, van den P.J.; Braak, ter C.J.F.

    1999-01-01

    In this paper a novel multivariate method is proposed for the analysis of community response data from designed experiments repeatedly sampled in time. The long-term effects of the insecticide chlorpyrifos on the invertebrate community and the dissolved oxygen (DO)–pH–alkalinity–conductivity

  1. Determinants of LSIL Regression in Women from a Colombian Cohort

    International Nuclear Information System (INIS)

    Molano, Monica; Gonzalez, Mauricio; Gamboa, Oscar; Ortiz, Natasha; Luna, Joaquin; Hernandez, Gustavo; Posso, Hector; Murillo, Raul; Munoz, Nubia

    2010-01-01

    Objective: To analyze the role of Human Papillomavirus (HPV) and other risk factors in the regression of cervical lesions in women from the Bogota Cohort. Methods: 200 HPV positive women with abnormal cytology were included for regression analysis. The time of lesion regression was modeled using methods for interval censored survival time data. Median duration of total follow-up was 9 years. Results: 80 (40%) women were diagnosed with Atypical Squamous Cells of Undetermined Significance (ASCUS) or Atypical Glandular Cells of Undetermined Significance (AGUS) while 120 (60%) were diagnosed with Low Grade Squamous Intra-epithelial Lesions (LSIL). Globally, 40% of the lesions were still present at first year of follow up, while 1.5% was still present at 5 year check-up. The multivariate model showed similar regression rates for lesions in women with ASCUS/AGUS and women with LSIL (HR= 0.82, 95% CI 0.59-1.12). Women infected with HR HPV types and those with mixed infections had lower regression rates for lesions than did women infected with LR types (HR=0.526, 95% CI 0.33-0.84, for HR types and HR=0.378, 95% CI 0.20-0.69, for mixed infections). Furthermore, women over 30 years had a higher lesion regression rate than did women under 30 years (HR1.53, 95% CI 1.03-2.27). The study showed that the median time for lesion regression was 9 months while the median time for HPV clearance was 12 months. Conclusions: In the studied population, the type of infection and the age of the women are critical factors for the regression of cervical lesions.

  2. Quantile regression for the statistical analysis of immunological data with many non-detects.

    Science.gov (United States)

    Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth

    2012-07-07

    Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an implementation to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.

  3. Optimal choice of basis functions in the linear regression analysis

    International Nuclear Information System (INIS)

    Khotinskij, A.M.

    1988-01-01

    Problem of optimal choice of basis functions in the linear regression analysis is investigated. Step algorithm with estimation of its efficiency, which holds true at finite number of measurements, is suggested. Conditions, providing the probability of correct choice close to 1 are formulated. Application of the step algorithm to analysis of decay curves is substantiated. 8 refs

  4. Adjustment of geochemical background by robust multivariate statistics

    Science.gov (United States)

    Zhou, D.

    1985-01-01

    Conventional analyses of exploration geochemical data assume that the background is a constant or slowly changing value, equivalent to a plane or a smoothly curved surface. However, it is better to regard the geochemical background as a rugged surface, varying with changes in geology and environment. This rugged surface can be estimated from observed geological, geochemical and environmental properties by using multivariate statistics. A method of background adjustment was developed and applied to groundwater and stream sediment reconnaissance data collected from the Hot Springs Quadrangle, South Dakota, as part of the National Uranium Resource Evaluation (NURE) program. Source-rock lithology appears to be a dominant factor controlling the chemical composition of groundwater or stream sediments. The most efficacious adjustment procedure is to regress uranium concentration on selected geochemical and environmental variables for each lithologic unit, and then to delineate anomalies by a common threshold set as a multiple of the standard deviation of the combined residuals. Robust versions of regression and RQ-mode principal components analysis techniques were used rather than ordinary techniques to guard against distortion caused by outliers Anomalies delineated by this background adjustment procedure correspond with uranium prospects much better than do anomalies delineated by conventional procedures. The procedure should be applicable to geochemical exploration at different scales for other metals. ?? 1985.

  5. [Use of multiple regression models in observational studies (1970-2013) and requirements of the STROBE guidelines in Spanish scientific journals].

    Science.gov (United States)

    Real, J; Cleries, R; Forné, C; Roso-Llorach, A; Martínez-Sánchez, J M

    In medicine and biomedical research, statistical techniques like logistic, linear, Cox and Poisson regression are widely known. The main objective is to describe the evolution of multivariate techniques used in observational studies indexed in PubMed (1970-2013), and to check the requirements of the STROBE guidelines in the author guidelines in Spanish journals indexed in PubMed. A targeted PubMed search was performed to identify papers that used logistic linear Cox and Poisson models. Furthermore, a review was also made of the author guidelines of journals published in Spain and indexed in PubMed and Web of Science. Only 6.1% of the indexed manuscripts included a term related to multivariate analysis, increasing from 0.14% in 1980 to 12.3% in 2013. In 2013, 6.7, 2.5, 3.5, and 0.31% of the manuscripts contained terms related to logistic, linear, Cox and Poisson regression, respectively. On the other hand, 12.8% of journals author guidelines explicitly recommend to follow the STROBE guidelines, and 35.9% recommend the CONSORT guideline. A low percentage of Spanish scientific journals indexed in PubMed include the STROBE statement requirement in the author guidelines. Multivariate regression models in published observational studies such as logistic regression, linear, Cox and Poisson are increasingly used both at international level, as well as in journals published in Spanish. Copyright © 2015 Sociedad Española de Médicos de Atención Primaria (SEMERGEN). Publicado por Elsevier España, S.L.U. All rights reserved.

  6. Morphological analysis of enlarged ventricle on CT image, using multivariate analysis

    International Nuclear Information System (INIS)

    Iwasaki, Satoru; Kichikawa, Kimihiko; Otsuji, Hideyuki; Fukusumi, Akio; Kobayashi, Yasuo.

    1983-01-01

    Multivariate analysis of enlarged cerebral ventricle on CT was undertaken to study the characteristics of ventricular morphology. Several ventricular segments of enlarged ventricle, defined on the basis of the study of normal group, were linearly measured on CT image. Then the discriminant analysis with the increase and decrease of variable was applied. The following are the results obtained. The error ratio of discrimination between pressure hydrocephalus and cerebral atrophy was 8.4 %, and between obstructive hydrocephalus and communicating hydrocephalus was 11.3 %. Ventricular segments were divided into three groups according to their character of enlargement: (1) the temporal horn and trigone are large in pressure hydrocephalus; (2) the hypothalamic segment of the third ventricle and the body of lateral ventricle are larger in obstructive hydrocephalus than in communicating hydrocephalus; (3) the anterior horn, cellae mediae at the level of the head of caudate nuclei and thalamic segment of the third ventricle are relatively large in cerebral atrophy and communicating hydrocephalus. The hypothalamic segment of the third ventricle assumes a round or oval shape in pressure hydrocephalus but a rectangular or teardrop shape in cerebral atrophy. These findings are contributory to pathological evaluation of ventricular enlargement. (author)

  7. Multivariate analysis for customer segmentation based on RFM

    Directory of Open Access Journals (Sweden)

    Álvaro Julio Cuadros López

    2018-02-01

    Full Text Available Context: To build a successful relationship management (CRM, companies must start with the identification of the true value of customers, as this provides basic information to implement more targeted and customized marketing strategies. The RFM methodology, a classic analysis tool that uses three evaluation parameters, allows companies to understand customer behavior, and to establish customer segments. The addition of a new parameter in the traditional technique is an opportunity to refine the possible outcomes of a customer segmentation since it not only provides a new element of evaluation to identify the most valuable customers, but it also makes it possible to differentiate and get to know customers even better. Method: The article presents a methodology that allows to establish customer segments using an extended RFM method with new variables, selected through multivariate analysis..  Results: The proposed implementation was applied in a company in which variables such as profit, profit percentage, and billing due date were tested. Therefore, it was possible to establish a more detailed customer segmentation than with the classic RFM. Conclusions: the RFM analysis is a method widely used in the industry for its easy understanding and applicability. However, it can be improved with the use of statistical procedures and new variables, which will allow companies to have deeper information about the behavior of the clients, and will facilitate the design of specific marketing strategies.

  8. Multivariate pattern analysis of MEG and EEG: A comparison of representational structure in time and space.

    Science.gov (United States)

    Cichy, Radoslaw Martin; Pantazis, Dimitrios

    2017-09-01

    Multivariate pattern analysis of magnetoencephalography (MEG) and electroencephalography (EEG) data can reveal the rapid neural dynamics underlying cognition. However, MEG and EEG have systematic differences in sampling neural activity. This poses the question to which degree such measurement differences consistently bias the results of multivariate analysis applied to MEG and EEG activation patterns. To investigate, we conducted a concurrent MEG/EEG study while participants viewed images of everyday objects. We applied multivariate classification analyses to MEG and EEG data, and compared the resulting time courses to each other, and to fMRI data for an independent evaluation in space. We found that both MEG and EEG revealed the millisecond spatio-temporal dynamics of visual processing with largely equivalent results. Beyond yielding convergent results, we found that MEG and EEG also captured partly unique aspects of visual representations. Those unique components emerged earlier in time for MEG than for EEG. Identifying the sources of those unique components with fMRI, we found the locus for both MEG and EEG in high-level visual cortex, and in addition for MEG in low-level visual cortex. Together, our results show that multivariate analyses of MEG and EEG data offer a convergent and complimentary view on neural processing, and motivate the wider adoption of these methods in both MEG and EEG research. Copyright © 2017 Elsevier Inc. All rights reserved.

  9. Multivariate Analysis Approach to the Serum Peptide Profile of Morbidly Obese Patients

    Directory of Open Access Journals (Sweden)

    M. Agostini

    2013-01-01

    Full Text Available Background: Obesity is currently epidemic in many countries worldwide and is strongly related to diabetes and cardiovascular disease. Mass spectrometry, in particular matrix-assisted laser desorption/ionization time of flight (MALDI-TOF is currently used for detecting different pattern of expressed protein. This study investigated the differences in low molecular weight (LMW peptide profiles between obese and normal-weight subjects in combination with multivariate statistical analysis.

  10. Regression analysis for the social sciences

    CERN Document Server

    Gordon, Rachel A

    2015-01-01

    Provides graduate students in the social sciences with the basic skills they need to estimate, interpret, present, and publish basic regression models using contemporary standards. Key features of the book include: interweaving the teaching of statistical concepts with examples developed for the course from publicly-available social science data or drawn from the literature. thorough integration of teaching statistical theory with teaching data processing and analysis. teaching of Stata and use of chapter exercises in which students practice programming and interpretation on the same data set. A separate set of exercises allows students to select a data set to apply the concepts learned in each chapter to a research question of interest to them, all updated for this edition.

  11. Probing beer aging chemistry by nuclear magnetic resonance and multivariate analysis

    Energy Technology Data Exchange (ETDEWEB)

    Rodrigues, J.A. [CICECO-Department of Chemistry, University of Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal); Barros, A.S. [QOPNA-Department of Chemistry, University of Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal); Carvalho, B.; Brandao, T. [UNICER, Bebidas de Portugal, Leca do Balio, 4466-955, S. Mamede de Infesta (Portugal); Gil, Ana M., E-mail: agil@ua.pt [CICECO-Department of Chemistry, University of Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal)

    2011-09-30

    Graphical abstract: The use of nuclear magnetic resonance (NMR) metabonomics for monitoring the chemical changes occurring in beer exposed to forced aging (at 45 deg. C for up to 18 days) is described. Both principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) were applied to the NMR spectra of beer recorded as a function of aging and an aging trend was observed. Inspection of PLS-DA loadings and peak integration revealed the importance of well known markers (e.g. 5-HMF) as well as of other compounds: amino acids, higher alcohols, organic acids, dextrins and some still unassigned spin systems. 2D correlation analysis enabled relevant compound variations to be confirmed and inter-compound correlations to be assessed, thus offering improved insight into the chemical aspects of beer aging. Highlights: {center_dot} Use of NMR metabonomics for monitoring the chemical changes occurring in beer exposed to forced aging. {center_dot} Compositional variations evaluated by principal component analysis and partial least squares-discriminant analysis. {center_dot} Results reveal importance of known markers and other compounds: amino and organic acids, higher alcohols, dextrins. {center_dot} 2D correlation analysis reveals inter-compound relationships, offering insight into beer aging chemistry. - Abstract: This paper describes the use of nuclear magnetic resonance (NMR) spectroscopy, in tandem with multivariate analysis (MVA), for monitoring the chemical changes occurring in a lager beer exposed to forced aging (at 45 deg. C for up to 18 days). To evaluate the resulting compositional variations, both principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) were applied to the NMR spectra of beer recorded as a function of aging and a clear aging trend was observed. Inspection of PLS-DA loadings and peak integration enabled the changing compounds to be identified, revealing the importance of well known

  12. Probing beer aging chemistry by nuclear magnetic resonance and multivariate analysis

    International Nuclear Information System (INIS)

    Rodrigues, J.A.; Barros, A.S.; Carvalho, B.; Brandao, T.; Gil, Ana M.

    2011-01-01

    Graphical abstract: The use of nuclear magnetic resonance (NMR) metabonomics for monitoring the chemical changes occurring in beer exposed to forced aging (at 45 deg. C for up to 18 days) is described. Both principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) were applied to the NMR spectra of beer recorded as a function of aging and an aging trend was observed. Inspection of PLS-DA loadings and peak integration revealed the importance of well known markers (e.g. 5-HMF) as well as of other compounds: amino acids, higher alcohols, organic acids, dextrins and some still unassigned spin systems. 2D correlation analysis enabled relevant compound variations to be confirmed and inter-compound correlations to be assessed, thus offering improved insight into the chemical aspects of beer aging. Highlights: · Use of NMR metabonomics for monitoring the chemical changes occurring in beer exposed to forced aging. · Compositional variations evaluated by principal component analysis and partial least squares-discriminant analysis. · Results reveal importance of known markers and other compounds: amino and organic acids, higher alcohols, dextrins. · 2D correlation analysis reveals inter-compound relationships, offering insight into beer aging chemistry. - Abstract: This paper describes the use of nuclear magnetic resonance (NMR) spectroscopy, in tandem with multivariate analysis (MVA), for monitoring the chemical changes occurring in a lager beer exposed to forced aging (at 45 deg. C for up to 18 days). To evaluate the resulting compositional variations, both principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) were applied to the NMR spectra of beer recorded as a function of aging and a clear aging trend was observed. Inspection of PLS-DA loadings and peak integration enabled the changing compounds to be identified, revealing the importance of well known markers such as 5-hydroxymethylfurfural (5

  13. Theory of net analyte signal vectors in inverse regression

    DEFF Research Database (Denmark)

    Bro, R.; Andersen, Charlotte Møller

    2003-01-01

    The. net analyte signal and the net analyte signal vector are useful measures in building and optimizing multivariate calibration models. In this paper a theory for their use in inverse regression is developed. The theory of net analyte signal was originally derived from classical least squares...

  14. Synthetic environmental indicators: A conceptual approach from the multivariate statistics

    International Nuclear Information System (INIS)

    Escobar J, Luis A

    2008-01-01

    This paper presents a general description of multivariate statistical analysis and shows two methodologies: analysis of principal components and analysis of distance, DP2. Both methods use techniques of multivariate analysis to define the true dimension of data, which is useful to estimate indicators of environmental quality.

  15. A Tool for Classifying Individuals with Chronic Back Pain: Using Multivariate Pattern Analysis with Functional Magnetic Resonance Imaging Data

    Science.gov (United States)

    Callan, Daniel; Mills, Lloyd; Nott, Connie; England, Robert; England, Shaun

    2014-01-01

    Chronic pain is one of the most prevalent health problems in the world today, yet neurological markers, critical to diagnosis of chronic pain, are still largely unknown. The ability to objectively identify individuals with chronic pain using functional magnetic resonance imaging (fMRI) data is important for the advancement of diagnosis, treatment, and theoretical knowledge of brain processes associated with chronic pain. The purpose of our research is to investigate specific neurological markers that could be used to diagnose individuals experiencing chronic pain by using multivariate pattern analysis with fMRI data. We hypothesize that individuals with chronic pain have different patterns of brain activity in response to induced pain. This pattern can be used to classify the presence or absence of chronic pain. The fMRI experiment consisted of alternating 14 seconds of painful electric stimulation (applied to the lower back) with 14 seconds of rest. We analyzed contrast fMRI images in stimulation versus rest in pain-related brain regions to distinguish between the groups of participants: 1) chronic pain and 2) normal controls. We employed supervised machine learning techniques, specifically sparse logistic regression, to train a classifier based on these contrast images using a leave-one-out cross-validation procedure. We correctly classified 92.3% of the chronic pain group (N = 13) and 92.3% of the normal control group (N = 13) by recognizing multivariate patterns of activity in the somatosensory and inferior parietal cortex. This technique demonstrates that differences in the pattern of brain activity to induced pain can be used as a neurological marker to distinguish between individuals with and without chronic pain. Medical, legal and business professionals have recognized the importance of this research topic and of developing objective measures of chronic pain. This method of data analysis was very successful in correctly classifying each of the two

  16. A tool for classifying individuals with chronic back pain: using multivariate pattern analysis with functional magnetic resonance imaging data.

    Directory of Open Access Journals (Sweden)

    Daniel Callan

    Full Text Available Chronic pain is one of the most prevalent health problems in the world today, yet neurological markers, critical to diagnosis of chronic pain, are still largely unknown. The ability to objectively identify individuals with chronic pain using functional magnetic resonance imaging (fMRI data is important for the advancement of diagnosis, treatment, and theoretical knowledge of brain processes associated with chronic pain. The purpose of our research is to investigate specific neurological markers that could be used to diagnose individuals experiencing chronic pain by using multivariate pattern analysis with fMRI data. We hypothesize that individuals with chronic pain have different patterns of brain activity in response to induced pain. This pattern can be used to classify the presence or absence of chronic pain. The fMRI experiment consisted of alternating 14 seconds of painful electric stimulation (applied to the lower back with 14 seconds of rest. We analyzed contrast fMRI images in stimulation versus rest in pain-related brain regions to distinguish between the groups of participants: 1 chronic pain and 2 normal controls. We employed supervised machine learning techniques, specifically sparse logistic regression, to train a classifier based on these contrast images using a leave-one-out cross-validation procedure. We correctly classified 92.3% of the chronic pain group (N = 13 and 92.3% of the normal control group (N = 13 by recognizing multivariate patterns of activity in the somatosensory and inferior parietal cortex. This technique demonstrates that differences in the pattern of brain activity to induced pain can be used as a neurological marker to distinguish between individuals with and without chronic pain. Medical, legal and business professionals have recognized the importance of this research topic and of developing objective measures of chronic pain. This method of data analysis was very successful in correctly classifying

  17. Motivation and Self-Regulated Learning: A Multivariate Multilevel Analysis

    Directory of Open Access Journals (Sweden)

    Wondimu Ahmed

    2017-09-01

    Full Text Available This study investigated the relationship between motivation and self-regulated learning (SRL in a nationally representative sample of 5245, 15-year-old students in the USA. A multivariate multilevel analysis was conducted to examine the role of three motivational variables (self-efficacy, intrinsic value & instrumental value in predicting three SRL strategies (memorization, elaboration & control. The results showed that compared to self-efficacy, intrinsic value and instrumental value of math were stronger predictors of memorization, elaboration and control strategies. None of the motivational variables had a stronger effect on one strategy than the other. The findings suggest that the development of self-regulatory skills in math can be greatly enhanced by helping students develop positive value of and realistic expectancy for success in math.

  18. Nonparametric regression using the concept of minimum energy

    International Nuclear Information System (INIS)

    Williams, Mike

    2011-01-01

    It has recently been shown that an unbinned distance-based statistic, the energy, can be used to construct an extremely powerful nonparametric multivariate two sample goodness-of-fit test. An extension to this method that makes it possible to perform nonparametric regression using multiple multivariate data sets is presented in this paper. The technique, which is based on the concept of minimizing the energy of the system, permits determination of parameters of interest without the need for parametric expressions of the parent distributions of the data sets. The application and performance of this new method is discussed in the context of some simple example analyses.

  19. Synthesis of linear regression coefficients by recovering the within-study covariance matrix from summary statistics.

    Science.gov (United States)

    Yoneoka, Daisuke; Henmi, Masayuki

    2017-06-01

    Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not been developed in a commensurate trend. One of the difficulties hindering the development is the disparity in sets of covariates among literature models. If the sets of covariates differ across models, interpretation of coefficients will differ, thereby making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often have problems because covariance matrix of coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study, therefore, proposes a brief explanation regarding a method to synthesize linear regression models under different covariate sets by using a generalized least squares method involving bias correction terms. Especially, we also propose an approach to recover (at most) threecorrelations of covariates, which is required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  20. Statistical mixture design and multivariate analysis of inkjet printed a-WO3/TiO2/WOX electrochromic films.

    Science.gov (United States)

    Wojcik, Pawel Jerzy; Pereira, Luís; Martins, Rodrigo; Fortunato, Elvira

    2014-01-13

    An efficient mathematical strategy in the field of solution processed electrochromic (EC) films is outlined as a combination of an experimental work, modeling, and information extraction from massive computational data via statistical software. Design of Experiment (DOE) was used for statistical multivariate analysis and prediction of mixtures through a multiple regression model, as well as the optimization of a five-component sol-gel precursor subjected to complex constraints. This approach significantly reduces the number of experiments to be realized, from 162 in the full factorial (L=3) and 72 in the extreme vertices (D=2) approach down to only 30 runs, while still maintaining a high accuracy of the analysis. By carrying out a finite number of experiments, the empirical modeling in this study shows reasonably good prediction ability in terms of the overall EC performance. An optimized ink formulation was employed in a prototype of a passive EC matrix fabricated in order to test and trial this optically active material system together with a solid-state electrolyte for the prospective application in EC displays. Coupling of DOE with chromogenic material formulation shows the potential to maximize the capabilities of these systems and ensures increased productivity in many potential solution-processed electrochemical applications.