MIDAS: Regionally linear multivariate discriminative statistical mapping.
Varol, Erdem; Sotiras, Aristeidis; Davatzikos, Christos
2018-07-01
Statistical parametric maps formed via voxel-wise mass-univariate tests, such as the general linear model, are commonly used to test hypotheses about regionally specific effects in neuroimaging cross-sectional studies where each subject is represented by a single image. Despite being informative, these techniques remain limited as they ignore multivariate relationships in the data. Most importantly, the commonly employed local Gaussian smoothing, which is important for accounting for registration errors and making the data follow Gaussian distributions, is usually chosen in an ad hoc fashion. Thus, it is often suboptimal for the task of detecting group differences and correlations with non-imaging variables. Information mapping techniques, such as searchlight, which use pattern classifiers to exploit multivariate information and obtain more powerful statistical maps, have become increasingly popular in recent years. However, existing methods may lead to important interpretation errors in practice (i.e., misidentifying a cluster as informative, or failing to detect truly informative voxels), while often being computationally expensive. To address these issues, we introduce a novel efficient multivariate statistical framework for cross-sectional studies, termed MIDAS, seeking highly sensitive and specific voxel-wise brain maps, while leveraging the power of regional discriminant analysis. In MIDAS, locally linear discriminative learning is applied to estimate the pattern that best discriminates between two groups, or predicts a variable of interest. This pattern is equivalent to local filtering by an optimal kernel whose coefficients are the weights of the linear discriminant. By composing information from all neighborhoods that contain a given voxel, MIDAS produces a statistic that collectively reflects the contribution of the voxel to the regional classifiers as well as the discriminative power of the classifiers. Critically, MIDAS efficiently assesses the
Multivariate statistical modelling based on generalized linear models
Fahrmeir, Ludwig
1994-01-01
This book is concerned with the use of generalized linear models for univariate and multivariate regression analysis. Its emphasis is to provide a detailed introductory survey of the subject based on the analysis of real data drawn from a variety of subjects including the biological sciences, economics, and the social sciences. Where possible, technical details and proofs are deferred to an appendix in order to provide an accessible account for non-experts. Topics covered include: models for multi-categorical responses, model checking, time series and longitudinal data, random effects models, and state-space models. Throughout, the authors have taken great pains to discuss the underlying theoretical ideas in ways that relate well to the data at hand. As a result, numerous researchers whose work relies on the use of these models will find this an invaluable account to have on their desks. "The basic aim of the authors is to bring together and review a large part of recent advances in statistical modelling of m...
Multivariate Statistical Process Control
Kulahci, Murat
2013-01-01
As sensor and computer technology continues to improve, it becomes a normal occurrence that we confront with high dimensional data sets. As in many areas of industrial statistics, this brings forth various challenges in statistical process control (SPC) and monitoring for which the aim...... is to identify “out-of-control” state of a process using control charts in order to reduce the excessive variation caused by so-called assignable causes. In practice, the most common method of monitoring multivariate data is through a statistic akin to the Hotelling’s T2. For high dimensional data with excessive...... amount of cross correlation, practitioners are often recommended to use latent structures methods such as Principal Component Analysis to summarize the data in only a few linear combinations of the original variables that capture most of the variation in the data. Applications of these control charts...
The mixed linear model (MLM) is currently among the most advanced and flexible statistical modeling techniques and its use in tackling problems in plant pathology has begun surfacing in the literature. The longitudinal MLM is a multivariate extension that handles repeatedly measured data, such as r...
Applied multivariate statistics with R
Zelterman, Daniel
2015-01-01
This book brings the power of multivariate statistics to graduate-level practitioners, making these analytical methods accessible without lengthy mathematical derivations. Using the open source, shareware program R, Professor Zelterman demonstrates the process and outcomes for a wide array of multivariate statistical applications. Chapters cover graphical displays, linear algebra, univariate, bivariate and multivariate normal distributions, factor methods, linear regression, discrimination and classification, clustering, time series models, and additional methods. Zelterman uses practical examples from diverse disciplines to welcome readers from a variety of academic specialties. Those with backgrounds in statistics will learn new methods while they review more familiar topics. Chapters include exercises, real data sets, and R implementations. The data are interesting, real-world topics, particularly from health and biology-related contexts. As an example of the approach, the text examines a sample from the B...
Applied multivariate statistical analysis
Härdle, Wolfgang Karl
2015-01-01
Focusing on high-dimensional applications, this 4th edition presents the tools and concepts used in multivariate data analysis in a style that is also accessible for non-mathematicians and practitioners. It surveys the basic principles and emphasizes both exploratory and inferential statistics; a new chapter on Variable Selection (Lasso, SCAD and Elastic Net) has also been added. All chapters include practical exercises that highlight applications in different multivariate data analysis fields: in quantitative financial studies, where the joint dynamics of assets are observed; in medicine, where recorded observations of subjects in different locations form the basis for reliable diagnoses and medication; and in quantitative marketing, where consumers’ preferences are collected in order to construct models of consumer behavior. All of these examples involve high to ultra-high dimensions and represent a number of major fields in big data analysis. The fourth edition of this book on Applied Multivariate ...
Multivariate covariance generalized linear models
Bonat, W. H.; Jørgensen, Bent
2016-01-01
are fitted by using an efficient Newton scoring algorithm based on quasi-likelihood and Pearson estimating functions, using only second-moment assumptions. This provides a unified approach to a wide variety of types of response variables and covariance structures, including multivariate extensions......We propose a general framework for non-normal multivariate data analysis called multivariate covariance generalized linear models, designed to handle multivariate response variables, along with a wide range of temporal and spatial correlation structures defined in terms of a covariance link...... function combined with a matrix linear predictor involving known matrices. The method is motivated by three data examples that are not easily handled by existing methods. The first example concerns multivariate count data, the second involves response variables of mixed types, combined with repeated...
A primer of multivariate statistics
Harris, Richard J
2014-01-01
Drawing upon more than 30 years of experience in working with statistics, Dr. Richard J. Harris has updated A Primer of Multivariate Statistics to provide a model of balance between how-to and why. This classic text covers multivariate techniques with a taste of latent variable approaches. Throughout the book there is a focus on the importance of describing and testing one's interpretations of the emergent variables that are produced by multivariate analysis. This edition retains its conversational writing style while focusing on classical techniques. The book gives the reader a feel for why
Sparse Linear Identifiable Multivariate Modeling
Henao, Ricardo; Winther, Ole
2011-01-01
and bench-marked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs equally well or better than LiNGAM with comparable......In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully...
Aspects of multivariate statistical theory
Muirhead, Robb J
2009-01-01
The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover volumes, Wiley hopes to extend the lives of these works by making them available to future generations of statisticians, mathematicians, and scientists. "". . . the wealth of material on statistics concerning the multivariate normal distribution is quite exceptional. As such it is a very useful source of information for the general statistician and a must for anyone wanting to pen
Multivariate Statistical Process Control Charts: An Overview
Bersimis, Sotiris; Psarakis, Stelios; Panaretos, John
2006-01-01
In this paper we discuss the basic procedures for the implementation of multivariate statistical process control via control charting. Furthermore, we review multivariate extensions for all kinds of univariate control charts, such as multivariate Shewhart-type control charts, multivariate CUSUM control charts and multivariate EWMA control charts. In addition, we review unique procedures for the construction of multivariate control charts, based on multivariate statistical techniques such as p...
Multivariate statistical methods a primer
Manly, Bryan FJ
2004-01-01
THE MATERIAL OF MULTIVARIATE ANALYSISExamples of Multivariate DataPreview of Multivariate MethodsThe Multivariate Normal DistributionComputer ProgramsGraphical MethodsChapter SummaryReferencesMATRIX ALGEBRAThe Need for Matrix AlgebraMatrices and VectorsOperations on MatricesMatrix InversionQuadratic FormsEigenvalues and EigenvectorsVectors of Means and Covariance MatricesFurther Reading Chapter SummaryReferencesDISPLAYING MULTIVARIATE DATAThe Problem of Displaying Many Variables in Two DimensionsPlotting index VariablesThe Draftsman's PlotThe Representation of Individual Data P:ointsProfiles o
Multivariate statistics exercises and solutions
Härdle, Wolfgang Karl
2015-01-01
The authors present tools and concepts of multivariate data analysis by means of exercises and their solutions. The first part is devoted to graphical techniques. The second part deals with multivariate random variables and presents the derivation of estimators and tests for various practical situations. The last part introduces a wide variety of exercises in applied multivariate data analysis. The book demonstrates the application of simple calculus and basic multivariate methods in real life situations. It contains altogether more than 250 solved exercises which can assist a university teacher in setting up a modern multivariate analysis course. All computer-based exercises are available in the R language. All R codes and data sets may be downloaded via the quantlet download center www.quantlet.org or via the Springer webpage. For interactive display of low-dimensional projections of a multivariate data set, we recommend GGobi.
Multivariate generalized linear mixed models using R
Berridge, Damon Mark
2011-01-01
Multivariate Generalized Linear Mixed Models Using R presents robust and methodologically sound models for analyzing large and complex data sets, enabling readers to answer increasingly complex research questions. The book applies the principles of modeling to longitudinal data from panel and related studies via the Sabre software package in R. A Unified Framework for a Broad Class of Models The authors first discuss members of the family of generalized linear models, gradually adding complexity to the modeling framework by incorporating random effects. After reviewing the generalized linear model notation, they illustrate a range of random effects models, including three-level, multivariate, endpoint, event history, and state dependence models. They estimate the multivariate generalized linear mixed models (MGLMMs) using either standard or adaptive Gaussian quadrature. The authors also compare two-level fixed and random effects linear models. The appendices contain additional information on quadrature, model...
Multivariate statistical methods a first course
Marcoulides, George A
2014-01-01
Multivariate statistics refer to an assortment of statistical methods that have been developed to handle situations in which multiple variables or measures are involved. Any analysis of more than two variables or measures can loosely be considered a multivariate statistical analysis. An introductory text for students learning multivariate statistical methods for the first time, this book keeps mathematical details to a minimum while conveying the basic principles. One of the principal strategies used throughout the book--in addition to the presentation of actual data analyses--is poin
Matrix Tricks for Linear Statistical Models
Puntanen, Simo; Styan, George PH
2011-01-01
In teaching linear statistical models to first-year graduate students or to final-year undergraduate students there is no way to proceed smoothly without matrices and related concepts of linear algebra; their use is really essential. Our experience is that making some particular matrix tricks very familiar to students can substantially increase their insight into linear statistical models (and also multivariate statistical analysis). In matrix algebra, there are handy, sometimes even very simple "tricks" which simplify and clarify the treatment of a problem - both for the student and
Multivariate statistical assessment of coal properties
Klika, Z.; Serenčíšová, J.; Kožušníková, Alena; Kolomazník, I.; Študentová, S.; Vontorová, J.
2014-01-01
Roč. 128, č. 128 (2014), s. 119-127 ISSN 0378-3820 R&D Projects: GA MŠk ED2.1.00/03.0082 Institutional support: RVO:68145535 Keywords : coal properties * structural,chemical and petrographical properties * multivariate statistics Subject RIV: DH - Mining, incl. Coal Mining Impact factor: 3.352, year: 2014 http://dx.doi.org/10.1016/j.fuproc.2014.06.029
Multivariate statistical analysis of wildfires in Portugal
Costa, Ricardo; Caramelo, Liliana; Pereira, Mário
2013-04-01
Several studies demonstrate that wildfires in Portugal present high temporal and spatial variability as well as cluster behavior (Pereira et al., 2005, 2011). This study aims to contribute to the characterization of the fire regime in Portugal with the multivariate statistical analysis of the time series of number of fires and area burned in Portugal during the 1980 - 2009 period. The data used in the analysis is an extended version of the Rural Fire Portuguese Database (PRFD) (Pereira et al, 2011), provided by the National Forest Authority (Autoridade Florestal Nacional, AFN), the Portuguese Forest Service, which includes information for more than 500,000 fire records. There are many multiple advanced techniques for examining the relationships among multiple time series at the same time (e.g., canonical correlation analysis, principal components analysis, factor analysis, path analysis, multiple analyses of variance, clustering systems). This study compares and discusses the results obtained with these different techniques. Pereira, M.G., Trigo, R.M., DaCamara, C.C., Pereira, J.M.C., Leite, S.M., 2005: "Synoptic patterns associated with large summer forest fires in Portugal". Agricultural and Forest Meteorology. 129, 11-25. Pereira, M. G., Malamud, B. D., Trigo, R. M., and Alves, P. I.: The history and characteristics of the 1980-2005 Portuguese rural fire database, Nat. Hazards Earth Syst. Sci., 11, 3343-3358, doi:10.5194/nhess-11-3343-2011, 2011 This work is supported by European Union Funds (FEDER/COMPETE - Operational Competitiveness Programme) and by national funds (FCT - Portuguese Foundation for Science and Technology) under the project FCOMP-01-0124-FEDER-022692, the project FLAIR (PTDC/AAC-AMB/104702/2008) and the EU 7th Framework Program through FUME (contract number 243888).
Multivariate linear models and repeated measurements revisited
Dalgaard, Peter
2009-01-01
Methods for generalized analysis of variance based on multivariate normal theory have been known for many years. In a repeated measurements context, it is most often of interest to consider transformed responses, typically within-subject contrasts or averages. Efficiency considerations leads...... to sphericity assumptions, use of F tests and the Greenhouse-Geisser and Huynh-Feldt adjustments to compensate for deviations from sphericity. During a recent implementation of such methods in the R language, the general structure of such transformations was reconsidered, leading to a flexible specification...
Multivariate statistical analysis a high-dimensional approach
Serdobolskii, V
2000-01-01
In the last few decades the accumulation of large amounts of in formation in numerous applications. has stimtllated an increased in terest in multivariate analysis. Computer technologies allow one to use multi-dimensional and multi-parametric models successfully. At the same time, an interest arose in statistical analysis with a de ficiency of sample data. Nevertheless, it is difficult to describe the recent state of affairs in applied multivariate methods as satisfactory. Unimprovable (dominating) statistical procedures are still unknown except for a few specific cases. The simplest problem of estimat ing the mean vector with minimum quadratic risk is unsolved, even for normal distributions. Commonly used standard linear multivari ate procedures based on the inversion of sample covariance matrices can lead to unstable results or provide no solution in dependence of data. Programs included in standard statistical packages cannot process 'multi-collinear data' and there are no theoretical recommen ...
Bayes linear statistics, theory & methods
Goldstein, Michael
2007-01-01
Bayesian methods combine information available from data with any prior information available from expert knowledge. The Bayes linear approach follows this path, offering a quantitative structure for expressing beliefs, and systematic methods for adjusting these beliefs, given observational data. The methodology differs from the full Bayesian methodology in that it establishes simpler approaches to belief specification and analysis based around expectation judgements. Bayes Linear Statistics presents an authoritative account of this approach, explaining the foundations, theory, methodology, and practicalities of this important field. The text provides a thorough coverage of Bayes linear analysis, from the development of the basic language to the collection of algebraic results needed for efficient implementation, with detailed practical examples. The book covers:The importance of partial prior specifications for complex problems where it is difficult to supply a meaningful full prior probability specification...
Method for statistical data analysis of multivariate observations
Gnanadesikan, R
1997-01-01
A practical guide for multivariate statistical techniques-- now updated and revised In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus of its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of inte
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
Multivariate ordination statistics workshop with R slides
Strack, Michael
2015-01-01
2-hour workshop given at Macquarie University Department of Biological Sciences, 4 November 2015. Workshop was an introduction to the family of techniques falling under multivariate ordination, using the R language and drawing heavily from the book "Numerical Ecology with R" by Borcard et. al (2012).
International Conference on Trends and Perspectives in Linear Statistical Inference
Rosen, Dietrich
2018-01-01
This volume features selected contributions on a variety of topics related to linear statistical inference. The peer-reviewed papers from the International Conference on Trends and Perspectives in Linear Statistical Inference (LinStat 2016) held in Istanbul, Turkey, 22-25 August 2016, cover topics in both theoretical and applied statistics, such as linear models, high-dimensional statistics, computational statistics, the design of experiments, and multivariate analysis. The book is intended for statisticians, Ph.D. students, and professionals who are interested in statistical inference. .
Classification of Specialized Farms Applying Multivariate Statistical Methods
Zuzana Hloušková
2017-01-01
Full Text Available Classification of specialized farms applying multivariate statistical methods The paper is aimed at application of advanced multivariate statistical methods when classifying cattle breeding farming enterprises by their economic size. Advantage of the model is its ability to use a few selected indicators compared to the complex methodology of current classification model that requires knowledge of detailed structure of the herd turnover and structure of cultivated crops. Output of the paper is intended to be applied within farm structure research focused on future development of Czech agriculture. As data source, the farming enterprises database for 2014 has been used, from the FADN CZ system. The predictive model proposed exploits knowledge of actual size classes of the farms tested. Outcomes of the linear discriminatory analysis multifactor classification method have supported the chance of filing farming enterprises in the group of Small farms (98 % filed correctly, and the Large and Very Large enterprises (100 % filed correctly. The Medium Size farms have been correctly filed at 58.11 % only. Partial shortages of the process presented have been found when discriminating Medium and Small farms.
Multivariate statistics high-dimensional and large-sample approximations
Fujikoshi, Yasunori; Shimizu, Ryoichi
2010-01-01
A comprehensive examination of high-dimensional analysis of multivariate methods and their real-world applications Multivariate Statistics: High-Dimensional and Large-Sample Approximations is the first book of its kind to explore how classical multivariate methods can be revised and used in place of conventional statistical tools. Written by prominent researchers in the field, the book focuses on high-dimensional and large-scale approximations and details the many basic multivariate methods used to achieve high levels of accuracy. The authors begin with a fundamental presentation of the basic
Synthetic environmental indicators: A conceptual approach from the multivariate statistics
Escobar J, Luis A
2008-01-01
This paper presents a general description of multivariate statistical analysis and shows two methodologies: analysis of principal components and analysis of distance, DP2. Both methods use techniques of multivariate analysis to define the true dimension of data, which is useful to estimate indicators of environmental quality.
Classifying hot water chemistry: Application of MULTIVARIATE STATISTICS
Sumintadireja, Prihadi; Irawan, Dasapta Erwin; Rezky, Yuanno; Gio, Prana Ugiana; Agustin, Anggita
2016-01-01
This file is the dataset for the following paper "Classifying hot water chemistry: Application of MULTIVARIATE STATISTICS". Authors: Prihadi Sumintadireja1, Dasapta Erwin Irawan1, Yuano Rezky2, Prana Ugiana Gio3, Anggita Agustin1
Instrumental Neutron Activation Analysis and Multivariate Statistics for Pottery Provenance
Glascock, M. D.; Neff, H.; Vaughn, K. J.
2004-06-01
The application of instrumental neutron activation analysis and multivariate statistics to archaeological studies of ceramics and clays is described. A small pottery data set from the Nasca culture in southern Peru is presented for illustration.
Instrumental Neutron Activation Analysis and Multivariate Statistics for Pottery Provenance
Glascock, M. D.; Neff, H.; Vaughn, K. J.
2004-01-01
The application of instrumental neutron activation analysis and multivariate statistics to archaeological studies of ceramics and clays is described. A small pottery data set from the Nasca culture in southern Peru is presented for illustration.
Instrumental Neutron Activation Analysis and Multivariate Statistics for Pottery Provenance
Glascock, M. D.; Neff, H. [University of Missouri, Research Reactor Center (United States); Vaughn, K. J. [Pacific Lutheran University, Department of Anthropology (United States)
2004-06-15
The application of instrumental neutron activation analysis and multivariate statistics to archaeological studies of ceramics and clays is described. A small pottery data set from the Nasca culture in southern Peru is presented for illustration.
Multivariate statistical characterization of groundwater quality in Ain ...
Administrator
depends much on the sustainability of the available water resources. Water of .... 18 wells currently in use were selected based on the preliminary field survey carried out to ... In recent times, multivariate statistical methods have been applied ...
Multivariate methods and forecasting with IBM SPSS statistics
Aljandali, Abdulkader
2017-01-01
This is the second of a two-part guide to quantitative analysis using the IBM SPSS Statistics software package; this volume focuses on multivariate statistical methods and advanced forecasting techniques. More often than not, regression models involve more than one independent variable. For example, forecasting methods are commonly applied to aggregates such as inflation rates, unemployment, exchange rates, etc., that have complex relationships with determining variables. This book introduces multivariate regression models and provides examples to help understand theory underpinning the model. The book presents the fundamentals of multivariate regression and then moves on to examine several related techniques that have application in business-orientated fields such as logistic and multinomial regression. Forecasting tools such as the Box-Jenkins approach to time series modeling are introduced, as well as exponential smoothing and naïve techniques. This part also covers hot topics such as Factor Analysis, Dis...
Statistical Inference for a Class of Multivariate Negative Binomial Distributions
Rubak, Ege H.; Møller, Jesper; McCullagh, Peter
This paper considers statistical inference procedures for a class of models for positively correlated count variables called -permanental random fields, and which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been...... studied in the literature, while this is the first statistical paper on -permanental random fields. The focus is on maximum likelihood estimation, maximum quasi-likelihood estimation and on maximum composite likelihood estimation based on uni- and bivariate distributions. Furthermore, new results...
Actuarial statistics with generalized linear mixed models
Antonio, K.; Beirlant, J.
2007-01-01
Over the last decade the use of generalized linear models (GLMs) in actuarial statistics has received a lot of attention, starting from the actuarial illustrations in the standard text by McCullagh and Nelder [McCullagh, P., Nelder, J.A., 1989. Generalized linear models. In: Monographs on Statistics
Multivariate statistical analysis of major and trace element data for ...
Multivariate statistical analysis of major and trace element data for niobium exploration in the peralkaline granites of the anorogenic ring-complex province of Nigeria. PO Ogunleye, EC Ike, I Garba. Abstract. No Abstract Available Journal of Mining and Geology Vol.40(2) 2004: 107-117. Full Text: EMAIL FULL TEXT EMAIL ...
Statistical inference for a class of multivariate negative binomial distributions
Rubak, Ege Holger; Møller, Jesper; McCullagh, Peter
This paper considers statistical inference procedures for a class of models for positively correlated count variables called α-permanental random fields, and which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been...
Li, Yanming; Nan, Bin; Zhu, Ji
2015-06-01
We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functional groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study. © 2015, The International Biometric Society.
Classification of Malaysia aromatic rice using multivariate statistical analysis
Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A.; Omar, O.
2015-01-01
Aromatic rice (Oryza sativa L.) is considered as the best quality premium rice. The varieties are preferred by consumers because of its preference criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than ordinary rice due to its special needed growth condition for instance specific climate and soil. Presently, the aromatic rice quality is identified by using its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or human sensory panels. However, the uses of human sensory panels have significant drawbacks such as lengthy training time, and prone to fatigue as the number of sample increased and inconsistent. The GC–MS analysis techniques on the other hand, require detailed procedures, lengthy analysis and quite costly. This paper presents the application of in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties. The e-nose is used to classify the variety of aromatic rice based on the samples odour. The samples were taken from the variety of rice. The instrument utilizes multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN) to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to perform recognition and classification of the unspecified samples. The visual observation of the PCA and LDA plots of the rice proves that the instrument was able to separate the samples into different clusters accordingly. The results of LDA and KNN with low misclassification error support the above findings and we may conclude that the e-nose is successfully applied to the classification of the aromatic rice varieties
Classification of Malaysia aromatic rice using multivariate statistical analysis
Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A. [School of Mechatronic Engineering, Universiti Malaysia Perlis, Kampus Pauh Putra, 02600 Arau, Perlis (Malaysia); Omar, O. [Malaysian Agriculture Research and Development Institute (MARDI), Persiaran MARDI-UPM, 43400 Serdang, Selangor (Malaysia)
2015-05-15
Aromatic rice (Oryza sativa L.) is considered as the best quality premium rice. The varieties are preferred by consumers because of its preference criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than ordinary rice due to its special needed growth condition for instance specific climate and soil. Presently, the aromatic rice quality is identified by using its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or human sensory panels. However, the uses of human sensory panels have significant drawbacks such as lengthy training time, and prone to fatigue as the number of sample increased and inconsistent. The GC–MS analysis techniques on the other hand, require detailed procedures, lengthy analysis and quite costly. This paper presents the application of in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties. The e-nose is used to classify the variety of aromatic rice based on the samples odour. The samples were taken from the variety of rice. The instrument utilizes multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN) to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to perform recognition and classification of the unspecified samples. The visual observation of the PCA and LDA plots of the rice proves that the instrument was able to separate the samples into different clusters accordingly. The results of LDA and KNN with low misclassification error support the above findings and we may conclude that the e-nose is successfully applied to the classification of the aromatic rice varieties.
Classification of Malaysia aromatic rice using multivariate statistical analysis
Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A.; Omar, O.
2015-05-01
Aromatic rice (Oryza sativa L.) is considered as the best quality premium rice. The varieties are preferred by consumers because of its preference criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than ordinary rice due to its special needed growth condition for instance specific climate and soil. Presently, the aromatic rice quality is identified by using its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or human sensory panels. However, the uses of human sensory panels have significant drawbacks such as lengthy training time, and prone to fatigue as the number of sample increased and inconsistent. The GC-MS analysis techniques on the other hand, require detailed procedures, lengthy analysis and quite costly. This paper presents the application of in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties. The e-nose is used to classify the variety of aromatic rice based on the samples odour. The samples were taken from the variety of rice. The instrument utilizes multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN) to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to perform recognition and classification of the unspecified samples. The visual observation of the PCA and LDA plots of the rice proves that the instrument was able to separate the samples into different clusters accordingly. The results of LDA and KNN with low misclassification error support the above findings and we may conclude that the e-nose is successfully applied to the classification of the aromatic rice varieties.
Multivariate statistical analysis of precipitation chemistry in Northwestern Spain
Prada-Sanchez, J.M.; Garcia-Jurado, I.; Gonzalez-Manteiga, W.; Fiestras-Janeiro, M.G.; Espada-Rios, M.I.; Lucas-Dominguez, T.
1993-01-01
149 samples of rainwater were collected in the proximity of a power station in northwestern Spain at three rainwater monitoring stations. The resulting data are analyzed using multivariate statistical techniques. Firstly, the Principal Component Analysis shows that there are three main sources of pollution in the area (a marine source, a rural source and an acid source). The impact from pollution from these sources on the immediate environment of the stations is studied using Factorial Discriminant Analysis. 8 refs., 7 figs., 11 tabs
Multivariate statistical analysis of precipitation chemistry in Northwestern Spain
Prada-Sanchez, J.M.; Garcia-Jurado, I.; Gonzalez-Manteiga, W.; Fiestras-Janeiro, M.G.; Espada-Rios, M.I.; Lucas-Dominguez, T. (University of Santiago, Santiago (Spain). Faculty of Mathematics, Dept. of Statistics and Operations Research)
1993-07-01
149 samples of rainwater were collected in the proximity of a power station in northwestern Spain at three rainwater monitoring stations. The resulting data are analyzed using multivariate statistical techniques. Firstly, the Principal Component Analysis shows that there are three main sources of pollution in the area (a marine source, a rural source and an acid source). The impact from pollution from these sources on the immediate environment of the stations is studied using Factorial Discriminant Analysis. 8 refs., 7 figs., 11 tabs.
Application of multivariate statistical techniques in microbial ecology.
Paliy, O; Shankar, V
2016-03-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.
Multivariate time series with linear state space structure
Gómez, Víctor
2016-01-01
This book presents a comprehensive study of multivariate time series with linear state space structure. The emphasis is put on both the clarity of the theoretical concepts and on efficient algorithms for implementing the theory. In particular, it investigates the relationship between VARMA and state space models, including canonical forms. It also highlights the relationship between Wiener-Kolmogorov and Kalman filtering both with an infinite and a finite sample. The strength of the book also lies in the numerous algorithms included for state space models that take advantage of the recursive nature of the models. Many of these algorithms can be made robust, fast, reliable and efficient. The book is accompanied by a MATLAB package called SSMMATLAB and a webpage presenting implemented algorithms with many examples and case studies. Though it lays a solid theoretical foundation, the book also focuses on practical application, and includes exercises in each chapter. It is intended for researchers and students wor...
Multivariate statistical methods and data mining in particle physics (4/4)
CERN. Geneva
2008-01-01
The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
Multivariate statistical methods and data mining in particle physics (2/4)
CERN. Geneva
2008-01-01
The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
Multivariate statistical methods and data mining in particle physics (1/4)
CERN. Geneva
2008-01-01
The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
Identification of mine waters by statistical multivariate methods
Mali, N [IGGG, Ljubljana (Slovenia)
1992-01-01
Three water-bearing aquifers are present in the Velenje lignite mine. The aquifer waters have differing chemical composition; a geochemical water analysis can therefore determine the source of mine water influx. Mine water samples from different locations in the mine were analyzed, the results of chemical content and of electric conductivity of mine water were statistically processed by means of MICROGAS, SPSS-X and IN STATPAC computer programs, which apply three multivariate statistical methods (discriminate, cluster and factor analysis). Reliability of calculated values was determined with the Kolmogorov and Smirnov tests. It is concluded that laboratory analysis of single water samples can produce measurement errors, but statistical processing of water sample data can identify origin and movement of mine water. 15 refs.
Frank, T.D.
2002-01-01
We study many particle systems in the context of mean field forces, concentration-dependent diffusion coefficients, generalized equilibrium distributions, and quantum statistics. Using kinetic transport theory and linear nonequilibrium thermodynamics we derive for these systems a generalized multivariate Fokker-Planck equation. It is shown that this Fokker-Planck equation describes relaxation processes, has stationary maximum entropy distributions, can have multiple stationary solutions and stationary solutions that differ from Boltzmann distributions
Multivariate statistical pattern recognition system for reactor noise analysis
Gonzalez, R.C.; Howington, L.C.; Sides, W.H. Jr.; Kryter, R.C.
1976-01-01
A multivariate statistical pattern recognition system for reactor noise analysis was developed. The basis of the system is a transformation for decoupling correlated variables and algorithms for inferring probability density functions. The system is adaptable to a variety of statistical properties of the data, and it has learning, tracking, and updating capabilities. System design emphasizes control of the false-alarm rate. The ability of the system to learn normal patterns of reactor behavior and to recognize deviations from these patterns was evaluated by experiments at the ORNL High-Flux Isotope Reactor (HFIR). Power perturbations of less than 0.1 percent of the mean value in selected frequency ranges were detected by the system
Multivariate statistical pattern recognition system for reactor noise analysis
Gonzalez, R.C.; Howington, L.C.; Sides, W.H. Jr.; Kryter, R.C.
1975-01-01
A multivariate statistical pattern recognition system for reactor noise analysis was developed. The basis of the system is a transformation for decoupling correlated variables and algorithms for inferring probability density functions. The system is adaptable to a variety of statistical properties of the data, and it has learning, tracking, and updating capabilities. System design emphasizes control of the false-alarm rate. The ability of the system to learn normal patterns of reactor behavior and to recognize deviations from these patterns was evaluated by experiments at the ORNL High-Flux Isotope Reactor (HFIR). Power perturbations of less than 0.1 percent of the mean value in selected frequency ranges were detected by the system. 19 references
Linear multivariate evaluation models for spatial perception of soundscape.
Deng, Zhiyong; Kang, Jian; Wang, Daiwei; Liu, Aili; Kang, Joe Zhengyu
2015-11-01
Soundscape is a sound environment that emphasizes the awareness of auditory perception and social or cultural understandings. The case of spatial perception is significant to soundscape. However, previous studies on the auditory spatial perception of the soundscape environment have been limited. Based on 21 native binaural-recorded soundscape samples and a set of auditory experiments for subjective spatial perception (SSP), a study of the analysis among semantic parameters, the inter-aural-cross-correlation coefficient (IACC), A-weighted-equal sound-pressure-level (L(eq)), dynamic (D), and SSP is introduced to verify the independent effect of each parameter and to re-determine some of their possible relationships. The results show that the more noisiness the audience perceived, the worse spatial awareness they received, while the closer and more directional the sound source image variations, dynamics, and numbers of sound sources in the soundscape are, the better the spatial awareness would be. Thus, the sensations of roughness, sound intensity, transient dynamic, and the values of Leq and IACC have a suitable range for better spatial perception. A better spatial awareness seems to promote the preference slightly for the audience. Finally, setting SSPs as functions of the semantic parameters and Leq-D-IACC, two linear multivariate evaluation models of subjective spatial perception are proposed.
Statistical monitoring of linear antenna arrays
Harrou, Fouzi
2016-11-03
The paper concerns the problem of monitoring linear antenna arrays using the generalized likelihood ratio (GLR) test. When an abnormal event (fault) affects an array of antenna elements, the radiation pattern changes and significant deviation from the desired design performance specifications can resulted. In this paper, the detection of faults is addressed from a statistical point of view as a fault detection problem. Specifically, a statistical method rested on the GLR principle is used to detect potential faults in linear arrays. To assess the strength of the GLR-based monitoring scheme, three case studies involving different types of faults were performed. Simulation results clearly shown the effectiveness of the GLR-based fault-detection method to monitor the performance of linear antenna arrays.
Multivariate statistical analysis of atom probe tomography data
Parish, Chad M.; Miller, Michael K.
2010-01-01
The application of spectrum imaging multivariate statistical analysis methods, specifically principal component analysis (PCA), to atom probe tomography (APT) data has been investigated. The mathematical method of analysis is described and the results for two example datasets are analyzed and presented. The first dataset is from the analysis of a PM 2000 Fe-Cr-Al-Ti steel containing two different ultrafine precipitate populations. PCA properly describes the matrix and precipitate phases in a simple and intuitive manner. A second APT example is from the analysis of an irradiated reactor pressure vessel steel. Fine, nm-scale Cu-enriched precipitates having a core-shell structure were identified and qualitatively described by PCA. Advantages, disadvantages, and future prospects for implementing these data analysis methodologies for APT datasets, particularly with regard to quantitative analysis, are also discussed.
Introducing linear functions: an alternative statistical approach
Nolan, Caroline; Herbert, Sandra
2015-12-01
The introduction of linear functions is the turning point where many students decide if mathematics is useful or not. This means the role of parameters and variables in linear functions could be considered to be `threshold concepts'. There is recognition that linear functions can be taught in context through the exploration of linear modelling examples, but this has its limitations. Currently, statistical data is easily attainable, and graphics or computer algebra system (CAS) calculators are common in many classrooms. The use of this technology provides ease of access to different representations of linear functions as well as the ability to fit a least-squares line for real-life data. This means these calculators could support a possible alternative approach to the introduction of linear functions. This study compares the results of an end-of-topic test for two classes of Australian middle secondary students at a regional school to determine if such an alternative approach is feasible. In this study, test questions were grouped by concept and subjected to concept by concept analysis of the means of test results of the two classes. This analysis revealed that the students following the alternative approach demonstrated greater competence with non-standard questions.
Adjustment of geochemical background by robust multivariate statistics
Zhou, D.
1985-01-01
Conventional analyses of exploration geochemical data assume that the background is a constant or slowly changing value, equivalent to a plane or a smoothly curved surface. However, it is better to regard the geochemical background as a rugged surface, varying with changes in geology and environment. This rugged surface can be estimated from observed geological, geochemical and environmental properties by using multivariate statistics. A method of background adjustment was developed and applied to groundwater and stream sediment reconnaissance data collected from the Hot Springs Quadrangle, South Dakota, as part of the National Uranium Resource Evaluation (NURE) program. Source-rock lithology appears to be a dominant factor controlling the chemical composition of groundwater or stream sediments. The most efficacious adjustment procedure is to regress uranium concentration on selected geochemical and environmental variables for each lithologic unit, and then to delineate anomalies by a common threshold set as a multiple of the standard deviation of the combined residuals. Robust versions of regression and RQ-mode principal components analysis techniques were used rather than ordinary techniques to guard against distortion caused by outliers Anomalies delineated by this background adjustment procedure correspond with uranium prospects much better than do anomalies delineated by conventional procedures. The procedure should be applicable to geochemical exploration at different scales for other metals. ?? 1985.
Multivariate statistical tools for the radiometric features of volcanic islands
Basile, S.; Brai, M.; Marrale, M.; Micciche, S.; Lanzo, G.; Rizzo, S.
2009-01-01
The Aeolian Islands represents a Quaternary volcanic arc related to the subduction of the Ionian plate beneath the Calabrian Arc. The geochemical variability of the islands has led to a broad spectrum of magma rocks. Volcanic products from calc-alkaline (CA) to calc-alkaline high in potassium (HKCA) are present throughout the Archipelago, but products belonging to shoshonitic (SHO) and potassium (KS) series characterize the southern portion of Lipari, Vulcano and Stromboli. Tectonics also plays an important role in the process of the islands differentiation. In this work, we want to review and cross-analyze the data on Lipari, Stromboli and Vulcano, collected in measurement and sampling campaigns over the last years. Chemical data were obtained by X-ray fluorescence. High resolution gamma-ray spectrometry with germanium detectors was used to measure primordial radionuclide activities. The activity of primordial radionuclides in the volcanic products of these three islands is strongly dependent on their chemism. The highest contents are found in more differentiated products (rhyolites). The CA products have lower concentrations, while the HKCA and Shoshonitic product concentrations are in between. Calculated dose rates have been correlated with the petrochemical features in order to gain further insight in evolution and differentiation of volcanic products. Ratio matching technique and multivariate statistical analyses, such as Principal Component Analysis and Minimum Spanning Tree, have been applied as an additional tool helpful to better describe the lithological affinities of the samples. (Author)
Multivariate Statistical Inference of Lightning Occurrence, and Using Lightning Observations
Boccippio, Dennis
2004-01-01
Two classes of multivariate statistical inference using TRMM Lightning Imaging Sensor, Precipitation Radar, and Microwave Imager observation are studied, using nonlinear classification neural networks as inferential tools. The very large and globally representative data sample provided by TRMM allows both training and validation (without overfitting) of neural networks with many degrees of freedom. In the first study, the flashing / or flashing condition of storm complexes is diagnosed using radar, passive microwave and/or environmental observations as neural network inputs. The diagnostic skill of these simple lightning/no-lightning classifiers can be quite high, over land (above 80% Probability of Detection; below 20% False Alarm Rate). In the second, passive microwave and lightning observations are used to diagnose radar reflectivity vertical structure. A priori diagnosis of hydrometeor vertical structure is highly important for improved rainfall retrieval from either orbital radars (e.g., the future Global Precipitation Mission "mothership") or radiometers (e.g., operational SSM/I and future Global Precipitation Mission passive microwave constellation platforms), we explore the incremental benefit to such diagnosis provided by lightning observations.
Multivariate statistical analysis - an application to lunar materials
Deb, M.
1978-01-01
The compositional characteristics of clinopyroxenes and spinels - two minerals considered to be very useful in deciphering lunar history, have been studied using the multivariate statistical method of principal component analysis. The mineral-chemical data used are from certain lunar rocks and fines collected by Apollo 11, 12, 14 and 15 and Luna 16 and 20 missions, representing mainly the mare basalts and also non-mare basalts, breccia and rock fragments from the highland regions, in which a large number of these minerals have been analyzed. The correlations noted in the mineral compositions, indicating substitutional relationships, have been interpreted on the basis of available crystal-chemical and petrological informations. Compositional trends for individual specimens have been delineated and compared by producing ''principal latent vector diagrams''. The percent variance of the principal components denoted by the eigenvalues, have been evaluated in terms of the crystallization history of the samples. Some of the major petrogenetic implications of this study concern the role of early formed cumulate phases in the near-surface fractionation of mare basalts, mixing of mineral compositions in the highland regolith and the subsolidus reduction trends in lunar spinels. (auth.)
Groundwater quality assessment of urban Bengaluru using multivariate statistical techniques
Gulgundi, Mohammad Shahid; Shetty, Amba
2018-03-01
Groundwater quality deterioration due to anthropogenic activities has become a subject of prime concern. The objective of the study was to assess the spatial and temporal variations in groundwater quality and to identify the sources in the western half of the Bengaluru city using multivariate statistical techniques. Water quality index rating was calculated for pre and post monsoon seasons to quantify overall water quality for human consumption. The post-monsoon samples show signs of poor quality in drinking purpose compared to pre-monsoon. Cluster analysis (CA), principal component analysis (PCA) and discriminant analysis (DA) were applied to the groundwater quality data measured on 14 parameters from 67 sites distributed across the city. Hierarchical cluster analysis (CA) grouped the 67 sampling stations into two groups, cluster 1 having high pollution and cluster 2 having lesser pollution. Discriminant analysis (DA) was applied to delineate the most meaningful parameters accounting for temporal and spatial variations in groundwater quality of the study area. Temporal DA identified pH as the most important parameter, which discriminates between water quality in the pre-monsoon and post-monsoon seasons and accounts for 72% seasonal assignation of cases. Spatial DA identified Mg, Cl and NO3 as the three most important parameters discriminating between two clusters and accounting for 89% spatial assignation of cases. Principal component analysis was applied to the dataset obtained from the two clusters, which evolved three factors in each cluster, explaining 85.4 and 84% of the total variance, respectively. Varifactors obtained from principal component analysis showed that groundwater quality variation is mainly explained by dissolution of minerals from rock water interactions in the aquifer, effect of anthropogenic activities and ion exchange processes in water.
The Inappropriate Symmetries of Multivariate Statistical Analysis in Geometric Morphometrics.
Bookstein, Fred L
In today's geometric morphometrics the commonest multivariate statistical procedures, such as principal component analysis or regressions of Procrustes shape coordinates on Centroid Size, embody a tacit roster of symmetries -axioms concerning the homogeneity of the multiple spatial domains or descriptor vectors involved-that do not correspond to actual biological fact. These techniques are hence inappropriate for any application regarding which we have a-priori biological knowledge to the contrary (e.g., genetic/morphogenetic processes common to multiple landmarks, the range of normal in anatomy atlases, the consequences of growth or function for form). But nearly every morphometric investigation is motivated by prior insights of this sort. We therefore need new tools that explicitly incorporate these elements of knowledge, should they be quantitative, to break the symmetries of the classic morphometric approaches. Some of these are already available in our literature but deserve to be known more widely: deflated (spatially adaptive) reference distributions of Procrustes coordinates, Sewall Wright's century-old variant of factor analysis, the geometric algebra of importing explicit biomechanical formulas into Procrustes space. Other methods, not yet fully formulated, might involve parameterized models for strain in idealized forms under load, principled approaches to the separation of functional from Brownian aspects of shape variation over time, and, in general, a better understanding of how the formalism of landmarks interacts with the many other approaches to quantification of anatomy. To more powerfully organize inferences from the high-dimensional measurements that characterize so much of today's organismal biology, tomorrow's toolkit must rely neither on principal component analysis nor on the Procrustes distance formula, but instead on sound prior biological knowledge as expressed in formulas whose coefficients are not all the same. I describe the problems
On The Structure of The Inverse of a Linear Constant Multivariable ...
On The Structure of The Inverse of a Linear Constant Multivariable System. ... It is shown that the use of this representation has certain advantages in the design of multivariable feedback systems. typical examples were considered to indicate the corresponding application. Keywords: Stability Functions, multivariable ...
Laurens, L M L; Wolfrum, E J
2013-12-18
One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.
A statistical approach for segregating cognitive task stages from multivariate fMRI BOLD time series
Charmaine eDemanuele
2015-10-01
Full Text Available Multivariate pattern analysis can reveal new information from neuroimaging data to illuminate human cognition and its disturbances. Here, we develop a methodological approach, based on multivariate statistical/machine learning and time series analysis, to discern cognitive processing stages from fMRI blood oxygenation level dependent (BOLD time series. We apply this method to data recorded from a group of healthy adults whilst performing a virtual reality version of the delayed win-shift radial arm maze task. This task has been frequently used to study working memory and decision making in rodents. Using linear classifiers and multivariate test statistics in conjunction with time series bootstraps, we show that different cognitive stages of the task, as defined by the experimenter, namely, the encoding/retrieval, choice, reward and delay stages, can be statistically discriminated from the BOLD time series in brain areas relevant for decision making and working memory. Discrimination of these task stages was significantly reduced during poor behavioral performance in dorsolateral prefrontal cortex (DLPFC, but not in the primary visual cortex (V1. Experimenter-defined dissection of time series into class labels based on task structure was confirmed by an unsupervised, bottom-up approach based on Hidden Markov Models. Furthermore, we show that different groupings of recorded time points into cognitive event classes can be used to test hypotheses about the specific cognitive role of a given brain region during task execution. We found that whilst the DLPFC strongly differentiated between task stages associated with different memory loads, but not between different visual-spatial aspects, the reverse was true for V1. Our methodology illustrates how different aspects of cognitive information processing during one and the same task can be separated and attributed to specific brain regions based on information contained in multivariate patterns of voxel
Exploring multivariate representations of indices along linear geographic features
Bleisch, Susanne; Hollenstein, Daria
2018-05-01
A study of the walkability of a Swiss town required finding suitable representations of multivariate geographical da-ta. The goal was to represent multiple indices of walkability concurrently and visualizing the data along the street network it relates to. Different indices of pedestrian friendliness were assessed for short street sections and then mapped to an overlaid grid. Basic and composite glyphs were designed using square- or triangle-areas to display one to four index values concurrently within the grid structure. Color was used to indicate different indices. Implement-ing visualizations for different combinations of index sets, we find that single values can be emphasized or de-emphasized by selecting the color scheme accordingly and that different color selections either allow perceiving sin-gle values or overall trends over the evaluated area. Values for up to four indices can be displayed in combination within the resulting geovisualizations and the underlying gridded road network references the data to its real world locations.
Arenas-Garcia, J.; Petersen, K.; Camps-Valls, G.
2013-01-01
correlation analysis (CCA), and orthonormalized PLS (OPLS), as well as their nonlinear extensions derived by means of the theory of reproducing kernel Hilbert spaces (RKHSs). We also review their connections to other methods for classification and statistical dependence estimation and introduce some recent...
Linear models for multivariate, time series, and spatial data
Christensen, Ronald
1991-01-01
This is a companion volume to Plane Answers to Complex Questions: The Theory 0/ Linear Models. It consists of six additional chapters written in the same spirit as the last six chapters of the earlier book. Brief introductions are given to topics related to linear model theory. No attempt is made to give a comprehensive treatment of the topics. Such an effort would be futile. Each chapter is on a topic so broad that an in depth discussion would require a book-Iength treatment. People need to impose structure on the world in order to understand it. There is a limit to the number of unrelated facts that anyone can remem ber. If ideas can be put within a broad, sophisticatedly simple structure, not only are they easier to remember but often new insights become avail able. In fact, sophisticatedly simple models of the world may be the only ones that work. I have often heard Arnold Zellner say that, to the best of his knowledge, this is true in econometrics. The process of modeling is fundamental to understand...
Weathers, J.B.; Luck, R.; Weathers, J.W.
2009-01-01
The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions associated with the construction of the noise level using multivariate statistics exists in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.
Weathers, J.B. [Shock, Noise, and Vibration Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: James.Weathers@ngc.com; Luck, R. [Department of Mechanical Engineering, Mississippi State University, 210 Carpenter Engineering Building, P.O. Box ME, Mississippi State, MS 39762-5925 (United States)], E-mail: Luck@me.msstate.edu; Weathers, J.W. [Structural Analysis Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: Jeffrey.Weathers@ngc.com
2009-11-15
The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions associated with the construction of the noise level using multivariate statistics exists in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.
Shangli Zhang
2009-01-01
Full Text Available By using the methods of linear algebra and matrix inequality theory, we obtain the characterization of admissible estimators in the general multivariate linear model with respect to inequality restricted parameter set. In the classes of homogeneous and general linear estimators, the necessary and suffcient conditions that the estimators of regression coeffcient function are admissible are established.
Linear and kernel methods for multivariate change detection
Canty, Morton J.; Nielsen, Allan Aasbjerg
2012-01-01
), as well as maximum autocorrelation factor (MAF) and minimum noise fraction (MNF) analyses of IR-MAD images, both linear and kernel-based (nonlinear), may further enhance change signals relative to no-change background. IDL (Interactive Data Language) implementations of IR-MAD, automatic radiometric...... normalization, and kernel PCA/MAF/MNF transformations are presented that function as transparent and fully integrated extensions of the ENVI remote sensing image analysis environment. The train/test approach to kernel PCA is evaluated against a Hebbian learning procedure. Matlab code is also available...... that allows fast data exploration and experimentation with smaller datasets. New, multiresolution versions of IR-MAD that accelerate convergence and that further reduce no-change background noise are introduced. Computationally expensive matrix diagonalization and kernel image projections are programmed...
Attitudes toward Advanced and Multivariate Statistics When Using Computers.
Kennedy, Robert L.; McCallister, Corliss Jean
This study investigated the attitudes toward statistics of graduate students who studied advanced statistics in a course in which the focus of instruction was the use of a computer program in class. The use of the program made it possible to provide an individualized, self-paced, student-centered, and activity-based course. The three sections…
Introducing Linear Functions: An Alternative Statistical Approach
Nolan, Caroline; Herbert, Sandra
2015-01-01
The introduction of linear functions is the turning point where many students decide if mathematics is useful or not. This means the role of parameters and variables in linear functions could be considered to be "threshold concepts". There is recognition that linear functions can be taught in context through the exploration of linear…
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
ALONSO ABAD, Ariel; Rodriguez, O.; TIBALDI, Fabian; CORTINAS ABRAHANTES, Jose
2002-01-01
In medical studies the categorical endpoints are quite often. Even though nowadays some models for handling this multicategorical variables have been developed their use is not common. This work shows an application of the Multivariate Generalized Linear Models to the analysis of Clinical Trials data. After a theoretical introduction models for ordinal and nominal responses are applied and the main results are discussed. multivariate analysis; multivariate logistic regression; multicategor...
Distributed Monitoring of the R2 Statistic for Linear Regression
National Aeronautics and Space Administration — The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and...
Chaudhuri, Probal
1992-01-01
We consider a class of $U$-statistics type estimates for multivariate location. The estimates extend some $R$-estimates to multivariate data. In particular, the class of estimates includes the multivariate median considered by Gini and Galvani (1929) and Haldane (1948) and a multivariate extension of the well-known Hodges-Lehmann (1963) estimate. We explore large sample behavior of these estimates by deriving a Bahadur type representation for them. In the process of developing these asymptoti...
Multivariate statistical analysis of a multi-step industrial processes
Reinikainen, S.P.; Høskuldsson, Agnar
2007-01-01
Monitoring and quality control of industrial processes often produce information on how the data have been obtained. In batch processes, for instance, the process is carried out in stages; some process or control parameters are set at each stage. However, the obtained data might not be utilized...... efficiently, even if this information may reveal significant knowledge about process dynamics or ongoing phenomena. When studying the process data, it may be important to analyse the data in the light of the physical or time-wise development of each process step. In this paper, a unified approach to analyse...... multivariate multi-step processes, where results from each step are used to evaluate future results, is presented. The methods presented are based on Priority PLS Regression. The basic idea is to compute the weights in the regression analysis for given steps, but adjust all data by the resulting score vectors...
HORIZONTAL BRANCH MORPHOLOGY OF GLOBULAR CLUSTERS: A MULTIVARIATE STATISTICAL ANALYSIS
Jogesh Babu, G.; Chattopadhyay, Tanuka; Chattopadhyay, Asis Kumar; Mondal, Saptarshi
2009-01-01
The proper interpretation of horizontal branch (HB) morphology is crucial to the understanding of the formation history of stellar populations. In the present study a multivariate analysis is used (principal component analysis) for the selection of appropriate HB morphology parameter, which, in our case, is the logarithm of effective temperature extent of the HB (log T effHB ). Then this parameter is expressed in terms of the most significant observed independent parameters of Galactic globular clusters (GGCs) separately for coherent groups, obtained in a previous work, through a stepwise multiple regression technique. It is found that, metallicity ([Fe/H]), central surface brightness (μ v ), and core radius (r c ) are the significant parameters to explain most of the variations in HB morphology (multiple R 2 ∼ 0.86) for GGC elonging to the bulge/disk while metallicity ([Fe/H]) and absolute magnitude (M v ) are responsible for GGC belonging to the inner halo (multiple R 2 ∼ 0.52). The robustness is tested by taking 1000 bootstrap samples. A cluster analysis is performed for the red giant branch (RGB) stars of the GGC belonging to Galactic inner halo (Cluster 2). A multi-episodic star formation is preferred for RGB stars of GGC belonging to this group. It supports the asymptotic giant branch (AGB) model in three episodes instead of two as suggested by Carretta et al. for halo GGC while AGB model is suggested to be revisited for bulge/disk GGC.
Statistical monitoring of linear antenna arrays
Harrou, Fouzi; Sun, Ying
2016-01-01
The paper concerns the problem of monitoring linear antenna arrays using the generalized likelihood ratio (GLR) test. When an abnormal event (fault) affects an array of antenna elements, the radiation pattern changes and significant deviation from
On the interpretation of weight vectors of linear models in multivariate neuroimaging.
Haufe, Stefan; Meinecke, Frank; Görgen, Kai; Dähne, Sven; Haynes, John-Dylan; Blankertz, Benjamin; Bießmann, Felix
2014-02-15
The increase in spatiotemporal resolution of neuroimaging devices is accompanied by a trend towards more powerful multivariate analysis methods. Often it is desired to interpret the outcome of these methods with respect to the cognitive processes under study. Here we discuss which methods allow for such interpretations, and provide guidelines for choosing an appropriate analysis for a given experimental goal: For a surgeon who needs to decide where to remove brain tissue it is most important to determine the origin of cognitive functions and associated neural processes. In contrast, when communicating with paralyzed or comatose patients via brain-computer interfaces, it is most important to accurately extract the neural processes specific to a certain mental state. These equally important but complementary objectives require different analysis methods. Determining the origin of neural processes in time or space from the parameters of a data-driven model requires what we call a forward model of the data; such a model explains how the measured data was generated from the neural sources. Examples are general linear models (GLMs). Methods for the extraction of neural information from data can be considered as backward models, as they attempt to reverse the data generating process. Examples are multivariate classifiers. Here we demonstrate that the parameters of forward models are neurophysiologically interpretable in the sense that significant nonzero weights are only observed at channels the activity of which is related to the brain process under study. In contrast, the interpretation of backward model parameters can lead to wrong conclusions regarding the spatial or temporal origin of the neural signals of interest, since significant nonzero weights may also be observed at channels the activity of which is statistically independent of the brain process under study. As a remedy for the linear case, we propose a procedure for transforming backward models into forward
Multivariate Statistical Process Optimization in the Industrial Production of Enzymes
Klimkiewicz, Anna
of productyield. The potential of NIR technology to monitor the activity of the enzyme has beenthe subject of a feasibility study presented in PAPER I. It included (a) evaluation onwhich of the two real-time NIR flow cell configurations is the preferred arrangementfor monitoring of the retentate stream downstream...... strategies for theorganization of these datasets, with varying number of timestamps, into datastructures fit for latent variable (LV) modeling, have been compared. The ultimateaim of the data mining steps is the construction of statistical ‘soft models’ whichcapture the principle or latent behavior...
Statistical Tests for Mixed Linear Models
Khuri, André I; Sinha, Bimal K
2011-01-01
An advanced discussion of linear models with mixed or random effects. In recent years a breakthrough has occurred in our ability to draw inferences from exact and optimum tests of variance component models, generating much research activity that relies on linear models with mixed and random effects. This volume covers the most important research of the past decade as well as the latest developments in hypothesis testing. It compiles all currently available results in the area of exact and optimum tests for variance component models and offers the only comprehensive treatment for these models a
Answers to selected problems in multivariable calculus with linear algebra and series
Trench, William F
1972-01-01
Answers to Selected Problems in Multivariable Calculus with Linear Algebra and Series contains the answers to selected problems in linear algebra, the calculus of several variables, and series. Topics covered range from vectors and vector spaces to linear matrices and analytic geometry, as well as differential calculus of real-valued functions. Theorems and definitions are included, most of which are followed by worked-out illustrative examples.The problems and corresponding solutions deal with linear equations and matrices, including determinants; vector spaces and linear transformations; eig
Refining developmental coordination disorder subtyping with multivariate statistical methods
Lalanne Christophe
2012-07-01
Full Text Available Abstract Background With a large number of potentially relevant clinical indicators penalization and ensemble learning methods are thought to provide better predictive performance than usual linear predictors. However, little is known about how they perform in clinical studies where few cases are available. We used Random Forests and Partial Least Squares Discriminant Analysis to select the most salient impairments in Developmental Coordination Disorder (DCD and assess patients similarity. Methods We considered a wide-range testing battery for various neuropsychological and visuo-motor impairments which aimed at characterizing subtypes of DCD in a sample of 63 children. Classifiers were optimized on a training sample, and they were used subsequently to rank the 49 items according to a permuted measure of variable importance. In addition, subtyping consistency was assessed with cluster analysis on the training sample. Clustering fitness and predictive accuracy were evaluated on the validation sample. Results Both classifiers yielded a relevant subset of items impairments that altogether accounted for a sharp discrimination between three DCD subtypes: ideomotor, visual-spatial and constructional, and mixt dyspraxia. The main impairments that were found to characterize the three subtypes were: digital perception, imitations of gestures, digital praxia, lego blocks, visual spatial structuration, visual motor integration, coordination between upper and lower limbs. Classification accuracy was above 90% for all classifiers, and clustering fitness was found to be satisfactory. Conclusions Random Forests and Partial Least Squares Discriminant Analysis are useful tools to extract salient features from a large pool of correlated binary predictors, but also provide a way to assess individuals proximities in a reduced factor space. Less than 15 neuro-visual, neuro-psychomotor and neuro-psychological tests might be required to provide a sensitive and
Sunando Roy
2009-10-01
Full Text Available Feline immunodeficiency virus (FIV and human immunodeficiency virus (HIV are recently identified lentiviruses that cause progressive immune decline and ultimately death in infected cats and humans. It is of great interest to understand how to prevent immune system collapse caused by these lentiviruses. We recently described that disease caused by a virulent FIV strain in cats can be attenuated if animals are first infected with a feline immunodeficiency virus derived from a wild cougar. The detailed temporal tracking of cat immunological parameters in response to two viral infections resulted in high-dimensional datasets containing variables that exhibit strong co-variation. Initial analyses of these complex data using univariate statistical techniques did not account for interactions among immunological response variables and therefore potentially obscured significant effects between infection state and immunological parameters.Here, we apply a suite of multivariate statistical tools, including Principal Component Analysis, MANOVA and Linear Discriminant Analysis, to temporal immunological data resulting from FIV superinfection in domestic cats. We investigated the co-variation among immunological responses, the differences in immune parameters among four groups of five cats each (uninfected, single and dual infected animals, and the "immune profiles" that discriminate among them over the first four weeks following superinfection. Dual infected cats mount an immune response by 24 days post superinfection that is characterized by elevated levels of CD8 and CD25 cells and increased expression of IL4 and IFNgamma, and FAS. This profile discriminates dual infected cats from cats infected with FIV alone, which show high IL-10 and lower numbers of CD8 and CD25 cells.Multivariate statistical analyses demonstrate both the dynamic nature of the immune response to FIV single and dual infection and the development of a unique immunological profile in dual
Linear Mixed Models in Statistical Genetics
R. de Vlaming (Ronald)
2017-01-01
markdownabstractOne of the goals of statistical genetics is to elucidate the genetic architecture of phenotypes (i.e., observable individual characteristics) that are affected by many genetic variants (e.g., single-nucleotide polymorphisms; SNPs). A particular aim is to identify specific SNPs that
Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A.; van t Veld, Aart A.
2012-01-01
PURPOSE: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. METHODS AND MATERIALS: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator
A unifying framework for k-statistics, polykays and their multivariate generalizations.
DI NARDO, Elvira; GUARINO G, G.; Senato, D.
2008-01-01
Through the classical umbral calculus, we provide a unifying syntax for single and multivariate $k$-statistics, polykays and multivariate polykays. From a combinatorial point of view, we revisit the theory as exposed by Stuart and Ord, taking into account the Doubilet approach to symmetric functions. Moreover, by using exponential polynomials rather than set partitions, we provide a new formula for $k$-statistics that results in a very fast algorithm to generate such estimators.
Kruger, Uwe
2012-01-01
The development and application of multivariate statistical techniques in process monitoring has gained substantial interest over the past two decades in academia and industry alike. Initially developed for monitoring and fault diagnosis in complex systems, such techniques have been refined and applied in various engineering areas, for example mechanical and manufacturing, chemical, electrical and electronic, and power engineering. The recipe for the tremendous interest in multivariate statistical techniques lies in its simplicity and adaptability for developing monitoring applica
Introduction to statistical modelling: linear regression.
Lunt, Mark
2015-07-01
In many studies we wish to assess how a range of variables are associated with a particular outcome and also determine the strength of such relationships so that we can begin to understand how these factors relate to each other at a population level. Ultimately, we may also be interested in predicting the outcome from a series of predictive factors available at, say, a routine clinic visit. In a recent article in Rheumatology, Desai et al. did precisely that when they studied the prediction of hip and spine BMD from hand BMD and various demographic, lifestyle, disease and therapy variables in patients with RA. This article aims to introduce the statistical methodology that can be used in such a situation and explain the meaning of some of the terms employed. It will also outline some common pitfalls encountered when performing such analyses. © The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
An R2 statistic for fixed effects in the linear mixed model.
Edwards, Lloyd J; Muller, Keith E; Wolfinger, Russell D; Qaqish, Bahjat F; Schabenberger, Oliver
2008-12-20
Statisticians most often use the linear mixed model to analyze Gaussian longitudinal data. The value and familiarity of the R(2) statistic in the linear univariate model naturally creates great interest in extending it to the linear mixed model. We define and describe how to compute a model R(2) statistic for the linear mixed model by using only a single model. The proposed R(2) statistic measures multivariate association between the repeated outcomes and the fixed effects in the linear mixed model. The R(2) statistic arises as a 1-1 function of an appropriate F statistic for testing all fixed effects (except typically the intercept) in a full model. The statistic compares the full model with a null model with all fixed effects deleted (except typically the intercept) while retaining exactly the same covariance structure. Furthermore, the R(2) statistic leads immediately to a natural definition of a partial R(2) statistic. A mixed model in which ethnicity gives a very small p-value as a longitudinal predictor of blood pressure (BP) compellingly illustrates the value of the statistic. In sharp contrast to the extreme p-value, a very small R(2) , a measure of statistical and scientific importance, indicates that ethnicity has an almost negligible association with the repeated BP outcomes for the study.
Multivariate Statistical Methods as a Tool of Financial Analysis of Farm Business
Novák, J.; Sůvová, H.; Vondráček, Jiří
2002-01-01
Roč. 48, č. 1 (2002), s. 9-12 ISSN 0139-570X Institutional research plan: AV0Z1030915 Keywords : financial analysis * financial ratios * multivariate statistical methods * correlation analysis * discriminant analysis * cluster analysis Subject RIV: BB - Applied Statistics, Operational Research
2017-09-01
application of statistical inference. Even when human forecasters leverage their professional experience, which is often gained through long periods of... application throughout statistics and Bayesian data analysis. The multivariate form of 2( , ) (e.g., Figure 12) is similarly analytically...data (i.e., no systematic manipulations with analytical functions), it is common in the statistical literature to apply mathematical transformations
Multivariate statistical study of heavy metal enrichment in sediments of the Pearl River Estuary
Liu, W.X.; Li, X.D.; Shen, Z.G.; Wang, D.C.; Wai, O.W.H.; Li, Y.S.
2003-01-01
Multivariate statistical analysis identified the heavy metal accumulation layers of sediment profiles and showed the various sources of metals in the estuary. - The concentrations and chemical partitioning of heavy metals in the sediment cores of the Pearl River Estuary were studied. Based on Pearson correlation coefficients and principal component analysis results, Al was selected as the concentration normalizer for Pb, while Fe was used as the normalizing element for Co, Cu, Ni and Zn. In each profile, sections with metal concentrations exceeding the upper 95% prediction interval of the linear regression model were regarded as metal enrichment layers. The heavy metal accumulation mainly occurred at sites in the western shallow water areas and east channel, which reflected the hydraulic conditions and influence from riparian anthropogenic activities. Heavy metals in the enrichment sections were evaluated by a sequential extraction method for possible chemical forms in sediments. Since the residual, Fe/Mn oxides and organic/sulfide fractions were dominant geochemical phases in the enriched sections, the bioavailability of heavy metals in sediments was generally low. The 206 Pb/ 207 Pb ratios in the metal-enriched sediment sections also revealed the influence of anthropogenic sources. The spatial distribution of cumulative heavy metals in the sediments suggested that the Zn and Cu mainly originated from point sources, while the Pb probably came from non-point sources in the estuary
Tian, Dayong; Lü, Guodong; Zhai, Zhengang; Du, Guoli; Mo, Jiaqing; Lü, Xiaoyi
2018-01-01
In this paper, serum surface-enhanced Raman scattering and multivariate statistical analysis are used to investigate a rapid screening technique for thyroid function diseases. At present, the detection of thyroid function has become increasingly important, and it is urgently necessary to develop a rapid and portable method for the detection of thyroid function. Our experimental results show that, by using the Silmeco-based enhanced Raman signal, the signal strength greatly increases and the characteristic peak appears obviously. It is also observed that the Raman spectra of normal and anomalous thyroid function human serum are significantly different. Principal component analysis (PCA) combined with linear discriminant analysis (LDA) was used to diagnose thyroid dysfunction, and the diagnostic accuracy was 87.4%. The use of serum surface-enhanced Raman scattering technology combined with PCA-LDA shows good diagnostic performance for the rapid detection of thyroid function. By means of Raman technology, it is expected that a portable device for the rapid detection of thyroid function will be developed.
Taylor, Sandra L; Ruhaak, L Renee; Weiss, Robert H; Kelly, Karen; Kim, Kyoungmi
2017-01-01
High through-put mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen. We provide R functions to implement and illustrate our method as supplementary information CONTACT: sltaylor@ucdavis.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Lepore, Natasha; Brun, Caroline; Chou, Yi-Yu; Chiang, Ming-Chang; Dutton, Rebecca A.; Hayashi, Kiralee M.; Luders, Eileen; Lopez, Oscar L.; Aizenstein, Howard J.; Toga, Arthur W.; Becker, James T.; Thompson, Paul M.
2008-01-01
This paper investigates the performance of a new multivariate method for tensor-based morphometry (TBM). Statistics on Riemannian manifolds are developed that exploit the full information in deformation tensor fields. In TBM, multiple brain images are warped to a common neuroanatomical template via 3-D nonlinear registration; the resulting deformation fields are analyzed statistically to identify group differences in anatomy. Rather than study the Jacobian determinant (volume expansion factor...
Multivariate statistical treatment of PIXE analysis of some traditional Chinese medicines
Xiaofeng Zhang; Jianguo Ma; Junfa Qin; Lun Xiao
1991-01-01
Elements in two kinds of 30 traditional Chinese medicines were analyzed by PIXE method, and the data were treated by multivariate statistical methods. The results show that these two kinds of traditional Chinese medicines are almost separable according to their elemental contents. The results are congruous with the traditional Chinese medicine practice. (author) 7 refs.; 2 figs.; 2 tabs
Chen Kouping; Jiao, Jiu J.; Huang Jianmin; Huang Runqiu
2007-01-01
Multivariate statistical techniques are efficient ways to display complex relationships among many objects. An attempt was made to study the data of trace elements in groundwater using multivariate statistical techniques such as principal component analysis (PCA), Q-mode factor analysis and cluster analysis. The original matrix consisted of 17 trace elements estimated from 55 groundwater samples colleted in 27 wells located in a coastal area in Shenzhen, China. PCA results show that trace elements of V, Cr, As, Mo, W, and U with greatest positive loadings typically occur as soluble oxyanions in oxidizing waters, while Mn and Co with greatest negative loadings are generally more soluble within oxygen depleted groundwater. Cluster analyses demonstrate that most groundwater samples collected from the same well in the study area during summer and winter still fall into the same group. This study also demonstrates the usefulness of multivariate statistical analysis in hydrochemical studies. - Multivariate statistical analysis was used to investigate relationships among trace elements and factors controlling trace element distribution in groundwater
Monitoring a PVC batch process with multivariate statistical process control charts
Tates, A. A.; Louwerse, D. J.; Smilde, A. K.; Koot, G. L. M.; Berndt, H.
1999-01-01
Multivariate statistical process control charts (MSPC charts) are developed for the industrial batch production process of poly(vinyl chloride) (PVC). With these MSPC charts different types of abnormal batch behavior were detected on-line. With batch contribution plots, the probable causes of these
Applied Statistics: From Bivariate through Multivariate Techniques [with CD-ROM
Warner, Rebecca M.
2007-01-01
This book provides a clear introduction to widely used topics in bivariate and multivariate statistics, including multiple regression, discriminant analysis, MANOVA, factor analysis, and binary logistic regression. The approach is applied and does not require formal mathematics; equations are accompanied by verbal explanations. Students are asked…
Linear mixed models a practical guide using statistical software
West, Brady T; Galecki, Andrzej T
2006-01-01
Simplifying the often confusing array of software programs for fitting linear mixed models (LMMs), Linear Mixed Models: A Practical Guide Using Statistical Software provides a basic introduction to primary concepts, notation, software implementation, model interpretation, and visualization of clustered and longitudinal data. This easy-to-navigate reference details the use of procedures for fitting LMMs in five popular statistical software packages: SAS, SPSS, Stata, R/S-plus, and HLM. The authors introduce basic theoretical concepts, present a heuristic approach to fitting LMMs based on bo
Multivariate meta-analysis: a robust approach based on the theory of U-statistic.
Ma, Yan; Mazumdar, Madhu
2011-10-30
Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously taking into account the correlation between the outcomes. Likelihood-based approaches, in particular restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analysis with small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and is shown to perform equally well to REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistic and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect on estimates from REML because of non-normal data distribution is marginal and that the estimates from MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods are illustrated by their application to data from two published meta-analysis from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistic for testing significance of between-study heterogeneity and for extending the work to meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.
Chen, Zhe; Qiu, Zurong; Huo, Xinming; Fan, Yuming; Li, Xinghua
2017-03-01
A fiber-capacitive drop analyzer is an instrument which monitors a growing droplet to produce a capacitive opto-tensiotrace (COT). Each COT is an integration of fiber light intensity signals and capacitance signals and can reflect the unique physicochemical property of a liquid. In this study, we propose a solution analytical and concentration quantitative method based on multivariate statistical methods. Eight characteristic values are extracted from each COT. A series of COT characteristic values of training solutions at different concentrations compose a data library of this kind of solution. A two-stage linear discriminant analysis is applied to analyze different solution libraries and establish discriminant functions. Test solutions can be discriminated by these functions. After determining the variety of test solutions, Spearman correlation test and principal components analysis are used to filter and reduce dimensions of eight characteristic values, producing a new representative parameter. A cubic spline interpolation function is built between the parameters and concentrations, based on which we can calculate the concentration of the test solution. Methanol, ethanol, n-propanol, and saline solutions are taken as experimental subjects in this paper. For each solution, nine or ten different concentrations are chosen to be the standard library, and the other two concentrations compose the test group. By using the methods mentioned above, all eight test solutions are correctly identified and the average relative error of quantitative analysis is 1.11%. The method proposed is feasible which enlarges the applicable scope of recognizing liquids based on the COT and improves the concentration quantitative precision, as well.
Peebles, D.E.; Ohlhausen, J.A.; Kotula, P.G.; Hutton, S.; Blomfield, C.
2004-01-01
The acquisition of spectral images for x-ray photoelectron spectroscopy (XPS) is a relatively new approach, although it has been used with other analytical spectroscopy tools for some time. This technique provides full spectral information at every pixel of an image, in order to provide a complete chemical mapping of the imaged surface area. Multivariate statistical analysis techniques applied to the spectral image data allow the determination of chemical component species, and their distribution and concentrations, with minimal data acquisition and processing times. Some of these statistical techniques have proven to be very robust and efficient methods for deriving physically realistic chemical components without input by the user other than the spectral matrix itself. The benefits of multivariate analysis of the spectral image data include significantly improved signal to noise, improved image contrast and intensity uniformity, and improved spatial resolution - which are achieved due to the effective statistical aggregation of the large number of often noisy data points in the image. This work demonstrates the improvements in chemical component determination and contrast, signal-to-noise level, and spatial resolution that can be obtained by the application of multivariate statistical analysis to XPS spectral images
Yue, Chen; Chen, Shaojie; Sair, Haris I; Airan, Raag; Caffo, Brian S
2015-09-01
Data reproducibility is a critical issue in all scientific experiments. In this manuscript, the problem of quantifying the reproducibility of graphical measurements is considered. The image intra-class correlation coefficient (I2C2) is generalized and the graphical intra-class correlation coefficient (GICC) is proposed for such purpose. The concept for GICC is based on multivariate probit-linear mixed effect models. A Markov Chain Monte Carlo EM (mcm-cEM) algorithm is used for estimating the GICC. Simulation results with varied settings are demonstrated and our method is applied to the KIRBY21 test-retest dataset.
MacNab, Ying C
2016-08-01
This paper concerns with multivariate conditional autoregressive models defined by linear combination of independent or correlated underlying spatial processes. Known as linear models of coregionalization, the method offers a systematic and unified approach for formulating multivariate extensions to a broad range of univariate conditional autoregressive models. The resulting multivariate spatial models represent classes of coregionalized multivariate conditional autoregressive models that enable flexible modelling of multivariate spatial interactions, yielding coregionalization models with symmetric or asymmetric cross-covariances of different spatial variation and smoothness. In the context of multivariate disease mapping, for example, they facilitate borrowing strength both over space and cross variables, allowing for more flexible multivariate spatial smoothing. Specifically, we present a broadened coregionalization framework to include order-dependent, order-free, and order-robust multivariate models; a new class of order-free coregionalized multivariate conditional autoregressives is introduced. We tackle computational challenges and present solutions that are integral for Bayesian analysis of these models. We also discuss two ways of computing deviance information criterion for comparison among competing hierarchical models with or without unidentifiable prior parameters. The models and related methodology are developed in the broad context of modelling multivariate data on spatial lattice and illustrated in the context of multivariate disease mapping. The coregionalization framework and related methods also present a general approach for building spatially structured cross-covariance functions for multivariate geostatistics. © The Author(s) 2016.
Application of Multivariable Statistical Techniques in Plant-wide WWTP Control Strategies Analysis
Flores Alsina, Xavier; Comas, J.; Rodríguez-Roda, I.
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant...... analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques allow i) to determine natural groups or clusters of control strategies with a similar behaviour, ii......) to find and interpret hidden, complex and casual relation features in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation...
Bakraji, Elias Hanna; Abboud, Rana; Issa, Haissm
2014-01-01
Thermoluminescence (TL) dating and multivariate statistical methods based on radioisotope X-ray fluorescence analysis have been utilized to date and classify Syrian archaeological ceramics fragment from Tel Jamous site. 54 samples were analyzed by radioisotope X-ray fluorescence; 51 of them come from Tel Jamous archaeological site in Sahel Akkar region, Syria, which fairly represent ceramics belonging to the Middle Bronze Age (2150 to 1600 B.C.) and the remaining three samples come from Mar-T...
Luiz Augusto da Cruz Meleiro
2005-06-01
Full Text Available In this work a MIMO non-linear predictive controller was developed for an extractive alcoholic fermentation process. The internal model of the controller was represented by two MISO Functional Link Networks (FLNs, identified using simulated data generated from a deterministic mathematical model whose kinetic parameters were determined experimentally. The FLN structure presents as advantages fast training and guaranteed convergence, since the estimation of the weights is a linear optimization problem. Besides, the elimination of non-significant weights generates parsimonious models, which allows for fast execution in an MPC-based algorithm. The proposed algorithm showed good potential in identification and control of non-linear processes.Neste trabalho um controlador preditivo não linear multivariável foi desenvolvido para um processo de fermentação alcoólica extrativa. O modelo interno do controlador foi representado por duas redes do tipo Functional Link (FLN, identificadas usando dados de simulação gerados a partir de um modelo validado experimentalmente. A estrutura FLN apresenta como vantagem o treinamento rápido e convergência garantida, já que a estimação dos seus pesos é um problema de otimização linear. Além disso, a eliminação de pesos não significativos gera modelos parsimoniosos, o que permite a rápida execução em algoritmos de controle preditivo baseado em modelo. Os resultados mostram que o algoritmo proposto tem grande potencial para identificação e controle de processos não lineares.
Kia, Seyed Mostafa; Vega Pons, Sandro; Weisz, Nathan; Passerini, Andrea
2016-01-01
Brain decoding is a popular multivariate approach for hypothesis testing in neuroimaging. Linear classifiers are widely employed in the brain decoding paradigm to discriminate among experimental conditions. Then, the derived linear weights are visualized in the form of multivariate brain maps to further study spatio-temporal patterns of underlying neural activities. It is well known that the brain maps derived from weights of linear classifiers are hard to interpret because of high correlations between predictors, low signal to noise ratios, and the high dimensionality of neuroimaging data. Therefore, improving the interpretability of brain decoding approaches is of primary interest in many neuroimaging studies. Despite extensive studies of this type, at present, there is no formal definition for interpretability of multivariate brain maps. As a consequence, there is no quantitative measure for evaluating the interpretability of different brain decoding methods. In this paper, first, we present a theoretical definition of interpretability in brain decoding; we show that the interpretability of multivariate brain maps can be decomposed into their reproducibility and representativeness. Second, as an application of the proposed definition, we exemplify a heuristic for approximating the interpretability in multivariate analysis of evoked magnetoencephalography (MEG) responses. Third, we propose to combine the approximated interpretability and the generalization performance of the brain decoding into a new multi-objective criterion for model selection. Our results, for the simulated and real MEG data, show that optimizing the hyper-parameters of the regularized linear classifier based on the proposed criterion results in more informative multivariate brain maps. More importantly, the presented definition provides the theoretical background for quantitative evaluation of interpretability, and hence, facilitates the development of more effective brain decoding algorithms
Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models.
Wang, Yifan; Liu, Aiyi; Mills, James L; Boehnke, Michael; Wilson, Alexander F; Bailey-Wilson, Joan E; Xiong, Momiao; Wu, Colin O; Fan, Ruzong
2015-05-01
In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case. © 2015 WILEY PERIODICALS, INC.
Tapper, U.A.S.; Malmqvist, K.G.; Loevestam, N.E.G.; Swietlicki, E.; Salford, L.G.
1991-01-01
The importance of statistical evaluation of multielemental data is illustrated using the data collected in a macro- and micro-PIXE analysis of human brain tumours. By employing a multivariate statistical classification methodology (SIMCA) it was shown that the total information collected from each specimen separates three types of tissue: High malignant, less malignant and normal brain tissue. This makes a classification of a given specimen possible based on the elemental concentrations. Partial least squares regression (PLS), a multivariate regression method, made it possible to study the relative importance of the examined nine trace elements, the dry/wet weight ratio and the age of the patient in predicting the survival time after operation for patients with the high malignant form, astrocytomas grade III-IV. The elemental maps from a microprobe analysis were also subjected to multivariate analysis. This showed that the six elements sorted into maps could be presented in three maps containing all the relevant information. The intensity in these maps is proportional to the value (score) of the actual pixel along the calculated principal components. (orig.)
Lepore, N; Brun, C; Chou, Y Y; Chiang, M C; Dutton, R A; Hayashi, K M; Luders, E; Lopez, O L; Aizenstein, H J; Toga, A W; Becker, J T; Thompson, P M
2008-01-01
This paper investigates the performance of a new multivariate method for tensor-based morphometry (TBM). Statistics on Riemannian manifolds are developed that exploit the full information in deformation tensor fields. In TBM, multiple brain images are warped to a common neuroanatomical template via 3-D nonlinear registration; the resulting deformation fields are analyzed statistically to identify group differences in anatomy. Rather than study the Jacobian determinant (volume expansion factor) of these deformations, as is common, we retain the full deformation tensors and apply a manifold version of Hotelling's $T(2) test to them, in a Log-Euclidean domain. In 2-D and 3-D magnetic resonance imaging (MRI) data from 26 HIV/AIDS patients and 14 matched healthy subjects, we compared multivariate tensor analysis versus univariate tests of simpler tensor-derived indices: the Jacobian determinant, the trace, geodesic anisotropy, and eigenvalues of the deformation tensor, and the angle of rotation of its eigenvectors. We detected consistent, but more extensive patterns of structural abnormalities, with multivariate tests on the full tensor manifold. Their improved power was established by analyzing cumulative p-value plots using false discovery rate (FDR) methods, appropriately controlling for false positives. This increased detection sensitivity may empower drug trials and large-scale studies of disease that use tensor-based morphometry.
Raza, K.S.M.
2004-01-01
This paper demonstrates that if a complicated nonlinear, non-square, state-coupled multi variable system is smartly linearized and subjected to a thorough stability analysis then we can achieve our design objectives via a controller which will be quite simple (in term of resource usage and execution time) and very efficient (in terms of robustness). Further the aim is to implement this controller via computer in a real time environment. Therefore first a nonlinear mathematical model of the system is achieved. An intelligent work is done to decouple the multivariable system. Linearization and stability analysis techniques are employed for the development of a linearized and mathematically sound control law. Nonlinearities like the saturation in actuators are also been catered. The controller is then discretized using Runge-Kutta integration. Finally the discretized control law is programmed in a computer in a real time environment. The programme is done in RT -Linux using GNU C for the real time realization of the control scheme. The real time processes, like sampling and controlled actuation, and the non real time processes, like graphical user interface and display, are programmed as different tasks. The issue of inter process communication, between real time and non real time task is addressed quite carefully. The results of this research pursuit are presented graphically. (author)
Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.
2017-12-01
The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
Aminu Ibrahim; Hafizan Juahir; Mohd Ekhwan Toriman; Mustapha, A.; Azman Azid; Isiyaka, H.A.
2015-01-01
Multivariate Statistical techniques including cluster analysis, discriminant analysis, and principal component analysis/factor analysis were applied to investigate the spatial variation and pollution sources in the Terengganu river basin during 5 years of monitoring 13 water quality parameters at thirteen different stations. Cluster analysis (CA) classified 13 stations into 2 clusters low polluted (LP) and moderate polluted (MP) based on similar water quality characteristics. Discriminant analysis (DA) rendered significant data reduction with 4 parameters (pH, NH 3 -NL, PO 4 and EC) and correct assignation of 95.80 %. The PCA/ FA applied to the data sets, yielded in five latent factors accounting 72.42 % of the total variance in the water quality data. The obtained varifactors indicate that parameters in charge for water quality variations are mainly related to domestic waste, industrial, runoff and agricultural (anthropogenic activities). Therefore, multivariate techniques are important in environmental management. (author)
MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
Multivariate linear regression of high-dimensional fMRI data with multiple target variables.
Valente, Giancarlo; Castellanos, Agustin Lage; Vanacore, Gianluca; Formisano, Elia
2014-05-01
Multivariate regression is increasingly used to study the relation between fMRI spatial activation patterns and experimental stimuli or behavioral ratings. With linear models, informative brain locations are identified by mapping the model coefficients. This is a central aspect in neuroimaging, as it provides the sought-after link between the activity of neuronal populations and subject's perception, cognition or behavior. Here, we show that mapping of informative brain locations using multivariate linear regression (MLR) may lead to incorrect conclusions and interpretations. MLR algorithms for high dimensional data are designed to deal with targets (stimuli or behavioral ratings, in fMRI) separately, and the predictive map of a model integrates information deriving from both neural activity patterns and experimental design. Not accounting explicitly for the presence of other targets whose associated activity spatially overlaps with the one of interest may lead to predictive maps of troublesome interpretation. We propose a new model that can correctly identify the spatial patterns associated with a target while achieving good generalization. For each target, the training is based on an augmented dataset, which includes all remaining targets. The estimation on such datasets produces both maps and interaction coefficients, which are then used to generalize. The proposed formulation is independent of the regression algorithm employed. We validate this model on simulated fMRI data and on a publicly available dataset. Results indicate that our method achieves high spatial sensitivity and good generalization and that it helps disentangle specific neural effects from interaction with predictive maps associated with other targets. Copyright © 2013 Wiley Periodicals, Inc.
Study on loss detection algorithms for tank monitoring data using multivariate statistical analysis
Suzuki, Mitsutoshi; Burr, Tom
2009-01-01
Evaluation of solution monitoring data to support material balance evaluation was proposed about a decade ago because of concerns regarding the large throughput planned at Rokkasho Reprocessing Plant (RRP). A numerical study using the simulation code (FACSIM) was done and significant increases in the detection probabilities (DP) for certain types of losses were shown. To be accepted internationally, it is very important to verify such claims using real solution monitoring data. However, a demonstrative study with real tank data has not been carried out due to the confidentiality of the tank data. This paper describes an experimental study that has been started using actual data from the Solution Measurement and Monitoring System (SMMS) in the Tokai Reprocessing Plant (TRP) and the Savannah River Site (SRS). Multivariate statistical methods, such as a vector cumulative sum and a multi-scale statistical analysis, have been applied to the real tank data that have superimposed simulated loss. Although quantitative conclusions have not been derived for the moment due to the difficulty of baseline evaluation, the multivariate statistical methods remain promising for abrupt and some types of protracted loss detection. (author)
An Improvement of the Hotelling T2 Statistic in Monitoring Multivariate Quality Characteristics
Ashkan Shabbak
2012-01-01
Full Text Available The Hotelling T2 statistic is the most popular statistic used in multivariate control charts to monitor multiple qualities. However, this statistic is easily affected by the existence of more than one outlier in the data set. To rectify this problem, robust control charts, which are based on the minimum volume ellipsoid and the minimum covariance determinant, have been proposed. Most researchers assess the performance of multivariate control charts based on the number of signals without paying much attention to whether those signals are really outliers. With due respect, we propose to evaluate control charts not only based on the number of detected outliers but also with respect to their correct positions. In this paper, an Upper Control Limit based on the median and the median absolute deviation is also proposed. The results of this study signify that the proposed Upper Control Limit improves the detection of correct outliers but that it suffers from a swamping effect when the positions of outliers are not taken into consideration. Finally, a robust control chart based on the diagnostic robust generalised potential procedure is introduced to remedy this drawback.
Bakraji, E. H.; Othman, I.; Sarhil, A.; Al-Somel, N.
2002-01-01
Instrumental neutron activation analysis (INAA) has been utilized in the analysis of thirty-seven archaeological ceramics fragment samples collected from Tal AI-Wardiate site, Missiaf town, Hamma city, Syria. 36 chemical elements were determined. These elemental concentrations have been processed using two multivariate statistical methods, cluster and factor analysis in order to determine similarities and correlation between the various samples. Factor analysis confirms that samples were correctly classified by cluster analysis. The results showed that samples can be considered to be manufactured using three different sources of raw material. (author)
Wang, Ming; Li, Zheng; Lee, Eun Young; Lewis, Mechelle M; Zhang, Lijun; Sterling, Nicholas W; Wagner, Daymond; Eslinger, Paul; Du, Guangwei; Huang, Xuemei
2017-09-25
It is challenging for current statistical models to predict clinical progression of Parkinson's disease (PD) because of the involvement of multi-domains and longitudinal data. Past univariate longitudinal or multivariate analyses from cross-sectional trials have limited power to predict individual outcomes or a single moment. The multivariate generalized linear mixed-effect model (GLMM) under the Bayesian framework was proposed to study multi-domain longitudinal outcomes obtained at baseline, 18-, and 36-month. The outcomes included motor, non-motor, and postural instability scores from the MDS-UPDRS, and demographic and standardized clinical data were utilized as covariates. The dynamic prediction was performed for both internal and external subjects using the samples from the posterior distributions of the parameter estimates and random effects, and also the predictive accuracy was evaluated based on the root of mean square error (RMSE), absolute bias (AB) and the area under the receiver operating characteristic (ROC) curve. First, our prediction model identified clinical data that were differentially associated with motor, non-motor, and postural stability scores. Second, the predictive accuracy of our model for the training data was assessed, and improved prediction was gained in particularly for non-motor (RMSE and AB: 2.89 and 2.20) compared to univariate analysis (RMSE and AB: 3.04 and 2.35). Third, the individual-level predictions of longitudinal trajectories for the testing data were performed, with ~80% observed values falling within the 95% credible intervals. Multivariate general mixed models hold promise to predict clinical progression of individual outcomes in PD. The data was obtained from Dr. Xuemei Huang's NIH grant R01 NS060722 , part of NINDS PD Biomarker Program (PDBP). All data was entered within 24 h of collection to the Data Management Repository (DMR), which is publically available ( https://pdbp.ninds.nih.gov/data-management ).
The intervals method: a new approach to analyse finite element outputs using multivariate statistics
Jordi Marcé-Nogué
2017-10-01
Full Text Available Background In this paper, we propose a new method, named the intervals’ method, to analyse data from finite element models in a comparative multivariate framework. As a case study, several armadillo mandibles are analysed, showing that the proposed method is useful to distinguish and characterise biomechanical differences related to diet/ecomorphology. Methods The intervals’ method consists of generating a set of variables, each one defined by an interval of stress values. Each variable is expressed as a percentage of the area of the mandible occupied by those stress values. Afterwards these newly generated variables can be analysed using multivariate methods. Results Applying this novel method to the biological case study of whether armadillo mandibles differ according to dietary groups, we show that the intervals’ method is a powerful tool to characterize biomechanical performance and how this relates to different diets. This allows us to positively discriminate between specialist and generalist species. Discussion We show that the proposed approach is a useful methodology not affected by the characteristics of the finite element mesh. Additionally, the positive discriminating results obtained when analysing a difficult case study suggest that the proposed method could be a very useful tool for comparative studies in finite element analysis using multivariate statistical approaches.
The intervals method: a new approach to analyse finite element outputs using multivariate statistics
De Esteban-Trivigno, Soledad; Püschel, Thomas A.; Fortuny, Josep
2017-01-01
Background In this paper, we propose a new method, named the intervals’ method, to analyse data from finite element models in a comparative multivariate framework. As a case study, several armadillo mandibles are analysed, showing that the proposed method is useful to distinguish and characterise biomechanical differences related to diet/ecomorphology. Methods The intervals’ method consists of generating a set of variables, each one defined by an interval of stress values. Each variable is expressed as a percentage of the area of the mandible occupied by those stress values. Afterwards these newly generated variables can be analysed using multivariate methods. Results Applying this novel method to the biological case study of whether armadillo mandibles differ according to dietary groups, we show that the intervals’ method is a powerful tool to characterize biomechanical performance and how this relates to different diets. This allows us to positively discriminate between specialist and generalist species. Discussion We show that the proposed approach is a useful methodology not affected by the characteristics of the finite element mesh. Additionally, the positive discriminating results obtained when analysing a difficult case study suggest that the proposed method could be a very useful tool for comparative studies in finite element analysis using multivariate statistical approaches. PMID:29043107
A multivariate statistical study on a diversified data gathering system for nuclear power plants
Samanta, P.K.; Teichmann, T.; Levine, M.M.; Kato, W.Y.
1989-02-01
In this report, multivariate statistical methods are presented and applied to demonstrate their use in analyzing nuclear power plant operational data. For analyses of nuclear power plant events, approaches are presented for detecting malfunctions and degradations within the course of the event. At the system level, approaches are investigated as a means of diagnosis of system level performance. This involves the detection of deviations from normal performance of the system. The input data analyzed are the measurable physical parameters, such as steam generator level, pressurizer water level, auxiliary feedwater flow, etc. The study provides the methodology and illustrative examples based on data gathered from simulation of nuclear power plant transients and computer simulation of a plant system performance (due to lack of easily accessible operational data). Such an approach, once fully developed, can be used to explore statistically the detection of failure trends and patterns and prevention of conditions with serious safety implications. 33 refs., 18 figs., 9 tabs
Belianinov, Alex; Ganesh, Panchapakesan; Lin, Wenzhi; Jesse, Stephen; Pan, Minghu; Kalinin, Sergei V.; Sales, Brian C.; Sefat, Athena S.
2014-01-01
Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe 0.55 Se 0.45 (T c = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe 1−x Se x structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces
Brizzi, S.; Sandri, L.; Funiciello, F.; Corbi, F.; Piromallo, C.; Heuret, A.
2018-03-01
The observed maximum magnitude of subduction megathrust earthquakes is highly variable worldwide. One key question is which conditions, if any, favor the occurrence of giant earthquakes (Mw ≥ 8.5). Here we carry out a multivariate statistical study in order to investigate the factors affecting the maximum magnitude of subduction megathrust earthquakes. We find that the trench-parallel extent of subduction zones and the thickness of trench sediments provide the largest discriminating capability between subduction zones that have experienced giant earthquakes and those having significantly lower maximum magnitude. Monte Carlo simulations show that the observed spatial distribution of giant earthquakes cannot be explained by pure chance to a statistically significant level. We suggest that the combination of a long subduction zone with thick trench sediments likely promotes a great lateral rupture propagation, characteristic of almost all giant earthquakes.
David Sandquist
2015-06-01
Full Text Available A new method is presented for quantitative evaluation of hybrid aspen genotype xylem morphology and immunolabeling micro-distribution. This method can be used as an aid in assessing differences in genotypes from classic tree breeding studies, as well as genetically engineered plants. The method is based on image analysis, multivariate statistical evaluation of light, and immunofluorescence microscopy images of wood xylem cross sections. The selected immunolabeling antibodies targeted five different epitopes present in aspen xylem cell walls. Twelve down-regulated hybrid aspen genotypes were included in the method development. The 12 knock-down genotypes were selected based on pre-screening by pyrolysis-IR of global chemical content. The multivariate statistical evaluations successfully identified comparative trends for modifications in the down-regulated genotypes compared to the unmodified control, even when no definitive conclusions could be drawn from individual studied variables alone. Of the 12 genotypes analyzed, three genotypes showed significant trends for modifications in both morphology and immunolabeling. Six genotypes showed significant trends for modifications in either morphology or immunocoverage. The remaining three genotypes did not show any significant trends for modification.
Point defect characterization in HAADF-STEM images using multivariate statistical analysis
Sarahan, Michael C.; Chi, Miaofang; Masiel, Daniel J.; Browning, Nigel D.
2011-01-01
Quantitative analysis of point defects is demonstrated through the use of multivariate statistical analysis. This analysis consists of principal component analysis for dimensional estimation and reduction, followed by independent component analysis to obtain physically meaningful, statistically independent factor images. Results from these analyses are presented in the form of factor images and scores. Factor images show characteristic intensity variations corresponding to physical structure changes, while scores relate how much those variations are present in the original data. The application of this technique is demonstrated on a set of experimental images of dislocation cores along a low-angle tilt grain boundary in strontium titanate. A relationship between chemical composition and lattice strain is highlighted in the analysis results, with picometer-scale shifts in several columns measurable from compositional changes in a separate column. -- Research Highlights: → Multivariate analysis of HAADF-STEM images. → Distinct structural variations among SrTiO 3 dislocation cores. → Picometer atomic column shifts correlated with atomic column population changes.
Hohn, M. Ed; Nuhfer, E.B.; Vinopal, R.J.; Klanderman, D.S.
1980-01-01
Classifying very fine-grained rocks through fabric elements provides information about depositional environments, but is subject to the biases of visual taxonomy. To evaluate the statistical significance of an empirical classification of very fine-grained rocks, samples from Devonian shales in four cored wells in West Virginia and Virginia were measured for 15 variables: quartz, illite, pyrite and expandable clays determined by X-ray diffraction; total sulfur, organic content, inorganic carbon, matrix density, bulk density, porosity, silt, as well as density, sonic travel time, resistivity, and ??-ray response measured from well logs. The four lithologic types comprised: (1) sharply banded shale, (2) thinly laminated shale, (3) lenticularly laminated shale, and (4) nonbanded shale. Univariate and multivariate analyses of variance showed that the lithologic classification reflects significant differences for the variables measured, difference that can be detected independently of stratigraphic effects. Little-known statistical methods found useful in this work included: the multivariate analysis of variance with more than one effect, simultaneous plotting of samples and variables on canonical variates, and the use of parametric ANOVA and MANOVA on ranked data. ?? 1980 Plenum Publishing Corporation.
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Most, Sebastian; Nowak, Wolfgang; Bijeljic, Branko
2015-04-01
Fickian transport in groundwater flow is the exception rather than the rule. Transport in porous media is frequently simulated via particle methods (i.e. particle tracking random walk (PTRW) or continuous time random walk (CTRW)). These methods formulate transport as a stochastic process of particle position increments. At the pore scale, geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Hence, it is important to get a better understanding of the processes at pore scale. For our analysis we track the positions of 10.000 particles migrating through the pore space over time. The data we use come from micro CT scans of a homogeneous sandstone and encompass about 10 grain sizes. Based on those images we discretize the pore structure and simulate flow at the pore scale based on the Navier-Stokes equation. This flow field realistically describes flow inside the pore space and we do not need to add artificial dispersion during the transport simulation. Next, we use particle tracking random walk and simulate pore-scale transport. Finally, we use the obtained particle trajectories to do a multivariate statistical analysis of the particle motion at the pore scale. Our analysis is based on copulas. Every multivariate joint distribution is a combination of its univariate marginal distributions. The copula represents the dependence structure of those univariate marginals and is therefore useful to observe correlation and non-Gaussian interactions (i.e. non-Fickian transport). The first goal of this analysis is to better understand the validity regions of commonly made assumptions. We are investigating three different transport distances: 1) The distance where the statistical dependence between particle increments can be modelled as an order-one Markov process. This would be the Markovian distance for the process, where
Multivariate Statistical Analysis of Water Quality data in Indian River Lagoon, Florida
Sayemuzzaman, M.; Ye, M.
2015-12-01
The Indian River Lagoon, is part of the longest barrier island complex in the United States, is a region of particular concern to the environmental scientist because of the rapid rate of human development throughout the region and the geographical position in between the colder temperate zone and warmer sub-tropical zone. Thus, the surface water quality analysis in this region always brings the newer information. In this present study, multivariate statistical procedures were applied to analyze the spatial and temporal water quality in the Indian River Lagoon over the period 1998-2013. Twelve parameters have been analyzed on twelve key water monitoring stations in and beside the lagoon on monthly datasets (total of 27,648 observations). The dataset was treated using cluster analysis (CA), principle component analysis (PCA) and non-parametric trend analysis. The CA was used to cluster twelve monitoring stations into four groups, with stations on the similar surrounding characteristics being in the same group. The PCA was then applied to the similar groups to find the important water quality parameters. The principal components (PCs), PC1 to PC5 was considered based on the explained cumulative variances 75% to 85% in each cluster groups. Nutrient species (phosphorus and nitrogen), salinity, specific conductivity and erosion factors (TSS, Turbidity) were major variables involved in the construction of the PCs. Statistical significant positive or negative trends and the abrupt trend shift were detected applying Mann-Kendall trend test and Sequential Mann-Kendall (SQMK), for each individual stations for the important water quality parameters. Land use land cover change pattern, local anthropogenic activities and extreme climate such as drought might be associated with these trends. This study presents the multivariate statistical assessment in order to get better information about the quality of surface water. Thus, effective pollution control/management of the surface
Multivariate statistical analysis of radioactive variables in two phosphate ores from Sudan
Adam, Abdel Majid A.; Eltayeb, Mohamed Ahmed H.
2012-01-01
Multivariate statistical techniques are efficient ways to display complex relationships among many objects. An attempt was made to study the radioactive data in two types of Sudanese phosphate deposits; Kurun and Uro phosphate, using several multivariate statistical methods. Pearson correlation coefficient revealed that a U-238 distribution in Kurun phosphate is controlled by the variation of K-40 concentration, whereas in Uro phosphate it is controlled by the variation of U-235 and U-234 concentration. Histograms and normal Q–Q plots clearly show that the radioactive variables did not follow a normal distribution. This non-normality feature observed may be attributed to complicating influence of geological factors. The principal components analysis (PCA) gives a model of five components for representing the acquired data from Kurun phosphate, where 89.5% of the total variance is explained. A model of four components was sufficient to represent the acquired data from Uro phosphate, where 87.5% of the total data variance is explained. The hierarchical cluster analysis (HCA) indicates that U-238 behaves in the same manner in the two types of phosphates; it associated with a group of four radionuclides; U-234, Po-210, Ra-226, Th-230, which the most abundant radionuclides, and all belong to the uranium-238 decay series. Two parameters have been adapted for the direct differentiate between the two phosphates. Firstly, U-238 in Uro phosphate have shown higher degree of mobility (CV% = 82.6) than that in Kurun phosphate (CV% = 64.7), and secondly, the activity ratio of Th-230/Th-232 in Uro phosphate is nine times than that in Kurun phosphate. - Highlights: ► Multivariate statistical techniques were used to characterize radioactive data. ► U-238 in Uro phosphate shows higher degree of mobility (CV% = 82.6). ► U-238 in Kurun phosphate shows lower degree of mobility (CV% = 64.7). ► The radioactive variables did not follow a normal distribution. ► The ratio of Th
A Regularized Linear Dynamical System Framework for Multivariate Time Series Analysis.
Liu, Zitao; Hauskrecht, Milos
2015-01-01
Linear Dynamical System (LDS) is an elegant mathematical framework for modeling and learning Multivariate Time Series (MTS). However, in general, it is difficult to set the dimension of an LDS's hidden state space. A small number of hidden states may not be able to model the complexities of a MTS, while a large number of hidden states can lead to overfitting. In this paper, we study learning methods that impose various regularization penalties on the transition matrix of the LDS model and propose a regularized LDS learning framework (rLDS) which aims to (1) automatically shut down LDSs' spurious and unnecessary dimensions, and consequently, address the problem of choosing the optimal number of hidden states; (2) prevent the overfitting problem given a small amount of MTS data; and (3) support accurate MTS forecasting. To learn the regularized LDS from data we incorporate a second order cone program and a generalized gradient descent method into the Maximum a Posteriori framework and use Expectation Maximization to obtain a low-rank transition matrix of the LDS model. We propose two priors for modeling the matrix which lead to two instances of our rLDS. We show that our rLDS is able to recover well the intrinsic dimensionality of the time series dynamics and it improves the predictive performance when compared to baselines on both synthetic and real-world MTS datasets.
Jayalakshmy, K.V.; Rao, K.K.
A study of planktonic foraminiferal assemblages from 19 stations in the neritic and oceanic regions off the Coromandel Coast, Bay of Bengal has been made using a multivariate statistical method termed as factor analysis. On the basis of abundance...
Bersimis, Sotiris; Panaretos, John; Psarakis, Stelios
2005-01-01
Woodall and Montgomery [35] in a discussion paper, state that multivariate process control is one of the most rapidly developing sections of statistical process control. Nowadays, in industry, there are many situations in which the simultaneous monitoring or control, of two or more related quality - process characteristics is necessary. Process monitoring problems in which several related variables are of interest are collectively known as Multivariate Statistical Process Control (MSPC).This ...
Xu Chengjian, E-mail: c.j.xu@umcg.nl [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van' t [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands)
2012-03-15
Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.
Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A; van't Veld, Aart A
2012-03-15
To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended. Copyright Â© 2012 Elsevier Inc. All rights reserved.
Nsikak U Benson
Full Text Available Trace metals (Cd, Cr, Cu, Ni and Pb concentrations in benthic sediments were analyzed through multi-step fractionation scheme to assess the levels and sources of contamination in estuarine, riverine and freshwater ecosystems in Niger Delta (Nigeria. The degree of contamination was assessed using the individual contamination factors (ICF and global contamination factor (GCF. Multivariate statistical approaches including principal component analysis (PCA, cluster analysis and correlation test were employed to evaluate the interrelationships and associated sources of contamination. The spatial distribution of metal concentrations followed the pattern Pb>Cu>Cr>Cd>Ni. Ecological risk index by ICF showed significant potential mobility and bioavailability for Cu, Cu and Ni. The ICF contamination trend in the benthic sediments at all studied sites was Cu>Cr>Ni>Cd>Pb. The principal component and agglomerative clustering analyses indicate that trace metals contamination in the ecosystems was influenced by multiple pollution sources.
Garcia, Francisco; Palacio, Carlos; Garcia, Uriel
2012-01-01
Multivariate statistical techniques were used to investigate the temporal and spatial variations of water quality at the Santa Marta coastal area where a submarine out fall that discharges 1 m3/s of domestic wastewater is located. Two-way analysis of variance (ANOVA), cluster and principal component analysis and Krigging interpolation were considered for this report. Temporal variation showed two heterogeneous periods. From December to April, and July, where the concentration of the water quality parameters is higher; the rest of the year (May, June, August-November) were significantly lower. The spatial variation reported two areas where the water quality is different, this difference is related to the proximity to the submarine out fall discharge.
Beyth, M.; McInteer, C.; Broxton, D.E.; Bolivar, S.L.; Luke, M.E.
1980-06-01
Multivariate statistical analyses were carried out on Hydrogeochemical and Stream Sediment Reconnaissance data from the Craig quadrangle, Colorado, to support the National Uranium Resource Evaluation and to evaluate strategic or other important commercial mineral resources. A few areas for favorable uranium mineralization are suggested for parts of the Wyoming Basin, Park Range, and Gore Range. Six potential source rocks for uranium are postulated based on factor score mapping. Vanadium in stream sediments is suggested as a pathfinder for carnotite-type mineralization. A probable northwest trend of lead-zinc-copper mineralization associated with Tertiary intrusions is suggested. A few locations are mapped where copper is associated with cobalt. Concentrations of placer sands containing rare earth elements, probably of commercial value, are indicated for parts of the Sand Wash Basin
Guo, Hongling; Yin, Baohua; Zhang, Jie; Quan, Yangke; Shi, Gaojun
2016-09-01
Counterfeiting of banknotes is a crime and seriously harmful to economy. Examination of the paper, ink and toners used to make counterfeit banknotes can provide useful information to classify and link different cases in which the suspects use the same raw materials. In this paper, 21 paper samples of counterfeit banknotes seized from 13 cases were analyzed by wavelength dispersive X-ray fluorescence. After measuring the elemental composition in paper semi-quantitatively, the normalized weight percentage data of 10 elements were processed by multivariate statistical methods of cluster analysis and principle component analysis. All these paper samples were mainly classified into 3 groups. Nine separate cases were successfully linked. It is demonstrated that elemental composition measured by XRF is a useful way to compare and classify papers used in different cases. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Voza Danijela
2015-12-01
Full Text Available The aim of this article is to evaluate the quality of the Danube River in its course through Serbia as well as to demonstrate the possibilities for using three statistical methods: Principal Component Analysis (PCA, Factor Analysis (FA and Cluster Analysis (CA in the surface water quality management. Given that the Danube is an important trans-boundary river, thorough water quality monitoring by sampling at different distances during shorter and longer periods of time is not only ecological, but also a political issue. Monitoring was carried out at monthly intervals from January to December 2011, at 17 sampling sites. The obtained data set was treated by multivariate techniques in order, firstly, to identify the similarities and differences between sampling periods and locations, secondly, to recognize variables that affect the temporal and spatial water quality changes and thirdly, to present the anthropogenic impact on water quality parameters.
Xu Chengjian; Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van’t
2012-01-01
Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.
Cummins, J.D.
1965-02-01
With several white noise sources the various transmission paths of a linear multivariable system may be determined simultaneously. This memorandum considers the restrictions on pseudo-random two state sequences to effect simultaneous identification of several transmission paths and the consequential rejection of cross-coupled signals in linear multivariable systems. The conditions for simultaneous identification are established by an example, which shows that the integration time required is large i.e. tends to infinity, as it does when white noise sources are used. (author)
2012-01-01
Background The metals bioavailability in soils is commonly assessed by chemical extractions; however a generally accepted method is not yet established. In this study, the effectiveness of Diffusive Gradients in Thin-films (DGT) technique and single extractions in the assessment of metals bioaccumulation in vegetables, and the influence of soil parameters on phytoavailability were evaluated using multivariate statistics. Soil and plants grown in vegetable gardens from mining-affected rural areas, NW Romania, were collected and analysed. Results Pseudo-total metal content of Cu, Zn and Cd in soil ranged between 17.3-146 mg kg-1, 141–833 mg kg-1 and 0.15-2.05 mg kg-1, respectively, showing enriched contents of these elements. High degrees of metals extractability in 1M HCl and even in 1M NH4Cl were observed. Despite the relatively high total metal concentrations in soil, those found in vegetables were comparable to values typically reported for agricultural crops, probably due to the low concentrations of metals in soil solution (Csoln) and low effective concentrations (CE), assessed by DGT technique. Among the analysed vegetables, the highest metal concentrations were found in carrots roots. By applying multivariate statistics, it was found that CE, Csoln and extraction in 1M NH4Cl, were better predictors for metals bioavailability than the acid extractions applied in this study. Copper transfer to vegetables was strongly influenced by soil organic carbon (OC) and cation exchange capacity (CEC), while pH had a higher influence on Cd transfer from soil to plants. Conclusions The results showed that DGT can be used for general evaluation of the risks associated to soil contamination with Cu, Zn and Cd in field conditions. Although quantitative information on metals transfer from soil to vegetables was not observed. PMID:23079133
Linear mixed models a practical guide using statistical software
West, Brady T; Galecki, Andrzej T
2014-01-01
Highly recommended by JASA, Technometrics, and other journals, the first edition of this bestseller showed how to easily perform complex linear mixed model (LMM) analyses via a variety of software programs. Linear Mixed Models: A Practical Guide Using Statistical Software, Second Edition continues to lead readers step by step through the process of fitting LMMs. This second edition covers additional topics on the application of LMMs that are valuable for data analysts in all fields. It also updates the case studies using the latest versions of the software procedures and provides up-to-date information on the options and features of the software procedures available for fitting LMMs in SAS, SPSS, Stata, R/S-plus, and HLM.New to the Second Edition A new chapter on models with crossed random effects that uses a case study to illustrate software procedures capable of fitting these models Power analysis methods for longitudinal and clustered study designs, including software options for power analyses and suggest...
Md. Bodrud-Doza
2016-04-01
Full Text Available This study investigates the groundwater quality in the Faridpur district of central Bangladesh based on preselected 60 sample points. Water evaluation indices and a number of statistical approaches such as multivariate statistics and geostatistics are applied to characterize water quality, which is a major factor for controlling the groundwater quality in term of drinking purposes. The study reveal that EC, TDS, Ca2+, total As and Fe values of groundwater samples exceeded Bangladesh and international standards. Ground water quality index (GWQI exhibited that about 47% of the samples were belonging to good quality water for drinking purposes. The heavy metal pollution index (HPI, degree of contamination (Cd, heavy metal evaluation index (HEI reveal that most of the samples belong to low level of pollution. However, Cd provide better alternative than other indices. Principle component analysis (PCA suggests that groundwater quality is mainly related to geogenic (rock–water interaction and anthropogenic source (agrogenic and domestic sewage in the study area. Subsequently, the findings of cluster analysis (CA and correlation matrix (CM are also consistent with the PCA results. The spatial distributions of groundwater quality parameters are determined by geostatistical modeling. The exponential semivariagram model is validated as the best fitted models for most of the indices values. It is expected that outcomes of the study will provide insights for decision makers taking proper measures for groundwater quality management in central Bangladesh.
Mayer, B. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Mew, D. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); DeHope, A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Spackman, P. E. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Williams, A. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2015-09-24
Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. The results of these studies can yield detailed information on method of manufacture, starting material source, and final product - all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. 160 distinct compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (GC-MS and LCMS/ MS-TOF) in conjunction with inductively coupled plasma mass spectrometry (ICPMS). The complexity of the resultant data matrix urged the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 87 route-specific CAS were classified and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. This work provides the most detailed fentanyl CAS investigation to date by using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.
Fouque, A.L.; Ciuciu, Ph.; Risser, L.; Fouque, A.L.; Ciuciu, Ph.; Risser, L.
2009-01-01
In this paper, a novel statistical parcellation of intra-subject functional MRI (fMRI) data is proposed. The key idea is to identify functionally homogenous regions of interest from their hemodynamic parameters. To this end, a non-parametric voxel-based estimation of hemodynamic response function is performed as a prerequisite. Then, the extracted hemodynamic features are entered as the input data of a Multivariate Spatial Gaussian Mixture Model (MSGMM) to be fitted. The goal of the spatial aspect is to favor the recovery of connected components in the mixture. Our statistical clustering approach is original in the sense that it extends existing works done on univariate spatially regularized Gaussian mixtures. A specific Gibbs sampler is derived to account for different covariance structures in the feature space. On realistic artificial fMRI datasets, it is shown that our algorithm is helpful for identifying a parsimonious functional parcellation required in the context of joint detection estimation of brain activity. This allows us to overcome the classical assumption of spatial stationarity of the BOLD signal model. (authors)
UNCOVERING THE FORMATION OF ULTRACOMPACT DWARF GALAXIES BY MULTIVARIATE STATISTICAL ANALYSIS
Chattopadhyay, Tanuka; Sharina, Margarita; Davoust, Emmanuel; De, Tuli; Chattopadhyay, Asis Kumar
2012-01-01
We present a statistical analysis of the properties of a large sample of dynamically hot old stellar systems, from globular clusters (GCs) to giant ellipticals, which was performed in order to investigate the origin of ultracompact dwarf galaxies (UCDs). The data were mostly drawn from Forbes et al. We recalculated some of the effective radii, computed mean surface brightnesses and mass-to-light ratios, and estimated ages and metallicities. We completed the sample with GCs of M31. We used a multivariate statistical technique (K-Means clustering), together with a new algorithm (Gap Statistics) for finding the optimum number of homogeneous sub-groups in the sample, using a total of six parameters (absolute magnitude, effective radius, virial mass-to-light ratio, stellar mass-to-light ratio, and metallicity). We found six groups. FK1 and FK5 are composed of high- and low-mass elliptical galaxies, respectively. FK3 and FK6 are composed of high-metallicity and low-metallicity objects, respectively, and both include GCs and UCDs. Two very small groups, FK2 and FK4, are composed of Local Group dwarf spheroidals. Our groups differ in their mean masses and virial mass-to-light ratios. The relations between these two parameters are also different for the various groups. The probability density distributions of metallicity for the four groups of galaxies are similar to those of the GCs and UCDs. The brightest low-metallicity GCs and UCDs tend to follow the mass-metallicity relation like elliptical galaxies. The objects of FK3 are more metal-rich per unit effective luminosity density than high-mass ellipticals.
UNCOVERING THE FORMATION OF ULTRACOMPACT DWARF GALAXIES BY MULTIVARIATE STATISTICAL ANALYSIS
Chattopadhyay, Tanuka [Department of Applied Mathematics, Calcutta University, 92 A.P.C. Road, Calcutta 700009 (India); Sharina, Margarita [Special Astrophysical Observatory, Russian Academy of Sciences, N. Arkhyz, KCh R 369167 (Russian Federation); Davoust, Emmanuel [IRAP, Universite de Toulouse, CNRS, 14 Avenue Edouard Belin, 31400 Toulouse (France); De, Tuli; Chattopadhyay, Asis Kumar, E-mail: tanuka@iucaa.ernet.in, E-mail: sme@sao.ru, E-mail: davoust@ast.obs-mip.fr, E-mail: akcstat@caluniv.ac.in [Department of Statistics, Calcutta University, 35 B.C. Road, Calcutta 700019 (India)
2012-05-10
We present a statistical analysis of the properties of a large sample of dynamically hot old stellar systems, from globular clusters (GCs) to giant ellipticals, which was performed in order to investigate the origin of ultracompact dwarf galaxies (UCDs). The data were mostly drawn from Forbes et al. We recalculated some of the effective radii, computed mean surface brightnesses and mass-to-light ratios, and estimated ages and metallicities. We completed the sample with GCs of M31. We used a multivariate statistical technique (K-Means clustering), together with a new algorithm (Gap Statistics) for finding the optimum number of homogeneous sub-groups in the sample, using a total of six parameters (absolute magnitude, effective radius, virial mass-to-light ratio, stellar mass-to-light ratio, and metallicity). We found six groups. FK1 and FK5 are composed of high- and low-mass elliptical galaxies, respectively. FK3 and FK6 are composed of high-metallicity and low-metallicity objects, respectively, and both include GCs and UCDs. Two very small groups, FK2 and FK4, are composed of Local Group dwarf spheroidals. Our groups differ in their mean masses and virial mass-to-light ratios. The relations between these two parameters are also different for the various groups. The probability density distributions of metallicity for the four groups of galaxies are similar to those of the GCs and UCDs. The brightest low-metallicity GCs and UCDs tend to follow the mass-metallicity relation like elliptical galaxies. The objects of FK3 are more metal-rich per unit effective luminosity density than high-mass ellipticals.
Abbas Alkarkhi, F.M.; Ismail, Norli; Easa, Azhar Mat
2008-01-01
Cockles (Anadara granosa) sample obtained from two rivers in the Penang State of Malaysia were analyzed for the content of arsenic (As) and heavy metals (Cr, Cd, Zn, Cu, Pb, and Hg) using a graphite flame atomic absorption spectrometer (GF-AAS) for Cr, Cd, Zn, Cu, Pb, As and cold vapor atomic absorption spectrometer (CV-AAS) for Hg. The two locations of interest with 20 sampling points of each location were Kuala Juru (Juru River) and Bukit Tambun (Jejawi River). Multivariate statistical techniques such as multivariate analysis of variance (MANOVA) and discriminant analysis (DA) were applied for analyzing the data. MANOVA showed a strong significant difference between the two rivers in term of As and heavy metals contents in cockles. DA gave the best result to identify the relative contribution for all parameters in discriminating (distinguishing) the two rivers. It provided an important data reduction as it used only two parameters (Zn and Cd) affording more than 72% correct assignations. Results indicated that the two rivers were different in terms of As and heavy metal contents in cockle, and the major difference was due to the contribution of Zn and Cd. A positive correlation was found between discriminate functions (DF) and Zn, Cd and Cr, whereas negative correlation was exhibited with other heavy metals. Therefore, DA allowed a reduction in the dimensionality of the data set, delineating a few indicator parameters responsible for large variations in heavy metals and arsenic content. Taking into account of these results, it can be suggested that a continuous monitoring of As and heavy metals in cockles be performed in these two rivers
Victor V. Nikitin
2013-01-01
Full Text Available The article introduces the algorithm of Russia’s regions investment potential estimation, developed by means of multivariate statistical methods, determines the factors, reflecting regions investment state. The integral indicator was developed on their basis, using statistical data. The article presents regions’ classification on the basis of the integral index
Multivariate analysis with LISREL
Jöreskog, Karl G; Y Wallentin, Fan
2016-01-01
This book traces the theory and methodology of multivariate statistical analysis and shows how it can be conducted in practice using the LISREL computer program. It presents not only the typical uses of LISREL, such as confirmatory factor analysis and structural equation models, but also several other multivariate analysis topics, including regression (univariate, multivariate, censored, logistic, and probit), generalized linear models, multilevel analysis, and principal component analysis. It provides numerous examples from several disciplines and discusses and interprets the results, illustrated with sections of output from the LISREL program, in the context of the example. The book is intended for masters and PhD students and researchers in the social, behavioral, economic and many other sciences who require a basic understanding of multivariate statistical theory and methods for their analysis of multivariate data. It can also be used as a textbook on various topics of multivariate statistical analysis.
Gledsneli Maria Lima Lins
2010-12-01
Full Text Available Water has a decisive influence on populations’ life quality – specifically in areas like urban supply, drainage, and effluents treatment – due to its sound impact over public health. Water rational use constitutes the greatest challenge faced by water demand management, mainly with regard to urban household water consumption. This makes it important to develop researches to assist water managers and public policy-makers in planning and formulating water demand measures which may allow urban water rational use to be met. This work utilized the multivariate techniques Factor Analysis and Multiple Linear Regression Analysis – in order to determine the participation level of socioeconomic and climatic variables in monthly urban household consumption changes – applying them to two districts of Campina Grande city (State of Paraíba, Brazil. The districts were chosen based on socioeconomic criterion (income level so as to evaluate their water consumer’s behavior. A 9-year monthly data series (from year 2000 up to 2008 was utilized, comprising family income, water tariff, and quantity of household connections (economies – as socioeconomic variables – and average temperature and precipitation, as climatic variables. For both the selected districts of Campina Grande city, the obtained results point out the variables “water tariff” and “family income” as indicators of these district’s household consumption.
Wallace, Jack, E-mail: jack.wallace@ce.queensu.ca [Department of Civil Engineering, Queen’s University, Ellis Hall, 58 University Avenue, Kingston, Ontario K7L 3N6 (Canada); Champagne, Pascale, E-mail: champagne@civil.queensu.ca [Department of Civil Engineering, Queen’s University, Ellis Hall, 58 University Avenue, Kingston, Ontario K7L 3N6 (Canada); Monnier, Anne-Charlotte, E-mail: anne-charlotte.monnier@insa-lyon.fr [National Institute for Applied Sciences – Lyon, 20 Avenue Albert Einstein, 69621 Villeurbanne Cedex (France)
2015-01-15
Highlights: • Performance of a hybrid passive landfill leachate treatment system was evaluated. • 33 Water chemistry parameters were sampled for 21 months and statistically analyzed. • Parameters were strongly linked and explained most (>40%) of the variation in data. • Alkalinity, ammonia, COD, heavy metals, and iron were criteria for performance. • Eight other parameters were key in modeling system dynamics and criteria. - Abstract: A pilot-scale hybrid-passive treatment system operated at the Merrick Landfill in North Bay, Ontario, Canada, treats municipal landfill leachate and provides for subsequent natural attenuation. Collected leachate is directed to a hybrid-passive treatment system, followed by controlled release to a natural attenuation zone before entering the nearby Little Sturgeon River. The study presents a comprehensive evaluation of the performance of the system using multivariate statistical techniques to determine the interactions between parameters, major pollutants in the leachate, and the biological and chemical processes occurring in the system. Five parameters (ammonia, alkalinity, chemical oxygen demand (COD), “heavy” metals of interest, with atomic weights above calcium, and iron) were set as criteria for the evaluation of system performance based on their toxicity to aquatic ecosystems and importance in treatment with respect to discharge regulations. System data for a full range of water quality parameters over a 21-month period were analyzed using principal components analysis (PCA), as well as principal components (PC) and partial least squares (PLS) regressions. PCA indicated a high degree of association for most parameters with the first PC, which explained a high percentage (>40%) of the variation in the data, suggesting strong statistical relationships among most of the parameters in the system. Regression analyses identified 8 parameters (set as independent variables) that were most frequently retained for modeling
Zamani Abbas Ali
2012-12-01
Full Text Available Abstract The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study groundwater contamination in Bonab Industrial Estate (Zanjan-Iran for iron, cobalt, nickel, copper, zinc, cadmium and lead content was investigated using differential pulse polarography (DPP. Although, cobalt, copper and zinc were found correspondingly in 47.8%, 100.0%, and 100.0% of the samples, they did not contain these metals above their maximum contaminant levels (MCLs. Cadmium was detected in 65.2% of the samples and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron with 8.7% and 13.0% of the samples higher than their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied for interpreting the experimental data and giving a description for the sources. The data analysis showed correlations and similarities between investigated heavy metals and helps to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals. Cluster 1 consisted of Pb, Cu, and cluster 3 included Cd, Fe; also each of the elements Zn, Co and Ni was located in groups with single member. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors and notably lead and zinc plant and pedo-geochemical pollution sources are influencing water quality in the studied area.
Multivariate statistical process control in product quality review assessment - A case study.
Kharbach, M; Cherrah, Y; Vander Heyden, Y; Bouklouze, A
2017-11-01
According to the Food and Drug Administration and the European Good Manufacturing Practices (GMP) guidelines, Annual Product Review (APR) is a mandatory requirement in GMP. It consists of evaluating a large collection of qualitative or quantitative data in order to verify the consistency of an existing process. According to the Code of Federal Regulation Part 11 (21 CFR 211.180), all finished products should be reviewed annually for the quality standards to determine the need of any change in specification or manufacturing of drug products. Conventional Statistical Process Control (SPC) evaluates the pharmaceutical production process by examining only the effect of a single factor at the time using a Shewhart's chart. It neglects to take into account the interaction between the variables. In order to overcome this issue, Multivariate Statistical Process Control (MSPC) can be used. Our case study concerns an APR assessment, where 164 historical batches containing six active ingredients, manufactured in Morocco, were collected during one year. Each batch has been checked by assaying the six active ingredients by High Performance Liquid Chromatography according to European Pharmacopoeia monographs. The data matrix was evaluated both by SPC and MSPC. The SPC indicated that all batches are under control, while the MSPC, based on Principal Component Analysis (PCA), for the data being either autoscaled or robust scaled, showed four and seven batches, respectively, out of the Hotelling T 2 95% ellipse. Also, an improvement of the capability of the process is observed without the most extreme batches. The MSPC can be used for monitoring subtle changes in the manufacturing process during an APR assessment. Copyright © 2017 Académie Nationale de Pharmacie. Published by Elsevier Masson SAS. All rights reserved.
Use of multivariate statistics to identify unreliable data obtained using CASA.
Martínez, Luis Becerril; Crispín, Rubén Huerta; Mendoza, Maximino Méndez; Gallegos, Oswaldo Hernández; Martínez, Andrés Aragón
2013-06-01
In order to identify unreliable data in a dataset of motility parameters obtained from a pilot study acquired by a veterinarian with experience in boar semen handling, but without experience in the operation of a computer assisted sperm analysis (CASA) system, a multivariate graphical and statistical analysis was performed. Sixteen boar semen samples were aliquoted then incubated with varying concentrations of progesterone from 0 to 3.33 µg/ml and analyzed in a CASA system. After standardization of the data, Chernoff faces were pictured for each measurement, and a principal component analysis (PCA) was used to reduce the dimensionality and pre-process the data before hierarchical clustering. The first twelve individual measurements showed abnormal features when Chernoff faces were drawn. PCA revealed that principal components 1 and 2 explained 63.08% of the variance in the dataset. Values of principal components for each individual measurement of semen samples were mapped to identify differences among treatment or among boars. Twelve individual measurements presented low values of principal component 1. Confidence ellipses on the map of principal components showed no statistically significant effects for treatment or boar. Hierarchical clustering realized on two first principal components produced three clusters. Cluster 1 contained evaluations of the two first samples in each treatment, each one of a different boar. With the exception of one individual measurement, all other measurements in cluster 1 were the same as observed in abnormal Chernoff faces. Unreliable data in cluster 1 are probably related to the operator inexperience with a CASA system. These findings could be used to objectively evaluate the skill level of an operator of a CASA system. This may be particularly useful in the quality control of semen analysis using CASA systems.
Zamani, Abbas Ali; Yaftian, Mohammad Reza; Parizanganeh, Abdolhossein
2012-12-17
The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study groundwater contamination in Bonab Industrial Estate (Zanjan-Iran) for iron, cobalt, nickel, copper, zinc, cadmium and lead content was investigated using differential pulse polarography (DPP). Although, cobalt, copper and zinc were found correspondingly in 47.8%, 100.0%, and 100.0% of the samples, they did not contain these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron with 8.7% and 13.0% of the samples higher than their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied for interpreting the experimental data and giving a description for the sources. The data analysis showed correlations and similarities between investigated heavy metals and helps to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals. Cluster 1 consisted of Pb, Cu, and cluster 3 included Cd, Fe; also each of the elements Zn, Co and Ni was located in groups with single member. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors and notably lead and zinc plant and pedo-geochemical pollution sources are influencing water quality in the studied area.
Miaomiao Jiang
Full Text Available Botanical primary metabolites extensively exist in herbal medicine injections (HMIs, but often were ignored to control. With the limitation of bias towards hydrophilic substances, the primary metabolites with strong polarity, such as saccharides, amino acids and organic acids, are usually difficult to detect by the routinely applied reversed-phase chromatographic fingerprint technology. In this study, a proton nuclear magnetic resonance (1H NMR profiling method was developed for efficient identification and quantification of small polar molecules, mostly primary metabolites in HMIs. A commonly used medicine, Danhong injection (DHI, was employed as a model. With the developed method, 23 primary metabolites together with 7 polyphenolic acids were simultaneously identified, of which 13 metabolites with fully separated proton signals were quantified and employed for further multivariate quality control assay. The quantitative 1H NMR method was validated with good linearity, precision, repeatability, stability and accuracy. Based on independence principal component analysis (IPCA, the contents of 13 metabolites were characterized and dimensionally reduced into the first two independence principal components (IPCs. IPC1 and IPC2 were then used to calculate the upper control limits (with 99% confidence ellipsoids of χ2 and Hotelling T2 control charts. Through the constructed upper control limits, the proposed method was successfully applied to 36 batches of DHI to examine the out-of control sample with the perturbed levels of succinate, malonate, glucose, fructose, salvianic acid and protocatechuic aldehyde. The integrated strategy has provided a reliable approach to identify and quantify multiple polar metabolites of DHI in one fingerprinting spectrum, and it has also assisted in the establishment of IPCA models for the multivariate statistical evaluation of HMIs.
Bearing Fault Diagnosis Based on Statistical Locally Linear Embedding.
Wang, Xiang; Zheng, Yuan; Zhao, Zhenzhou; Wang, Jinping
2015-07-06
Fault diagnosis is essentially a kind of pattern recognition. The measured signal samples usually distribute on nonlinear low-dimensional manifolds embedded in the high-dimensional signal space, so how to implement feature extraction, dimensionality reduction and improve recognition performance is a crucial task. In this paper a novel machinery fault diagnosis approach based on a statistical locally linear embedding (S-LLE) algorithm which is an extension of LLE by exploiting the fault class label information is proposed. The fault diagnosis approach first extracts the intrinsic manifold features from the high-dimensional feature vectors which are obtained from vibration signals that feature extraction by time-domain, frequency-domain and empirical mode decomposition (EMD), and then translates the complex mode space into a salient low-dimensional feature space by the manifold learning algorithm S-LLE, which outperforms other feature reduction methods such as PCA, LDA and LLE. Finally in the feature reduction space pattern classification and fault diagnosis by classifier are carried out easily and rapidly. Rolling bearing fault signals are used to validate the proposed fault diagnosis approach. The results indicate that the proposed approach obviously improves the classification performance of fault pattern recognition and outperforms the other traditional approaches.
Multivariate statistical analysis of electron energy-loss spectroscopy in anisotropic materials
Hu Xuerang; Sun Yuekui; Yuan Jun
2008-01-01
Recently, an expression has been developed to take into account the complex dependence of the fine structure in core-level electron energy-loss spectroscopy (EELS) in anisotropic materials on specimen orientation and spectral collection conditions [Y. Sun, J. Yuan, Phys. Rev. B 71 (2005) 125109]. One application of this expression is the development of a phenomenological theory of magic-angle electron energy-loss spectroscopy (MAEELS), which can be used to extract the isotropically averaged spectral information for materials with arbitrary anisotropy. Here we use this expression to extract not only the isotropically averaged spectral information, but also the anisotropic spectral components, without the restriction of MAEELS. The application is based on a multivariate statistical analysis of core-level EELS for anisotropic materials. To demonstrate the applicability of this approach, we have conducted a study on a set of carbon K-edge spectra of multi-wall carbon nanotube (MWCNT) acquired with energy-loss spectroscopic profiling (ELSP) technique and successfully extracted both the averaged and dichroic spectral components of the wrapped graphite-like sheets. Our result shows that this can be a practical alternative to MAEELS for the study of electronic structure of anisotropic materials, in particular for those nanostructures made of layered materials
Application of Multivariate Statistical Analysis to Biomarkers in Se-Turkey Crude Oils
Gürgey, K.; Canbolat, S.
2017-11-01
Twenty-four crude oil samples were collected from the 24 oil fields distributed in different districts of SE-Turkey. API and Sulphur content (%), Stable Carbon Isotope, Gas Chromatography (GC), and Gas Chromatography-Mass Spectrometry (GC-MS) data were used to construct a geochemical data matrix. The aim of this study is to examine the genetic grouping or correlations in the crude oil samples, hence the number of source rocks present in the SE-Turkey. To achieve these aims, two of the multivariate statistical analysis techniques (Principle Component Analysis [PCA] and Cluster Analysis were applied to data matrix of 24 samples and 8 source specific biomarker variables/parameters. The results showed that there are 3 genetically different oil groups: Batman-Nusaybin Oils, Adıyaman-Kozluk Oils and Diyarbakir Oils, in addition to a one mixed group. These groupings imply that at least, three different source rocks are present in South-Eastern (SE) Turkey. Grouping of the crude oil samples appears to be consistent with the geographic locations of the oils fields, subsurface stratigraphy as well as geology of the area.
APPLICATION OF MULTIVARIATE STATISTICAL ANALYSIS TO BIOMARKERS IN SE-TURKEY CRUDE OILS
K. Gürgey
2017-11-01
Full Text Available Twenty-four crude oil samples were collected from the 24 oil fields distributed in different districts of SE-Turkey. API and Sulphur content (%, Stable Carbon Isotope, Gas Chromatography (GC, and Gas Chromatography-Mass Spectrometry (GC-MS data were used to construct a geochemical data matrix. The aim of this study is to examine the genetic grouping or correlations in the crude oil samples, hence the number of source rocks present in the SE-Turkey. To achieve these aims, two of the multivariate statistical analysis techniques (Principle Component Analysis [PCA] and Cluster Analysis were applied to data matrix of 24 samples and 8 source specific biomarker variables/parameters. The results showed that there are 3 genetically different oil groups: Batman-Nusaybin Oils, Adıyaman-Kozluk Oils and Diyarbakir Oils, in addition to a one mixed group. These groupings imply that at least, three different source rocks are present in South-Eastern (SE Turkey. Grouping of the crude oil samples appears to be consistent with the geographic locations of the oils fields, subsurface stratigraphy as well as geology of the area.
Daeid, N. Nic; Meier-Augenstein, W.; Kemp, H. F.
2012-04-01
The analysis of cotton fibres can be particularly challenging within a forensic science context where discrimination of one fibre from another is of importance. Normally cotton fibre analysis examines the morphological structure of the recovered material and compares this with that of a known fibre from a particular source of interest. However, the conventional microscopic and chemical analysis of fibres and any associated dyes is generally unsuccessful because of the similar morphology of the fibres. Analysis of the dyes which may have been applied to the cotton fibre can also be undertaken though this can be difficult and unproductive in terms of discriminating one fibre from another. In the study presented here we have explored the potential for Isotope Ratio Mass Spectrometry (IRMS) to be utilised as an additional tool for cotton fibre analysis in an attempt to reveal further discriminatory information. This work has concentrated on un-dyed cotton fibres of known origin in order to expose the potential of the analytical technique. We report the results of a pilot study aimed at testing the hypothesis that multi-element stable isotope analysis of cotton fibres in conjunction with multivariate statistical analysis of the resulting isotopic abundance data using well established chemometric techniques permits sample provenancing based on the determination of where the cotton was grown and as such will facilitate sample discrimination. To date there is no recorded literature of this type of application of IRMS to cotton samples, which may be of forensic science relevance.
Qing Gu
2016-03-01
Full Text Available Qiandao Lake (Xin’an Jiang reservoir plays a significant role in drinking water supply for eastern China, and it is an attractive tourist destination. Three multivariate statistical methods were comprehensively applied to assess the spatial and temporal variations in water quality as well as potential pollution sources in Qiandao Lake. Data sets of nine parameters from 12 monitoring sites during 2010–2013 were obtained for analysis. Cluster analysis (CA was applied to classify the 12 sampling sites into three groups (Groups A, B and C and the 12 monitoring months into two clusters (April-July, and the remaining months. Discriminant analysis (DA identified Secchi disc depth, dissolved oxygen, permanganate index and total phosphorus as the significant variables for distinguishing variations of different years, with 79.9% correct assignments. Dissolved oxygen, pH and chlorophyll-a were determined to discriminate between the two sampling periods classified by CA, with 87.8% correct assignments. For spatial variation, DA identified Secchi disc depth and ammonia nitrogen as the significant discriminating parameters, with 81.6% correct assignments. Principal component analysis (PCA identified organic pollution, nutrient pollution, domestic sewage, and agricultural and surface runoff as the primary pollution sources, explaining 84.58%, 81.61% and 78.68% of the total variance in Groups A, B and C, respectively. These results demonstrate the effectiveness of integrated use of CA, DA and PCA for reservoir water quality evaluation and could assist managers in improving water resources management.
Babamoradi, Hamid; van den Berg, Frans; Rinnan, Åsmund
2016-02-18
In Multivariate Statistical Process Control, when a fault is expected or detected in the process, contribution plots are essential for operators and optimization engineers in identifying those process variables that were affected by or might be the cause of the fault. The traditional way of interpreting a contribution plot is to examine the largest contributing process variables as the most probable faulty ones. This might result in false readings purely due to the differences in natural variation, measurement uncertainties, etc. It is more reasonable to compare variable contributions for new process runs with historical results achieved under Normal Operating Conditions, where confidence limits for contribution plots estimated from training data are used to judge new production runs. Asymptotic methods cannot provide confidence limits for contribution plots, leaving re-sampling methods as the only option. We suggest bootstrap re-sampling to build confidence limits for all contribution plots in online PCA-based MSPC. The new strategy to estimate CLs is compared to the previously reported CLs for contribution plots. An industrial batch process dataset was used to illustrate the concepts. Copyright © 2016 Elsevier B.V. All rights reserved.
Teck-Yee Ling
2017-01-01
Full Text Available The present study evaluated the spatial variations of surface water quality in a tropical river using multivariate statistical techniques, including cluster analysis (CA and principal component analysis (PCA. Twenty physicochemical parameters were measured at 30 stations along the Batang Baram and its tributaries. The water quality of the Batang Baram was categorized as “slightly polluted” where the chemical oxygen demand and total suspended solids were the most deteriorated parameters. The CA grouped the 30 stations into four clusters which shared similar characteristics within the same cluster, representing the upstream, middle, and downstream regions of the main river and the tributaries from the middle to downstream regions of the river. The PCA has determined a reduced number of six principal components that explained 83.6% of the data set variance. The first PC indicated that the total suspended solids, turbidity, and hydrogen sulphide were the dominant polluting factors which is attributed to the logging activities, followed by the five-day biochemical oxygen demand, total phosphorus, organic nitrogen, and nitrate-nitrogen in the second PC which are related to the discharges from domestic wastewater. The components also imply that logging activities are the major anthropogenic activities responsible for water quality variations in the Batang Baram when compared to the domestic wastewater discharge.
Podorozhnyi, D.M.; Postnikov, E.B.; Sveshnikova, L.G.; Turundaevsky, A.N.
2005-01-01
A multivariate statistical procedure for solving problems of estimating physical parameters on the basis of data from measurements with multichannel equipment is described. Within the multivariate procedure, an algorithm is constructed for estimating the energy of primary cosmic rays and the exponent in their power-law spectrum. They are investigated by using the KLEM spectrometer (NUCLEON project) as a specific example of measuring equipment. The results of computer experiments simulating the operation of the multivariate procedure for this equipment are given, the proposed approach being compared in these experiments with the one-parameter approach presently used in data processing
Dong, Jian-Jun; Li, Qing-Liang; Yin, Hua; Zhong, Cheng; Hao, Jun-Guang; Yang, Pan-Fei; Tian, Yu-Hong; Jia, Shi-Ru
2014-10-15
Sensory evaluation is regarded as a necessary procedure to ensure a reproducible quality of beer. Meanwhile, high-throughput analytical methods provide a powerful tool to analyse various flavour compounds, such as higher alcohol and ester. In this study, the relationship between flavour compounds and sensory evaluation was established by non-linear models such as partial least squares (PLS), genetic algorithm back-propagation neural network (GA-BP), support vector machine (SVM). It was shown that SVM with a Radial Basis Function (RBF) had a better performance of prediction accuracy for both calibration set (94.3%) and validation set (96.2%) than other models. Relatively lower prediction abilities were observed for GA-BP (52.1%) and PLS (31.7%). In addition, the kernel function of SVM played an essential role of model training when the prediction accuracy of SVM with polynomial kernel function was 32.9%. As a powerful multivariate statistics method, SVM holds great potential to assess beer quality. Copyright © 2014 Elsevier Ltd. All rights reserved.
Gregoire, Alexandre David
2011-07-01
The goal of this research was to accurately predict the ultimate compressive load of impact damaged graphite/epoxy coupons using a Kohonen self-organizing map (SOM) neural network and multivariate statistical regression analysis (MSRA). An optimized use of these data treatment tools allowed the generation of a simple, physically understandable equation that predicts the ultimate failure load of an impacted damaged coupon based uniquely on the acoustic emissions it emits at low proof loads. Acoustic emission (AE) data were collected using two 150 kHz resonant transducers which detected and recorded the AE activity given off during compression to failure of thirty-four impacted 24-ply bidirectional woven cloth laminate graphite/epoxy coupons. The AE quantification parameters duration, energy and amplitude for each AE hit were input to the Kohonen self-organizing map (SOM) neural network to accurately classify the material failure mechanisms present in the low proof load data. The number of failure mechanisms from the first 30% of the loading for twenty-four coupons were used to generate a linear prediction equation which yielded a worst case ultimate load prediction error of 16.17%, just outside of the +/-15% B-basis allowables, which was the goal for this research. Particular emphasis was placed upon the noise removal process which was largely responsible for the accuracy of the results.
Žvokelj, Matej; Zupan, Samo; Prebil, Ivan
2011-10-01
The article presents a novel non-linear multivariate and multiscale statistical process monitoring and signal denoising method which combines the strengths of the Kernel Principal Component Analysis (KPCA) non-linear multivariate monitoring approach with the benefits of Ensemble Empirical Mode Decomposition (EEMD) to handle multiscale system dynamics. The proposed method which enables us to cope with complex even severe non-linear systems with a wide dynamic range was named the EEMD-based multiscale KPCA (EEMD-MSKPCA). The method is quite general in nature and could be used in different areas for various tasks even without any really deep understanding of the nature of the system under consideration. Its efficiency was first demonstrated by an illustrative example, after which the applicability for the task of bearing fault detection, diagnosis and signal denosing was tested on simulated as well as actual vibration and acoustic emission (AE) signals measured on purpose-built large-size low-speed bearing test stand. The positive results obtained indicate that the proposed EEMD-MSKPCA method provides a promising tool for tackling non-linear multiscale data which present a convolved picture of many events occupying different regions in the time-frequency plane.
Multivariate strategies in functional magnetic resonance imaging
Hansen, Lars Kai
2007-01-01
We discuss aspects of multivariate fMRI modeling, including the statistical evaluation of multivariate models and means for dimensional reduction. In a case study we analyze linear and non-linear dimensional reduction tools in the context of a `mind reading' predictive multivariate fMRI model....
Vujović Svetlana R.
2013-01-01
Full Text Available This paper illustrates the utility of multivariate statistical techniques for analysis and interpretation of water quality data sets and identification of pollution sources/factors with a view to get better information about the water quality and design of monitoring network for effective management of water resources. Multivariate statistical techniques, such as factor analysis (FA/principal component analysis (PCA and cluster analysis (CA, were applied for the evaluation of variations and for the interpretation of a water quality data set of the natural water bodies obtained during 2010 year of monitoring of 13 parameters at 33 different sites. FA/PCA attempts to explain the correlations between the observations in terms of the underlying factors, which are not directly observable. Factor analysis is applied to physico-chemical parameters of natural water bodies with the aim classification and data summation as well as segmentation of heterogeneous data sets into smaller homogeneous subsets. Factor loadings were categorized as strong and moderate corresponding to the absolute loading values of >0.75, 0.75-0.50, respectively. Four principal factors were obtained with Eigenvalues >1 summing more than 78 % of the total variance in the water data sets, which is adequate to give good prior information regarding data structure. Each factor that is significantly related to specific variables represents a different dimension of water quality. The first factor F1 accounting for 28 % of the total variance and represents the hydrochemical dimension of water quality. The second factor F2 accounting for 18% of the total variance and may be taken factor of water eutrophication. The third factor F3 accounting 17 % of the total variance and represents the influence of point sources of pollution on water quality. The fourth factor F4 accounting 13 % of the total variance and may be taken as an ecological dimension of water quality. Cluster analysis (CA is an
Djorgovski, S. George
1994-01-01
We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complete database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful, and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications, and has produced real, published results.
Baez-Cazull, S. E.; McGuire, J.T.; Cozzarelli, I.M.; Voytek, M.A.
2008-01-01
Determining the processes governing aqueous biogeochemistry in a wetland hydrologically linked to an underlying contaminated aquifer is challenging due to the complex exchange between the systems and their distinct responses to changes in precipitation, recharge, and biological activities. To evaluate temporal and spatial processes in the wetland-aquifer system, water samples were collected using cm-scale multichambered passive diffusion samplers (peepers) to span the wetland-aquifer interface over a period of 3 yr. Samples were analyzed for major cations and anions, methane, and a suite of organic acids resulting in a large dataset of over 8000 points, which was evaluated using multivariate statistics. Principal component analysis (PCA) was chosen with the purpose of exploring the sources of variation in the dataset to expose related variables and provide insight into the biogeochemical processes that control the water chemistry of the system. Factor scores computed from PCA were mapped by date and depth. Patterns observed suggest that (i) fermentation is the process controlling the greatest variability in the dataset and it peaks in May; (ii) iron and sulfate reduction were the dominant terminal electron-accepting processes in the system and were associated with fermentation but had more complex seasonal variability than fermentation; (iii) methanogenesis was also important and associated with bacterial utilization of minerals as a source of electron acceptors (e.g., barite BaSO4); and (iv) seasonal hydrological patterns (wet and dry periods) control the availability of electron acceptors through the reoxidation of reduced iron-sulfur species enhancing iron and sulfate reduction. Copyright ?? 2008 by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America. All rights reserved.
Djorgovski, S. G.
1994-01-01
We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complex database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of the SKICAT system, and of some of the scientific results achieved to date. We also developed a user-friendly package for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications and has
Carneiro, Alvaro Luiz Guimaraes; Santos, Francisco Carlos Barbosa dos
2007-01-01
Energy is an essential input for social development and economic growth. The production and use of energy cause environmental degradation at all levels, being local, regional and global such as, combustion of fossil fuels causing air pollution; hydropower often causes environmental damage due to the submergence of large areas of land; and global climate change associated with the increasing concentration of greenhouse gases in the atmosphere. As mentioned in chapter 9 of Agenda 21, the Energy is essential to economic and social development and improved quality of life. Much of the world's energy, however, is currently produced and consumed in ways that could not be sustained if technologies were remain constant and if overall quantities were to increase substantially. All energy sources will need to be used in ways that respect the atmosphere, human health, and the environment as a whole. The energy in the context of sustainable development needs a set of quantifiable parameters, called indicators, to measure and monitor important changes and significant progress towards the achievement of the objectives of sustainable development policies. The indicators are divided into four dimensions: social, economic, environmental and institutional. This paper shows a methodology of analysis using Multivariate Statistical Technique that provide the ability to analyse complex sets of data. The main goal of this study is to explore the correlation analysis among the indicators. The data used on this research work, is an excerpt of IBGE (Instituto Brasileiro de Geografia e Estatistica) data census. The core indicators used in this study follows The IAEA (International Atomic Energy Agency) framework: Energy Indicators for Sustainable Development. (author)
Tay, C. K.; Hayford, E. K.; Hodgson, I. O. A.
2017-06-01
Multivariate statistical technique and hydrogeochemical approach were employed for groundwater assessment within the Lower Pra Basin. The main objective was to delineate the main processes that are responsible for the water chemistry and pollution of groundwater within the basin. Fifty-four (54) (No) boreholes were sampled in January 2012 for quality assessment. PCA using Varimax with Kaiser Normalization method of extraction for both rotated space and component matrix have been applied to the data. Results show that Spearman's correlation matrix of major ions revealed expected process-based relationships derived mainly from the geochemical processes, such as ion-exchange and silicate/aluminosilicate weathering within the aquifer. Three main principal components influence the water chemistry and pollution of groundwater within the basin. The three principal components have accounted for approximately 79% of the total variance in the hydrochemical data. Component 1 delineates the main natural processes (water-soil-rock interactions) through which groundwater within the basin acquires its chemical characteristics, Component 2 delineates the incongruent dissolution of silicate/aluminosilicates, while Component 3 delineates the prevalence of pollution principally from agricultural input as well as trace metal mobilization in groundwater within the basin. The loadings and score plots of the first two PCs show grouping pattern which indicates the strength of the mutual relation among the hydrochemical variables. In terms of proper management and development of groundwater within the basin, communities, where intense agriculture is taking place, should be monitored and protected from agricultural activities. especially where inorganic fertilizers are used by creating buffer zones. Monitoring of the water quality especially the water pH is recommended to ensure the acid neutralizing potential of groundwater within the basin thereby, curtailing further trace metal
Cordes, Gail Adele; Van Ausdeln, Leo Anthony; Velasquez, Maria Elena
2002-01-01
The report discusses preliminary proof-of-concept research for using the Advanced Data Validation and Verification System (ADVVS), a new INEEL software package, to add validation and verification and multivariate feedback control to the operation of non-destructive analysis (NDA) equipment. The software is based on human cognition, the recognition of patterns and changes in patterns in time-related data. The first project applied ADVVS to monitor operations of a selectable energy linear electron accelerator, and showed how the software recognizes in real time any deviations from the optimal tune of the machine. The second project extended the software method to provide model-based multivariate feedback control for the same linear electron accelerator. The projects successfully demonstrated proof-of-concept for the applications and focused attention on the common application of intelligent information processing techniques
McKinley, C. C.; Scudder, R.; Thomas, D. J.
2016-12-01
The Neodymium Isotopic composition (Nd IC) of oxide coatings has been applied as a tracer of water mass composition and used to address fundamental questions about past ocean conditions. The leached authigenic oxide coating from marine sediment is widely assumed to reflect the dissolved trace metal composition of the bottom water interacting with sediment at the seafloor. However, recent studies have shown that readily reducible sediment components, in addition to trace metal fluxes from the pore water, are incorporated into the bottom water, influencing the trace metal composition of leached oxide coatings. This challenges the prevailing application of the authigenic oxide Nd IC as a proxy of seawater composition. Therefore, it is important to identify the component end-members that create sediments of different lithology and determine if, or how they might contribute to the Nd IC of oxide coatings. To investigate lithologic influence on the results of sequential leaching, we selected two sites with complete bulk sediment statistical characterization. Site U1370 in the South Pacific Gyre, is predominantly composed of Rhyolite ( 60%) and has a distinguishable ( 10%) Fe-Mn Oxyhydroxide component (Dunlea et al., 2015). Site 1149 near the Izu-Bonin-Arc is predominantly composed of dispersed ash ( 20-50%) and eolian dust from Asia ( 50-80%) (Scudder et al., 2014). We perform a two-step leaching procedure: a 14 mL of 0.02 M hydroxylamine hydrochloride (HH) in 20% acetic acid buffered to a pH 4 for one hour, targeting metals bound to Fe- and Mn- oxides fractions, and a second HH leach for 12 hours, designed to remove any remaining oxides from the residual component. We analyze all three resulting fractions for a large suite of major, trace and rare earth elements, a sub-set of the samples are also analyzed for Nd IC. We use multivariate statistical analyses of the resulting geochemical data to identify how each component of the sediment partitions across the sequential
Vasilaki, V; Volcke, E I P; Nandi, A K; van Loosdrecht, M C M; Katsou, E
2018-04-26
Multivariate statistical analysis was applied to investigate the dependencies and underlying patterns between N 2 O emissions and online operational variables (dissolved oxygen and nitrogen component concentrations, temperature and influent flow-rate) during biological nitrogen removal from wastewater. The system under study was a full-scale reactor, for which hourly sensor data were available. The 15-month long monitoring campaign was divided into 10 sub-periods based on the profile of N 2 O emissions, using Binary Segmentation. The dependencies between operating variables and N 2 O emissions fluctuated according to Spearman's rank correlation. The correlation between N 2 O emissions and nitrite concentrations ranged between 0.51 and 0.78. Correlation >0.7 between N 2 O emissions and nitrate concentrations was observed at sub-periods with average temperature lower than 12 °C. Hierarchical k-means clustering and principal component analysis linked N 2 O emission peaks with precipitation events and ammonium concentrations higher than 2 mg/L, especially in sub-periods characterized by low N 2 O fluxes. Additionally, the highest ranges of measured N 2 O fluxes belonged to clusters corresponding with NO 3 -N concentration less than 1 mg/L in the upstream plug-flow reactor (middle of oxic zone), indicating slow nitrification rates. The results showed that the range of N 2 O emissions partially depends on the prior behavior of the system. The principal component analysis validated the findings from the clustering analysis and showed that ammonium, nitrate, nitrite and temperature explained a considerable percentage of the variance in the system for the majority of the sub-periods. The applied statistical methods, linked the different ranges of emissions with the system variables, provided insights on the effect of operating conditions on N 2 O emissions in each sub-period and can be integrated into N 2 O emissions data processing at wastewater treatment plants
Singh, Kunwar P.; Malik, Amrita; Sinha, Sarita
2005-01-01
Multivariate statistical techniques, such as cluster analysis (CA), factor analysis (FA), principal component analysis (PCA) and discriminant analysis (DA) were applied to the data set on water quality of the Gomti river (India), generated during three years (1999-2001) monitoring at eight different sites for 34 parameters (9792 observations). This study presents usefulness of multivariate statistical techniques for evaluation and interpretation of large complex water quality data sets and apportionment of pollution sources/factors with a view to get better information about the water quality and design of monitoring network for effective management of water resources. Three significant groups, upper catchments (UC), middle catchments (MC) and lower catchments (LC) of sampling sites were obtained through CA on the basis of similarity between them. FA/PCA applied to the data sets pertaining to three catchments regions of the river resulted in seven, seven and six latent factors, respectively responsible for the data structure, explaining 74.3, 73.6 and 81.4% of the total variance of the respective data sets. These included the trace metals group (leaching from soil and industrial waste disposal sites), organic pollution group (municipal and industrial effluents), nutrients group (agricultural runoff), alkalinity, hardness, EC and solids (soil leaching and runoff process). DA showed the best results for data reduction and pattern recognition during both temporal and spatial analysis. It rendered five parameters (temperature, total alkalinity, Cl, Na and K) affording more than 94% right assignations in temporal analysis, while 10 parameters (river discharge, pH, BOD, Cl, F, PO 4 , NH 4 -N, NO 3 -N, TKN and Zn) to afford 97% right assignations in spatial analysis of three different regions in the basin. Thus, DA allowed reduction in dimensionality of the large data set, delineating a few indicator parameters responsible for large variations in water quality. Further
Comparison of multivariate and univariate statistical process control and monitoring methods
Leger, R.P.; Garland, WM.J.; Macgregor, J.F.
1996-01-01
Work in recent years has lead to the development of multivariate process monitoring schemes which use Principal Component Analysis (PCA). This research compares the performance of a univariate scheme and a multivariate PCA scheme used for monitoring a simple process with 11 measured variables. The multivariate PCA scheme was able to adequately represent the process using two principal components. This resulted in a PCA monitoring scheme which used two charts as opposed to 11 charts for the univariate scheme and therefore had distinct advantages in terms of both data representation, presentation, and fault diagnosis capabilities. (author)
Jackknife Variance Estimator for Two Sample Linear Rank Statistics
1988-11-01
Accesion For - - ,NTIS GPA&I "TIC TAB Unann c, nc .. [d Keywords: strong consistency; linear rank test’ influence function . i , at L By S- )Distribut...reverse if necessary and identify by block number) FIELD IGROUP SUB-GROUP Strong consistency; linear rank test; influence function . 19. ABSTRACT
Buttigieg, Pier Luigi; Ramette, Alban Nicolas
2014-01-01
The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynami...
Kemperman, Ramses F. J.; Horvatovich, Peter L.; Hoekman, Berend; Reijmers, Theo H.; Muskiet, Frits A. J.; Bischoff, Rainer
2007-01-01
We describe a platform for the comparative profiling of urine using reversed-phase liquid chromatography-mass spectrometry (LC-MS) and multivariate statistical data analysis. Urinary compounds were separated by gradient elution and subsequently detected by electrospray Ion-Trap MS. The lower limit
M.A. Delavar
2016-02-01
Full Text Available Introduction: The accumulation of heavy metals (HMs in the soil is of increasing concern due to food safety issues, potential health risks, and the detrimental effects on soil ecosystems. HMs may be considered as the most important soil pollutants, because they are not biodegradable and their physical movement through the soil profile is relatively limited. Therefore, root uptake process may provide a big chance for these pollutants to transfer from the surface soil to natural and cultivated plants, which may eventually steer them to human bodies. The general behavior of HMs in the environment, especially their bioavailability in the soil, is influenced by their origin. Hence, source apportionment of HMs may provide some essential information for better management of polluted soils to restrict the HMs entrance to the human food chain. This paper explores the applicability of multivariate statistical techniques in the identification of probable sources that can control the concentration and distribution of selected HMs in the soils surrounding the Zanjan Zinc Specialized Industrial Town (briefly Zinc Town. Materials and Methods: The area under investigation has a size of approximately 4000 ha.It is located around the Zinc Town, Zanjan province. A regular grid sampling pattern with an interval of 500 meters was applied to identify the sample location, and 184 topsoil samples (0-10 cm were collected. The soil samples were air-dried and sieved through a 2 mm polyethylene sieve and then, were digested using HNO3. The total concentrations of zinc (Zn, lead (Pb, cadmium (Cd, Nickel (Ni and copper (Cu in the soil solutions were determined via Atomic Absorption Spectroscopy (AAS. Data were statistically analyzed using the SPSS software version 17.0 for Windows. Correlation Matrix (CM, Principal Component Analyses (PCA and Factor Analyses (FA techniques were performed in order to identify the probable sources of HMs in the studied soils. Results and
Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan
2017-01-01
Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0–0.10 m, or 0–0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km 2 , with a median of 0.4 samples per km 2 . The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component
Ahmed, Fahad; Fakhruddin, A. N. M.; Imam, MD. Toufick; Khan, Nasima; Abdullah, Abu Tareq Mohammad; Khan, Tanzir Ahmed; Rahman, Md. Mahfuzur; Uddin, Mohammad Nashir
2017-11-01
In this study, multivariate statistical techniques in collaboration with GIS are used to assess the roadside surface water quality of Savar region. Nineteen water samples were collected in dry season and 15 water quality parameters including TSS, TDS, pH, DO, BOD, Cl-, F-, NO3 2-, NO2 -, SO4 2-, Ca, Mg, K, Zn and Pb were measured. The univariate overview of water quality parameters are TSS 25.154 ± 8.674 mg/l, TDS 840.400 ± 311.081 mg/l, pH 7.574 ± 0.256 pH unit, DO 4.544 ± 0.933 mg/l, BOD 0.758 ± 0.179 mg/l, Cl- 51.494 ± 28.095 mg/l, F- 0.771 ± 0.153 mg/l, NO3 2- 2.211 ± 0.878 mg/l, NO2 - 4.692 ± 5.971 mg/l, SO4 2- 69.545 ± 53.873 mg/l, Ca 48.458 ± 22.690 mg/l, Mg 19.676 ± 7.361 mg/l, K 12.874 ± 11.382 mg/l, Zn 0.027 ± 0.029 mg/l, Pb 0.096 ± 0.154 mg/l. The water quality data were subjected to R-mode PCA which resulted in five major components. PC1 explains 28% of total variance and indicates the roadside and brick field dust settle down (TDS, TSS) in the nearby water body. PC2 explains 22.123% of total variance and indicates the agricultural influence (K, Ca, and NO2 -). PC3 describes the contribution of nonpoint pollution from agricultural and soil erosion processes (SO4 2-, Cl-, and K). PC4 depicts heavy positively loaded by vehicle emission and diffusion from battery stores (Zn, Pb). PC5 depicts strong positive loading of BOD and strong negative loading of pH. Cluster analysis represents three major clusters for both water parameters and sampling sites. The site based on cluster showed similar grouping pattern of R-mode factor score map. The present work reveals a new scope to monitor the roadside water quality for future research in Bangladesh.
Study of groundwater arsenic pollution in Lanyang Plain using multivariate statistical analysis
chan, S.
2013-12-01
The study area, Lanyang Plain in the eastern Taiwan, has highly developed agriculture and aquaculture, which consume over 70% of the water supplies. Groundwater is frequently considered as an alternative water source. However, the serious arsenic pollution of groundwater in Lanyan Plain should be well studied to ensure the safety of groundwater usage. In this study, 39 groundwater samples were collected. The results of hydrochemistry demonstrate two major trends in Piper diagram. The major trend with most of groundwater samples is determined with water type between Ca+Mg-HCO3 and Na+K-HCO3. This can be explained with cation exchange reaction. The minor trend is obviously corresponding to seawater intrusion, which has water type of Na+K-Cl, because the localities of these samples are all in the coastal area. The multivariate statistical analysis on hydrochemical data was conducted for further exploration on the mechanism of arsenic contamination. Two major factors can be extracted with factor analysis. The major factor includes Ca, Mg and Sr while the minor factor includes Na, K and As. This reconfirms that cation exchange reaction mainly control the groundwater hydrochemistry in the study area. It is worth to note that arsenic is positively related to Na and K. The result of cluster analysis shows that groundwater samples with high arsenic concentration can be grouped into that with high Na, K and HCO3. This supports that cation exchange would enhance the release of arsenic and exclude the effect of seawater intrusion. In other words, the water-rock reaction time is key to obtain higher arsenic content. In general, the major source of arsenic in sediments include exchangeable, reducible and oxidizable phases, which are adsorbed ions, Fe-Mn oxides and organic matters/pyrite, respectively. However, the results of factor analysis do not show apparent correlation between arsenic and Fe/Mn. This may exclude Fe-Mn oxides as a major source of arsenic. The other sources
Banoeng-Yakubo, B.; Yidana, S.M.; Nti, E.
2009-01-01
Q and R-mode multivariate statistical analyses were applied to groundwater chemical data from boreholes and wells in the northern section of the Volta region Ghana. The objective was to determine the processes that affect the hydrochemistry and the variation of these processes in space among the three main geological terrains: the Buem formation, Voltaian System and the Togo series that underlie the area. The analyses revealed three zones in the groundwater flow system: recharge, intermediate and discharge regions. All three zones are clearly different with respect to all the major chemical parameters, with concentrations increasing from the perceived recharge areas through the intermediate regions to the discharge areas. R-mode HCA and factor analysis (using varimax rotation and Kaiser Criterion) were then applied to determine the significant sources of variation in the hydrochemistry. This study finds that groundwater hydrochemistry in the area is controlled by the weathering of silicate and carbonate minerals, as well as the chemistry of infiltrating precipitation. This study finds that the ??D and ??18O data from the area fall along the Global Meteoric Water Line (GMWL). An equation of regression derived for the relationship between ??D and ??18O bears very close semblance to the equation which describes the GMWL. On the basis of this, groundwater in the study area is probably meteoric and fresh. The apparently low salinities and sodicities of the groundwater seem to support this interpretation. The suitability of groundwater for domestic and irrigation purposes is related to its source, which determines its constitution. A plot of the sodium adsorption ratio (SAR) and salinity (EC) data on a semilog axis, suggests that groundwater serves good irrigation quality in the area. Sixty percent (60%), 20% and 20% of the 67 data points used in this study fall within the medium salinity - low sodicity (C2-S1), low salinity -low sodicity (C1-S1) and high salinity - low
Xiong, Haoshu; Yu, Lawrence X; Qu, Haibin
2013-06-01
Botanical drug products have batch-to-batch quality variability due to botanical raw materials and the current manufacturing process. The rational evaluation and control of product quality consistency are essential to ensure the efficacy and safety. Chromatographic fingerprinting is an important and widely used tool to characterize the chemical composition of botanical drug products. Multivariate statistical analysis has showed its efficacy and applicability in the quality evaluation of many kinds of industrial products. In this paper, the combined use of multivariate statistical analysis and chromatographic fingerprinting is presented here to evaluate batch-to-batch quality consistency of botanical drug products. A typical botanical drug product in China, Shenmai injection, was selected as the example to demonstrate the feasibility of this approach. The high-performance liquid chromatographic fingerprint data of historical batches were collected from a traditional Chinese medicine manufacturing factory. Characteristic peaks were weighted by their variability among production batches. A principal component analysis model was established after outliers were modified or removed. Multivariate (Hotelling T(2) and DModX) control charts were finally successfully applied to evaluate the quality consistency. The results suggest useful applications for a combination of multivariate statistical analysis with chromatographic fingerprinting in batch-to-batch quality consistency evaluation for the manufacture of botanical drug products.
McArtor, Daniel B; Lubke, Gitta H; Bergeman, C S
2017-12-01
Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains.
Subset Statistics in the linear IV regression model
Kleibergen, F.R.
2005-01-01
We show that the limiting distributions of subset generalizations of the weak instrument robust instrumental variable statistics are boundedly similar when the remaining structural parameters are estimated using maximum likelihood. They are bounded from above by the limiting distributions which
Distributed Monitoring of the R(sup 2) Statistic for Linear Regression
Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.
2011-01-01
The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R2 statistic). When the nodes collectively determine that R2 has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
Common pitfalls in statistical analysis: Linear regression analysis
Rakesh Aggarwal
2017-01-01
Full Text Available In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis.
Jayalakshmy, K.V.; Rao, K.K.
Harbour, En- gland: a reappraisal using multivariate tech- niques. J. Paleontol., 43 (3) : 660-675. Imbrie, J. and F.B. Phleger. 1963. Analisis por vectores de los foraminiferos bentonicos del area de San Diego, California. Soc. Geol. Mex., Bol., 26...
Denis Valle; Benjamin Baiser; Christopher W. Woodall; Robin Chazdon; Jerome. Chave
2014-01-01
We propose a novel multivariate method to analyse biodiversity data based on the Latent Dirichlet Allocation (LDA) model. LDA, a probabilistic model, reduces assemblages to sets of distinct component communities. It produces easily interpretable results, can represent abrupt and gradual changes in composition, accommodates missing data and allows for coherent estimates...
Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan
2017-12-01
Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km 2 , with a median of 0.4 samples per km 2 . The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis
Đula Borozan
2014-03-01
Full Text Available The paper deals with the application of multivariate analysis of variance and logistic regression in measuring, explaining and evaluating (i gender differences in expressing migration aspirations, and (ii a gender effect on migration motivation of university students in Croatia. The results supported the thesis that migration is a complex gendering process that assumes subjective assessment of the whole set of interrelated motives. According to logistic regression, gender is a significant predictor of migration aspirations among the selected demographic and socio-economic variables. A multivariate analysis of variance showed that gender and migration aspirations in interaction matter when it comes to migration motives, particularly related to the perceived importance of social networks. Females, and especially those who aspire to migrate, assessed these motives as more important than males.
Alkarkhi, Abbas F M; Ramli, Saifullah Bin; Easa, Azhar Mat
2009-01-01
Major (sodium, potassium, calcium, magnesium) and minor elements (iron, copper, zinc, manganese) and one heavy metal (lead) of Cavendish banana flour and Dream banana flour were determined, and data were analyzed using multivariate statistical techniques of factor analysis and discriminant analysis. Factor analysis yielded four factors explaining more than 81% of the total variance: the first factor explained 28.73%, comprising magnesium, sodium, and iron; the second factor explained 21.47%, comprising only manganese and copper; the third factor explained 15.66%, comprising zinc and lead; while the fourth factor explained 15.50%, comprising potassium. Discriminant analysis showed that magnesium and sodium exhibited a strong contribution in discriminating the two types of banana flour, affording 100% correct assignation. This study presents the usefulness of multivariate statistical techniques for analysis and interpretation of complex mineral content data from banana flour of different varieties.
Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye
2016-01-13
A framework of establishing standard reference scale (texture) is proposed by multivariate statistical analysis according to instrumental measurement and sensory evaluation. Multivariate statistical analysis is conducted to rapidly select typical reference samples with characteristics of universality, representativeness, stability, substitutability, and traceability. The reasonableness of the framework method is verified by establishing standard reference scale of texture attribute (hardness) with Chinese well-known food. More than 100 food products in 16 categories were tested using instrumental measurement (TPA test), and the result was analyzed with clustering analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were determined to construct the hardness standard reference scale. The results indicate that the regression coefficient between the estimated sensory value and the instrumentally measured value is significant (R(2) = 0.9765), which fits well with Stevens's theory. The research provides reliable a theoretical basis and practical guide for quantitative standard reference scale establishment on food texture characteristics.
Multivariate-Statistical Assessment of Heavy Metals for Agricultural Soils in Northern China
Yang, Pingguo; Yang, Miao; Mao, Renzhao; Shao, Hongbo
2014-01-01
The study evaluated eight heavy metals content and soil pollution from agricultural soils in northern China. Multivariate and geostatistical analysis approaches were used to determine the anthropogenic and natural contribution of soil heavy metal concentrations. Single pollution index and integrated pollution index could be used to evaluate soil heavy metal risk. The results show that the first factor explains 27.3% of the eight soil heavy metals with strong positive loadings on Cu, Zn, and C...
Xiong, Haoshu; Yu, Lawrence X.; Qu, Haibin
2013-01-01
Botanical drug products have batch-to-batch quality variability due to botanical raw materials and the current manufacturing process. The rational evaluation and control of product quality consistency are essential to ensure the efficacy and safety. Chromatographic fingerprinting is an important and widely used tool to characterize the chemical composition of botanical drug products. Multivariate statistical analysis has showed its efficacy and applicability in the quality evaluation of many ...
Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F
2017-04-01
Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.
Huffman and linear scanning methods with statistical language models.
Roark, Brian; Fried-Oken, Melanie; Gibbons, Chris
2015-03-01
Current scanning access methods for text generation in AAC devices are limited to relatively few options, most notably row/column variations within a matrix. We present Huffman scanning, a new method for applying statistical language models to binary-switch, static-grid typing AAC interfaces, and compare it to other scanning options under a variety of conditions. We present results for 16 adults without disabilities and one 36-year-old man with locked-in syndrome who presents with complex communication needs and uses AAC scanning devices for writing. Huffman scanning with a statistical language model yielded significant typing speedups for the 16 participants without disabilities versus any of the other methods tested, including two row/column scanning methods. A similar pattern of results was found with the individual with locked-in syndrome. Interestingly, faster typing speeds were obtained with Huffman scanning using a more leisurely scan rate than relatively fast individually calibrated scan rates. Overall, the results reported here demonstrate great promise for the usability of Huffman scanning as a faster alternative to row/column scanning.
Raymond, Ogbuka Obinna
2017-01-01
In analyzing survey data, most researchers and analysts make use of statistical methods with straight forward statistical approaches. More common, is the use of one‐way, two‐way or multi‐way tables, and graphical displays such as bar charts, line charts, etc. A brief overview of these approaches and a good discussion on aspects needing attention during the data analysis process can be found in Wilson & Stern (2001). In most cases however, analysis procedures that go beyond simp...
Fault detection of a spur gear using vibration signal with multivariable statistical parameters
Songpon Klinchaeam
2014-10-01
Full Text Available This paper presents a condition monitoring technique of a spur gear fault detection using vibration signal analysis based on time domain. Vibration signals were acquired from gearboxes and used to simulate various faults on spur gear tooth. In this study, vibration signals were applied to monitor a normal and various fault conditions of a spur gear such as normal, scuffing defect, crack defect and broken tooth. The statistical parameters of vibration signal were used to compare and evaluate the value of fault condition. This technique can be applied to set alarm limit of the signal condition based on statistical parameter such as variance, kurtosis, rms and crest factor. These parameters can be used to set as a boundary decision of signal condition. From the results, the vibration signal analysis with single statistical parameter is unclear to predict fault of the spur gears. The using at least two statistical parameters can be clearly used to separate in every case of fault detection. The boundary decision of statistical parameter with the 99.7% certainty ( 3 from 300 referenced dataset and detected the testing condition with 99.7% ( 3 accuracy and had an error of less than 0.3 % using 50 testing dataset.
Ghanbari, Y.; Habibnia, A.; Memar, A.
2009-01-01
In geochemical stream sediment surveys in Moghangegh Region in north west of Iran, sheet 1:50,000, 152 samples were collected and after the analyze and processing of data, it revealed that Yb, Sc, Ni, Li, Eu, Cd, Co, as contents in one sample is far higher than other samples. After detecting this sample as an outlier sample, the effect of this sample on multivariate statistical data processing for destructive effects of outlier sample in geochemical exploration was investigated. Pearson and Spear man correlation coefficient methods and cluster analysis were used for multivariate studies and the scatter plot of some elements together the regression profiles are given in case of 152 and 151 samples and the results are compared. After investigation of multivariate statistical data processing results, it was realized that results of existence of outlier samples may appear as the following relations between elements: - true relation between two elements, which have no outlier frequency in the outlier sample. - false relation between two elements which one of them has outlier frequency in the outlier sample. - complete false relation between two elements which both have outlier frequency in the outlier sample
Fuchs, Julia; Cermak, Jan; Andersen, Hendrik
2017-04-01
This study aims at untangling the impacts of external dynamics and local conditions on cloud properties in the Southeast Atlantic (SEA) by combining satellite and reanalysis data using multivariate statistics. The understanding of clouds and their determinants at different scales is important for constraining the Earth's radiative budget, and thus prominent in climate-system research. In this study, SEA stratocumulus cloud properties are observed not only as the result of local environmental conditions but also as affected by external dynamics and spatial origins of air masses entering the study area. In order to assess to what extent cloud properties are impacted by aerosol concentration, air mass history, and meteorology, a multivariate approach is conducted using satellite observations of aerosol and cloud properties (MODIS, SEVIRI), information on aerosol species composition (MACC) and meteorological context (ERA-Interim reanalysis). To account for the often-neglected but important role of air mass origin, information on air mass history based on HYSPLIT modeling is included in the statistical model. This multivariate approach is intended to lead to a better understanding of the physical processes behind observed stratocumulus cloud properties in the SEA.
Xu, Qinzeng; Gao, Fei; Xu, Qiang; Yang, Hongsheng
2014-11-01
Fatty acids (FAs) provide energy and also can be used to trace trophic relationships among organisms. Sea cucumber Apostichopus japonicus goes into a state of aestivation during warm summer months. We examined fatty acid profiles in aestivated and non-aestivated A. japonicus using multivariate analyses (PERMANOVA, MDS, ANOSIM, and SIMPER). The results indicate that the fatty acid profiles of aestivated and non-aestivated sea cucumbers differed significantly. The FAs that were produced by bacteria and brown kelp contributed the most to the differences in the fatty acid composition of aestivated and nonaestivated sea cucumbers. Aestivated sea cucumbers may synthesize FAs from heterotrophic bacteria during early aestivation, and long chain FAs such as eicosapentaenoic (EPA) and docosahexaenoic acid (DHA) that produced from intestinal degradation, are digested during deep aestivation. Specific changes in the fatty acid composition of A. japonicus during aestivation needs more detailed study in the future.
Mavukkandy, Musthafa Odayooth; Karmakar, Subhankar; Harikumar, P S
2014-09-01
The establishment of an efficient surface water quality monitoring (WQM) network is a critical component in the assessment, restoration and protection of river water quality. A periodic evaluation of monitoring network is mandatory to ensure effective data collection and possible redesigning of existing network in a river catchment. In this study, the efficacy and appropriateness of existing water quality monitoring network in the Kabbini River basin of Kerala, India is presented. Significant multivariate statistical techniques like principal component analysis (PCA) and principal factor analysis (PFA) have been employed to evaluate the efficiency of the surface water quality monitoring network with monitoring stations as the evaluated variables for the interpretation of complex data matrix of the river basin. The main objective is to identify significant monitoring stations that must essentially be included in assessing annual and seasonal variations of river water quality. Moreover, the significance of seasonal redesign of the monitoring network was also investigated to capture valuable information on water quality from the network. Results identified few monitoring stations as insignificant in explaining the annual variance of the dataset. Moreover, the seasonal redesign of the monitoring network through a multivariate statistical framework was found to capture valuable information from the system, thus making the network more efficient. Cluster analysis (CA) classified the sampling sites into different groups based on similarity in water quality characteristics. The PCA/PFA identified significant latent factors standing for different pollution sources such as organic pollution, industrial pollution, diffuse pollution and faecal contamination. Thus, the present study illustrates that various multivariate statistical techniques can be effectively employed in sustainable management of water resources. The effectiveness of existing river water quality monitoring
Lopez I, J. F.; Rios M, C.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J.L.
2017-09-01
The environmental radioactivity evaluation is a key point in the assessment of the environmental quality. Through this, it can be found possible radioactive contamination, locate possible Uranium and Thorium deposits and evaluate the primordial isotopes concentration due to human activities. A radioactive map of the Zacatecas State, Mexico is under construction based on in situ gamma-ray spectrometry. The present work reports the results of the multivariate statistical approximation of the measured activity data. Based on Pearson correlation, the 228 Ac and 208 Tl activities are statistically significant, while the 214 Bi and 214 Pb activities are not statistically significant. These can be due to the existence or not of secular equilibrium in the Thorium and Uranium series. (Author)
Lopez I, J. F.; Rios M, C.; Mireles G, F.; Saucedo A, S.; Davila R, I.; Pinedo, J.L., E-mail: fernandolf498@gmail.com [Universidad Autonoma de Zacatecas, Unidad Academica de Estudios Nucleares, Cipres No. 10, Fracc. La Penuela, 98060 Zacatecas, Zac. (Mexico)
2017-09-15
The environmental radioactivity evaluation is a key point in the assessment of the environmental quality. Through this, it can be found possible radioactive contamination, locate possible Uranium and Thorium deposits and evaluate the primordial isotopes concentration due to human activities. A radioactive map of the Zacatecas State, Mexico is under construction based on in situ gamma-ray spectrometry. The present work reports the results of the multivariate statistical approximation of the measured activity data. Based on Pearson correlation, the {sup 228}Ac and {sup 208}Tl activities are statistically significant, while the {sup 214}Bi and {sup 214}Pb activities are not statistically significant. These can be due to the existence or not of secular equilibrium in the Thorium and Uranium series. (Author)
Xu, Min; Zhang, Lei; Yue, Hong-Shui; Pang, Hong-Wei; Ye, Zheng-Liang; Ding, Li
2017-10-01
To establish an on-line monitoring method for extraction process of Schisandrae Chinensis Fructus, the formula medicinal material of Yiqi Fumai lyophilized injection by combining near infrared spectroscopy with multi-variable data analysis technology. The multivariate statistical process control (MSPC) model was established based on 5 normal batches in production and 2 test batches were monitored by PC scores, DModX and Hotelling T2 control charts. The results showed that MSPC model had a good monitoring ability for the extraction process. The application of the MSPC model to actual production process could effectively achieve on-line monitoring for extraction process of Schisandrae Chinensis Fructus, and can reflect the change of material properties in the production process in real time. This established process monitoring method could provide reference for the application of process analysis technology in the process quality control of traditional Chinese medicine injections. Copyright© by the Chinese Pharmaceutical Association.
Li, Jinling; He, Ming; Han, Wei; Gu, Yifan
2009-05-30
An investigation on heavy metal sources, i.e., Cu, Zn, Ni, Pb, Cr, and Cd in the coastal soils of Shanghai, China, was conducted using multivariate statistical methods (principal component analysis, clustering analysis, and correlation analysis). All the results of the multivariate analysis showed that: (i) Cu, Ni, Pb, and Cd had anthropogenic sources (e.g., overuse of chemical fertilizers and pesticides, industrial and municipal discharges, animal wastes, sewage irrigation, etc.); (ii) Zn and Cr were associated with parent materials and therefore had natural sources (e.g., the weathering process of parent materials and subsequent pedo-genesis due to the alluvial deposits). The effect of heavy metals in the soils was greatly affected by soil formation, atmospheric deposition, and human activities. These findings provided essential information on the possible sources of heavy metals, which would contribute to the monitoring and assessment process of agricultural soils in worldwide regions.
An Application of Multivariate Statistical Analysis for Query-Driven Visualization
Gosink, Luke J. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Garth, Christoph [Univ. of California, Davis, CA (United States); Anderson, John C. [Univ. of California, Davis, CA (United States); Bethel, E. Wes [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Joy, Kenneth I. [Univ. of California, Davis, CA (United States)
2011-03-01
Driven by the ability to generate ever-larger, increasingly complex data, there is an urgent need in the scientific community for scalable analysis methods that can rapidly identify salient trends in scientific data. Query-Driven Visualization (QDV) strategies are among the small subset of techniques that can address both large and highly complex datasets. This paper extends the utility of QDV strategies with a statistics-based framework that integrates non-parametric distribution estimation techniques with a new segmentation strategy to visually identify statistically significant trends and features within the solution space of a query. In this framework, query distribution estimates help users to interactively explore their query's solution and visually identify the regions where the combined behavior of constrained variables is most important, statistically, to their inquiry. Our new segmentation strategy extends the distribution estimation analysis by visually conveying the individual importance of each variable to these regions of high statistical significance. We demonstrate the analysis benefits these two strategies provide and show how they may be used to facilitate the refinement of constraints over variables expressed in a user's query. We apply our method to datasets from two different scientific domains to demonstrate its broad applicability.
Carvajal Escobar Yesid; Munoz, Flor Matilde
2007-01-01
The project this centred in the revision of the state of the art of the ocean-atmospheric phenomena that you affect the Colombian hydrology especially The Phenomenon Enos that causes a socioeconomic impact of first order in our country, it has not been sufficiently studied; therefore it is important to approach the thematic one, including the variable macroclimates associated to the Enos in the analyses of water planning. The analyses include revision of statistical techniques of analysis of consistency of hydrological data with the objective of conforming a database of monthly flow of the river reliable and homogeneous Cauca. Statistical methods are used (Analysis of data multivariante) specifically The analysis of principal components to involve them in the development of models of prediction of flows monthly means in the river Cauca involving the Lineal focus as they are the model autoregressive AR, ARX and Armax and the focus non lineal Net Artificial Network.
Bootstrap-based confidence estimation in PCA and multivariate statistical process control
Babamoradi, Hamid
be used to detect outliers in the data since the outliers can distort the bootstrap estimates. Bootstrap-based confidence limits were suggested as alternative to the asymptotic limits for control charts and contribution plots in MSPC (Paper II). The results showed that in case of the Q-statistic......Traditional/Asymptotic confidence estimation has limited applicability since it needs statistical theories to estimate the confidences, which are not available for all indicators/parameters. Furthermore, in case the theories are available for a specific indicator/parameter, the theories are based....... The goal was to improve process monitoring by improving the quality of MSPC charts and contribution plots. Bootstrapping algorithm to build confidence limits was illustrated in a case study format (Paper I). The main steps in the algorithm were discussed where a set of sensible choices (plus...
Selvarasu, Suresh; Kim, Do Yun; Karimi, Iftekhar A; Lee, Dong-Yup
2010-10-01
We present an integrated framework for characterizing fed-batch cultures of mouse hybridoma cells producing monoclonal antibody (mAb). This framework systematically combines data preprocessing, elemental balancing and statistical analysis technique. Initially, specific rates of cell growth, glucose/amino acid consumptions and mAb/metabolite productions were calculated via curve fitting using logistic equations, with subsequent elemental balancing of the preprocessed data indicating the presence of experimental measurement errors. Multivariate statistical analysis was then employed to understand physiological characteristics of the cellular system. The results from principal component analysis (PCA) revealed three major clusters of amino acids with similar trends in their consumption profiles: (i) arginine, threonine and serine, (ii) glycine, tyrosine, phenylalanine, methionine, histidine and asparagine, and (iii) lysine, valine and isoleucine. Further analysis using partial least square (PLS) regression identified key amino acids which were positively or negatively correlated with the cell growth, mAb production and the generation of lactate and ammonia. Based on these results, the optimal concentrations of key amino acids in the feed medium can be inferred, potentially leading to an increase in cell viability and productivity, as well as a decrease in toxic waste production. The study demonstrated how the current methodological framework using multivariate statistical analysis techniques can serve as a potential tool for deriving rational medium design strategies. Copyright © 2010 Elsevier B.V. All rights reserved.
Peng Nai
2016-03-01
Full Text Available A great number of immigration populations resident permanently in Yunnan Border Area of China. To some extent, these people belong to refugees or immigrants in accordance with International Rules, which significantly features the social diversity of this area. However, this kind of social diversity always impairs the social order. Therefore, there will be a positive influence to the local society governance by a research on local immigration integration. This essay hereby attempts to acquire the data of the living situation of these border area immigration and refugees. The analysis of the social integration of refugees and immigration in Yunnan border area in China will be deployed through the modeling of multivariable linear regression based on these data in order to propose some more achievable resolutions.
Malik, Riffat Naseem; Hashmi, Muhammad Zaffar
2017-10-01
Himalayan foothills streams, Pakistan play an important role in living water supply and irrigation of farmlands; thus, the water quality is closely related to public health. Multivariate techniques were applied to check spatial and seasonal trends, and metals contamination sources of the Himalayan foothills streams, Pakistan. Grab surface water samples were collected from different sites (5-15 cm water depth) in pre-washed polyethylene containers. Fast Sequential Atomic Absorption Spectrophotometer (Varian FSAA-240) was used to measure the metals concentration. Concentrations of Ni, Cu, and Mn were high in pre-monsoon season than the post-monsoon season. Cluster analysis identified impaired, moderately impaired and least impaired clusters based on water parameters. Discriminant function analysis indicated spatial variability in water was due to temperature, electrical conductivity, nitrates, iron and lead whereas seasonal variations were correlated with 16 physicochemical parameters. Factor analysis identified municipal and poultry waste, automobile activities, surface runoff, and soil weathering as major sources of contamination. Levels of Mn, Cr, Fe, Pb, Cd, Zn and alkalinity were above the WHO and USEPA standards for surface water. The results of present study will help to higher authorities for the management of the Himalayan foothills streams.
Birch, Thomas; Martinón-Torres, Marcos
2015-01-01
An assemblage of post-medieval iron bars was found with the Princes Channel wreck, salvaged from the Thames Estuary in 2003. They were recorded and studied, with a focus on metallography and slag inclusion analysis. The investigation provided an opportunity to explore the use of multivariate...... statistical techniques to analyse slag inclusion data. Cluster analysis supplemented by principal components analysis revealed two groups of iron, probably originating from different smelting systems, which were compared to those observed macroscopically and through metallography. The analyses reveal...
Samanta, P.K.; Teichmann, T.
1990-01-01
In this paper, a multivariate statistical method is presented and demonstrated as a means for analyzing nuclear power plant transients (or events) and safety system performance for detection of malfunctions and degradations within the course of the event based on operational data. The study provides the methodology and illustrative examples based on data gathered from simulation of nuclear power plant transients (due to lack of easily accessible operational data). Such an approach, once fully developed, can be used to detect failure trends and patterns and so can lead to prevention of conditions with serious safety implications
Non-linearity consideration when analyzing reactor noise statistical characteristics. [BWR
Kebadze, B V; Adamovski, L A
1975-06-01
Statistical characteristics of boiling water reactor noise in the vicinity of stability threshold are studied. The reactor is considered as a non-linear system affected by random perturbations. To solve a non-linear problem the principle of statistical linearization is used. It is shown that the halfwidth of resonance peak in neutron power noise spectrum density as well as the reciprocal of noise dispersion, which are used in predicting a stable operation theshold, are different from zero both within and beyond the stability boundary the determination of which was based on linear criteria.
Dhat, Shalaka; Pund, Swati; Kokare, Chandrakant; Sharma, Pankaj; Shrivastava, Birendra
2017-01-01
Rapidly evolving technical and regulatory landscapes of the pharmaceutical product development necessitates risk management with application of multivariate analysis using Process Analytical Technology (PAT) and Quality by Design (QbD). Poorly soluble, high dose drug, Satranidazole was optimally nanoprecipitated (SAT-NP) employing principles of Formulation by Design (FbD). The potential risk factors influencing the critical quality attributes (CQA) of SAT-NP were identified using Ishikawa diagram. Plackett-Burman screening design was adopted to screen the eight critical formulation and process parameters influencing the mean particle size, zeta potential and dissolution efficiency at 30min in pH7.4 dissolution medium. Pareto charts (individual and cumulative) revealed three most critical factors influencing CQA of SAT-NP viz. aqueous stabilizer (Polyvinyl alcohol), release modifier (Eudragit® S 100) and volume of aqueous phase. The levels of these three critical formulation attributes were optimized by FbD within established design space to minimize mean particle size, poly dispersity index, and maximize encapsulation efficiency of SAT-NP. Lenth's and Bayesian analysis along with mathematical modeling of results allowed identification and quantification of critical formulation attributes significantly active on the selected CQAs. The optimized SAT-NP exhibited mean particle size; 216nm, polydispersity index; 0.250, zeta potential; -3.75mV and encapsulation efficiency; 78.3%. The product was lyophilized using mannitol to form readily redispersible powder. X-ray diffraction analysis confirmed the conversion of crystalline SAT to amorphous form. In vitro release of SAT-NP in gradually pH changing media showed 95%) in pH7.4 in next 3h, indicative of burst release after a lag time. This investigation demonstrated effective application of risk management and QbD tools in developing site-specific release SAT-NP by nanoprecipitation. Copyright © 2016 Elsevier B.V. All
Kallache, M.
2012-04-01
Droughts cause important losses. On the Iberian Peninsula, for example, non-irrigated agriculture and the tourism sector are affected in regular intervals. The goal of this study is the description of droughts and their dependence in the Duero basin in Central Spain. To do so, daily or monthly precipitation data is used. Here cumulative precipitation deficits below a threshold define meteorological droughts. This drought indicator is similar to the commonly used standard precipitation index. However, here the focus lies on the modeling of severe droughts, which is done by applying multivariate extreme value theory (MEVT) to model extreme drought events. Data from several stations are assessed jointly, thus the uncertainty of the results is reduced. Droughts are a complex phenomenon, their severity, spatial extension and duration has to be taken into account. Our approach captures severity and spatial extension. In general we find a high correlation between deficit volumes and drought duration, thus the duration is not explicitely modeled. We apply a MEVT model with asymmetric logistic dependence function, which is capable to model asymptotic dependence and independence (cf. Ramos and Ledford, 2009). To summarize the information on the dependence in the joint tail of the extreme drought events, we utilise the fragility index (Geluk et al., 2007). Results show that droughts also occur frequently in winter. Moreover, it is very common for one site to suffer dry conditions, whilst neighboring areas experience normal or even humid conditions. Interpolation is thus difficult. Bivariate extremal dependence is present in the data. However, most stations are at least asymptotically independent. The according fragility indices are important information for risk calculations. The emerging spatial patterns for bivariate dependence are mostly influenced by topography. When looking at the dependence between more than two stations, it shows that joint extremes can occur more
Multispectral x-ray CT: multivariate statistical analysis for efficient reconstruction
Kheirabadi, Mina; Mustafa, Wail; Lyksborg, Mark; Lund Olsen, Ulrik; Bjorholm Dahl, Anders
2017-10-01
Recent developments in multispectral X-ray detectors allow for an efficient identification of materials based on their chemical composition. This has a range of applications including security inspection, which is our motivation. In this paper, we analyze data from a tomographic setup employing the MultiX detector, that records projection data in 128 energy bins covering the range from 20 to 160 keV. Obtaining all information from this data requires reconstructing 128 tomograms, which is computationally expensive. Instead, we propose to reduce the dimensionality of projection data prior to reconstruction and reconstruct from the reduced data. We analyze three linear methods for dimensionality reduction using a dataset with 37 equally-spaced projection angles. Four bottles with different materials are recorded for which we are able to obtain similar discrimination of their content using a very reduced subset of tomograms compared to the 128 tomograms that would otherwise be needed without dimensionality reduction.
Willard, Melissa A Bodnar; McGuffin, Victoria L; Smith, Ruth Waddell
2012-01-01
Salvia divinorum is a hallucinogenic herb that is internationally regulated. In this study, salvinorin A, the active compound in S. divinorum, was extracted from S. divinorum plant leaves using a 5-min extraction with dichloromethane. Four additional Salvia species (Salvia officinalis, Salvia guaranitica, Salvia splendens, and Salvia nemorosa) were extracted using this procedure, and all extracts were analyzed by gas chromatography-mass spectrometry. Differentiation of S. divinorum from other Salvia species was successful based on visual assessment of the resulting chromatograms. To provide a more objective comparison, the total ion chromatograms (TICs) were subjected to principal components analysis (PCA). Prior to PCA, the TICs were subjected to a series of data pretreatment procedures to minimize non-chemical sources of variance in the data set. Successful discrimination of S. divinorum from the other four Salvia species was possible based on visual assessment of the PCA scores plot. To provide a numerical assessment of the discrimination, a series of statistical procedures such as Euclidean distance measurement, hierarchical cluster analysis, Student's t tests, Wilcoxon rank-sum tests, and Pearson product moment correlation were also applied to the PCA scores. The statistical procedures were then compared to determine the advantages and disadvantages for forensic applications.
2012-01-01
This report evaluates the chemistry of seep water occurring in three desert drainages near Shiprock, New Mexico: Many Devils Wash, Salt Creek Wash, and Eagle Nest Arroyo. Through the use of geochemical plotting tools and multivariate statistical analysis techniques, analytical results of samples collected from the three drainages are compared with the groundwater chemistry at a former uranium mill in the Shiprock area (the Shiprock site), managed by the U.S. Department of Energy Office of Legacy Management. The objective of this study was to determine, based on the water chemistry of the samples, if statistically significant patterns or groupings are apparent between the sample populations and, if so, whether there are any reasonable explanations for those groupings.
None, None
2012-12-31
This report evaluates the chemistry of seep water occurring in three desert drainages near Shiprock, New Mexico: Many Devils Wash, Salt Creek Wash, and Eagle Nest Arroyo. Through the use of geochemical plotting tools and multivariate statistical analysis techniques, analytical results of samples collected from the three drainages are compared with the groundwater chemistry at a former uranium mill in the Shiprock area (the Shiprock site), managed by the U.S. Department of Energy Office of Legacy Management. The objective of this study was to determine, based on the water chemistry of the samples, if statistically significant patterns or groupings are apparent between the sample populations and, if so, whether there are any reasonable explanations for those groupings.
Nosrati, Kazem
2013-04-01
Soil degradation associated with soil erosion and land use is a critical problem in Iran and there is little or insufficient scientific information in assessing soil quality indicator. In this study, factor analysis (FA) and discriminant analysis (DA) were used to identify the most sensitive indicators of soil quality for evaluating land use and soil erosion within the Hiv catchment in Iran and subsequently compare soil quality assessment using expert opinion based on soil surface factors (SSF) form of Bureau of Land Management (BLM) method. Therefore, 19 soil physical, chemical, and biochemical properties were measured from 56 different sampling sites covering three land use/soil erosion categories (rangeland/surface erosion, orchard/surface erosion, and rangeland/stream bank erosion). FA identified four factors that explained for 82 % of the variation in soil properties. Three factors showed significant differences among the three land use/soil erosion categories. The results indicated that based upon backward-mode DA, dehydrogenase, silt, and manganese allowed more than 80 % of the samples to be correctly assigned to their land use and erosional status. Canonical scores of discriminant functions were significantly correlated to the six soil surface indices derived of BLM method. Stepwise linear regression revealed that soil surface indices: soil movement, surface litter, pedestalling, and sum of SSF were also positively related to the dehydrogenase and silt. This suggests that dehydrogenase and silt are most sensitive to land use and soil erosion.
Birol Elevli
2012-12-01
Full Text Available Rock fragmentation is considered to be one of the most important aspects of quarrying because of its direct effect on the costsof drilling, which include blasting, loading, hauling and crushing. Thus, it is essential to consider fragmentation size in blasting design.Fragmentation depends on many variables, such as rock properties, geological structures, and blasting parameters. Although empiricalmodels for the estimation of the size distribution of rock fragmentation have been developed by considering these parameters,no complete empirical prediction model for fragmentation exists since rock properties and geological structures vary from site to site.However, these models regard rock properties as constant. In this study, a step–wise multiple linear regression analysis has beencarried out to determine the degree of dominance of various influencing parameters on fragmentation and to develop a fragmentationprediction model. The results showed that the rock mass properties, burden width and specific charge are the main parameters affectingfragmentation. The relations among those parameters were used to develop guideline charts to determine blast layouts for desiredfragmentation on the basis of rock characteristics.
Chen-Lin Soo
2017-01-01
Full Text Available The study on Sarawak coastal water quality is scarce, not to mention the application of the multivariate statistical approach to investigate the spatial variation of water quality and to identify the pollution source in Sarawak coastal water. Hence, the present study aimed to evaluate the spatial variation of water quality along the coastline of the southwestern region of Sarawak using multivariate statistical techniques. Seventeen physicochemical parameters were measured at 11 stations along the coastline with approximately 225 km length. The coastal water quality showed spatial heterogeneity where the cluster analysis grouped the 11 stations into four different clusters. Deterioration in coastal water quality has been observed in different regions of Sarawak corresponding to land use patterns in the region. Nevertheless, nitrate-nitrogen exceeded the guideline value at all sampling stations along the coastline. The principal component analysis (PCA has determined a reduced number of five principal components that explained 89.0% of the data set variance. The first PC indicated that the nutrients were the dominant polluting factors, which is attributed to the domestic, agricultural, and aquaculture activities, followed by the suspended solids in the second PC which are related to the logging activities.
Buttigieg, Pier Luigi; Ramette, Alban
2014-12-01
The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.
Ye, M.; Pacheco Castro, R. B.; Pacheco Avila, J.; Cabrera Sansores, A.
2014-12-01
The karstic aquifer of Yucatan is a vulnerable and complex system. The first fifteen meters of this aquifer have been polluted, due to this the protection of this resource is important because is the only source of potable water of the entire State. Through the assessment of groundwater quality we can gain some knowledge about the main processes governing water chemistry as well as spatial patterns which are important to establish protection zones. In this work multivariate statistical techniques are used to assess the groundwater quality of the supply wells (30 to 40 meters deep) in the hidrogeologic region of the Ring of Cenotes, located in Yucatan, Mexico. Cluster analysis and principal component analysis are applied in groundwater chemistry data of the study area. Results of principal component analysis show that the main sources of variation in the data are due sea water intrusion and the interaction of the water with the carbonate rocks of the system and some pollution processes. The cluster analysis shows that the data can be divided in four clusters. The spatial distribution of the clusters seems to be random, but is consistent with sea water intrusion and pollution with nitrates. The overall results show that multivariate statistical analysis can be successfully applied in the groundwater quality assessment of this karstic aquifer.
Soto, R; Wu, Ch. H; Bubela, A M
1999-01-01
This work introduces a novel methodology to improve reservoir characterization models. In this methodology we integrated multivariate statistical analyses, and neural network models for forecasting the infill drilling ultimate oil recovery from reservoirs in San Andres and Clearfork carbonate formations in west Texas. Development of the oil recovery forecast models help us to understand the relative importance of dominant reservoir characteristics and operational variables, reproduce recoveries for units included in the database, forecast recoveries for possible new units in similar geological setting, and make operational (infill drilling) decisions. The variety of applications demands the creation of multiple recovery forecast models. We have developed intelligent software (Soto, 1998), oilfield intelligence (01), as an engineering tool to improve the characterization of oil and gas reservoirs. 01 integrates neural networks and multivariate statistical analysis. It is composed of five main subsystems: data input, preprocessing, architecture design, graphic design, and inference engine modules. One of the challenges in this research was to identify the dominant and the optimum number of independent variables. The variables include porosity, permeability, water saturation, depth, area, net thickness, gross thickness, formation volume factor, pressure, viscosity, API gravity, number of wells in initial water flooding, number of wells for primary recovery, number of infill wells over the initial water flooding, PRUR, IWUR, and IDUR. Multivariate principal component analysis is used to identify the dominant and the optimum number of independent variables. We compared the results from neural network models with the non-parametric approach. The advantage of the non-parametric regression is that it is easy to use. The disadvantage is that it retains a large variance of forecast results for a particular data set. We also used neural network concepts to develop recovery
Xiangyu Mu
2014-09-01
Full Text Available Natural factors and anthropogenic activities both contribute dissolved chemical loads to lakes and streams. Mineral solubility, geomorphology of the drainage basin, source strengths and climate all contribute to concentrations and their variability. Urbanization and agriculture waste-water particularly lead to aquatic environmental degradation. Major contaminant sources and controls on water quality can be asssessed by analyzing the variability in proportions of major and minor solutes in water coupled to mutivariate statistical methods. The demand for freshwater needed for increasing crop production puulation and industrialization occurs almost everywhere in in China and these conflicting needs have led to widespread water contamination. Because of heavy nutrient loadings from all of these sources, Lake Taihu (eastern China notably suffers periodic hyper-eutrophication and drinking water deterioration, which has led to shortages of freshwater for the City of Wuxi and other nearby cities. This lake, the third largest freshwater body in China, has historically beeen considered a cultural treasure of China, and has supported long-term fisheries. The is increasing pressure to remediate the present contamination which compromises both aquiculture and the prior economic base centered on tourism. However, remediation cannot be effectively done without first characterizing the broad nature of the non-point source pollution. To this end, we investigated the hydrochemical setting of Lake Taihu to determine how different land use types influence the variability of surface water chemistry in different water sources to the lake. We found that waters broadly show wide variability ranging from calcium-magnesium-bicarbonate hydrochemical facies type to mixed sodium-sulfate-chloride type. Principal components analysis produced three principal components that explained 78% of the variance in the water quality and reflect three major types of water
Damage detection of engine bladed-disks using multivariate statistical analysis
Fang, X.; Tang, J.
2006-03-01
The timely detection of damage in aero-engine bladed-disks is an extremely important and challenging research topic. Bladed-disks have high modal density and, particularly, their vibration responses are subject to significant uncertainties due to manufacturing tolerance (blade-to-blade difference or mistuning), operating condition change and sensor noise. In this study, we present a new methodology for the on-line damage detection of engine bladed-disks using their vibratory responses during spin-up or spin-down operations which can be measured by blade-tip-timing sensing technique. We apply a principle component analysis (PCA)-based approach for data compression, feature extraction, and denoising. The non-model based damage detection is achieved by analyzing the change between response features of the healthy structure and of the damaged one. We facilitate such comparison by incorporating the Hotelling's statistic T2 analysis, which yields damage declaration with a given confidence level. The effectiveness of the method is demonstrated by case studies.
Tien, Hai Minh; Le, Kien Anh; Le, Phung Thi Kim
2017-09-01
Bio hydrogen is a sustainable energy resource due to its potentially higher efficiency of conversion to usable power, high energy efficiency and non-polluting nature resource. In this work, the experiments have been carried out to indicate the possibility of generating bio hydrogen as well as identifying effective factors and the optimum conditions from cassava starch. Experimental design was used to investigate the effect of operating temperature (37-43 °C), pH (6-7), and inoculums ratio (6-10 %) to the yield hydrogen production, the COD reduction and the ratio of volume of hydrogen production to COD reduction. The statistical analysis of the experiment indicated that the significant effects for the fermentation yield were the main effect of temperature, pH and inoculums ratio. The interaction effects between them seem not significant. The central composite design showed that the polynomial regression models were in good agreement with the experimental results. This result will be applied to enhance the process of cassava starch processing wastewater treatment.
Tao Gao
2014-01-01
Full Text Available Extreme precipitation is likely to be one of the most severe meteorological disasters in China; however, studies on the physical factors affecting precipitation extremes and corresponding prediction models are not accurately available. From a new point of view, the sensible heat flux (SHF and latent heat flux (LHF, which have significant impacts on summer extreme rainfall in Yangtze River basin (YRB, have been quantified and then selections of the impact factors are conducted. Firstly, a regional extreme precipitation index was applied to determine Regions of Significant Correlation (RSC by analyzing spatial distribution of correlation coefficients between this index and SHF, LHF, and sea surface temperature (SST on global ocean scale; then the time series of SHF, LHF, and SST in RSCs during 1967–2010 were selected. Furthermore, other factors that significantly affect variations in precipitation extremes over YRB were also selected. The methods of multiple stepwise regression and leave-one-out cross-validation (LOOCV were utilized to analyze and test influencing factors and statistical prediction model. The correlation coefficient between observed regional extreme index and model simulation result is 0.85, with significant level at 99%. This suggested that the forecast skill was acceptable although many aspects of the prediction model should be improved.
Park, Jinyong; Balasingham, P.; McKenna, Sean Andrew; Kulatilake, Pinnaduwa H. S. W.
2004-01-01
Sandia National Laboratories, under contract to Nuclear Waste Management Organization of Japan (NUMO), is performing research on regional classification of given sites in Japan with respect to potential volcanic disruption using multivariate statistics and geo-statistical interpolation techniques. This report provides results obtained for hierarchical probabilistic regionalization of volcanism for the Sengan region in Japan by applying multivariate statistical techniques and geostatistical interpolation techniques on the geologic data provided by NUMO. A workshop report produced in September 2003 by Sandia National Laboratories (Arnold et al., 2003) on volcanism lists a set of most important geologic variables as well as some secondary information related to volcanism. Geologic data extracted for the Sengan region in Japan from the data provided by NUMO revealed that data are not available at the same locations for all the important geologic variables. In other words, the geologic variable vectors were found to be incomplete spatially. However, it is necessary to have complete geologic variable vectors to perform multivariate statistical analyses. As a first step towards constructing complete geologic variable vectors, the Universal Transverse Mercator (UTM) zone 54 projected coordinate system and a 1 km square regular grid system were selected. The data available for each geologic variable on a geographic coordinate system were transferred to the aforementioned grid system. Also the recorded data on volcanic activity for Sengan region were produced on the same grid system. Each geologic variable map was compared with the recorded volcanic activity map to determine the geologic variables that are most important for volcanism. In the regionalized classification procedure, this step is known as the variable selection step. The following variables were determined as most important for volcanism: geothermal gradient, groundwater temperature, heat discharge, groundwater
Åberg Lindell, M.; Andersson, P.; Grape, S.; Hellesen, C.; Håkansson, A.; Thulin, M.
2018-03-01
This paper investigates how concentrations of certain fission products and their related gamma-ray emissions can be used to discriminate between uranium oxide (UOX) and mixed oxide (MOX) type fuel. Discrimination of irradiated MOX fuel from irradiated UOX fuel is important in nuclear facilities and for transport of nuclear fuel, for purposes of both criticality safety and nuclear safeguards. Although facility operators keep records on the identity and properties of each fuel, tools for nuclear safeguards inspectors that enable independent verification of the fuel are critical in the recovery of continuity of knowledge, should it be lost. A discrimination methodology for classification of UOX and MOX fuel, based on passive gamma-ray spectroscopy data and multivariate analysis methods, is presented. Nuclear fuels and their gamma-ray emissions were simulated in the Monte Carlo code Serpent, and the resulting data was used as input to train seven different multivariate classification techniques. The trained classifiers were subsequently implemented and evaluated with respect to their capabilities to correctly predict the classes of unknown fuel items. The best results concerning successful discrimination of UOX and MOX-fuel were acquired when using non-linear classification techniques, such as the k nearest neighbors method and the Gaussian kernel support vector machine. For fuel with cooling times up to 20 years, when it is considered that gamma-rays from the isotope 134Cs can still be efficiently measured, success rates of 100% were obtained. A sensitivity analysis indicated that these methods were also robust.
Hadronic equation of state in the statistical bootstrap model and linear graph theory
Fre, P.; Page, R.
1976-01-01
Taking a statistical mechanical point og view, the statistical bootstrap model is discussed and, from a critical analysis of the bootstrap volume comcept, it is reached a physical ipothesis, which leads immediately to the hadronic equation of state provided by the bootstrap integral equation. In this context also the connection between the statistical bootstrap and the linear graph theory approach to interacting gases is analyzed
Mario Miguel Ojeda Ramírez
2017-01-01
Full Text Available Currently some teachers implement different methods in order to promote education linked to reality, to provide more effective training and a meaningful learning. Activemethods aim to increase motivation and create scenarios in which student participation is central to achieve a more meaningful learning. This paper reports on the implementation of a process of educational innovation in the course of Topics of Multivariate Statistics offered in the degree in Statistical Sciences and Techniques at the Universidad Veracruzana (Mexico. The strategies used as sets for data collection, design and project development and realization of individual and group presentations are described. Information and communication technologies (ICT used are: EMINUS, distributed education platform of the Universidad Veracruzana, and managing files with Dropbox, plus communication via WhatsApp. The R software was used for statistical analysis and for making presentations in academic forums. To explore students' perceptions depth interviews were conducted and indicators for evaluating the student satisfaction were defined; the results show positive evidence, concluding that students were satisfied with the way that the course was designed and implemented. They also stated that they feel able to apply what they have learned. The opinions put that using these strategies they were feeling in preparation for their professional life. Finally, some suggestions for improving the course in future editions are included.
Mayer, B. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Valdez, C. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); DeHope, A. J. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Spackman, P. E. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Sanner, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Martinez, H. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Williams, A. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-11-28
Critical to many modern forensic investigations is the chemical attribution of the origin of an illegal drug. This process greatly relies on identification of compounds indicative of its clandestine or commercial production. The results of these studies can yield detailed information on method of manufacture, sophistication of the synthesis operation, starting material source, and final product. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic 3- methylfentanyl, N-(3-methyl-1-phenethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods were studied in an effort to identify and classify route-specific signatures. These methods were chosen to minimize the use of scheduled precursors, complicated laboratory equipment, number of overall steps, and demanding reaction conditions. Using gas and liquid chromatographies combined with mass spectrometric methods (GC-QTOF and LC-QTOF) in conjunction with inductivelycoupled plasma mass spectrometry (ICP-MS), over 240 distinct compounds and elements were monitored. As seen in our previous work with CAS of fentanyl synthesis the complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 62 statistically significant, route-specific CAS were identified. Statistical classification models using a variety of machine learning techniques were then developed with the ability to predict the method of 3-methylfentanyl synthesis from three blind crude samples generated by synthetic chemists without prior experience with these methods.
Cheng Lin; Feng Songlin
2005-01-01
The major, minor and trace elements in the bodies of ancient colored glazes which came from the site of Xiyue Temple and Lidipo kiln in Shanxi province, and were unearthed from the stratums of Song, Yuan, Ming, Early Qing and Late Qing dynasty were analyzed by instrumental neutron activation analysis (INAA). The results of multivariable statistical analyses show that the chemical compositions of the colored glaze bodies are steady from Song to Early Qing dynasty, but distinctly different from that in Late Qing. Probably, the sources of fired material of ancient colored glaze from Song to Early Qing came from the site of Xiyue Temple. The chemical compositions of three pieces of colored glazes in Ming dynasty and that in Late Qing are similar to that of Lidipo kiln. From this, authors could conclude that the sources of the materials of ancient coloured glazes of Xiyue Temple in Late Qing dynasty were fired in Lidipo kiln. (authors)
Lifshits, A M
1979-01-01
General characteristics of the multivariate statistical analysis (MSA) is given. Methodical premises and criteria for the selection of an adequate MSA method applicable to pathoanatomic investigations of the epidemiology of multicausal diseases are presented. The experience of using MSA with computors and standard computing programs in studies of coronary arteries aterosclerosis on the materials of 2060 autopsies is described. The combined use of 4 MSA methods: sequential, correlational, regressional, and discriminant permitted to quantitate the contribution of each of the 8 examined risk factors in the development of aterosclerosis. The most important factors were found to be the age, arterial hypertension, and heredity. Occupational hypodynamia and increased fatness were more important in men, whereas diabetes melitus--in women. The registration of this combination of risk factors by MSA methods provides for more reliable prognosis of the likelihood of coronary heart disease with a fatal outcome than prognosis of the degree of coronary aterosclerosis.
Ludvigsen, Liselotte; Albrechtsen, Hans-Jørgen; Rootzén, Helle
1997-01-01
Different multivariate statistical analyses were applied to phospholipid fatty acids representing the biomass composition and to different biogeochemical parameters measured in 37 samples from a landfill contaminated aquifer at Grindsted Landfill (Denmark). Principal component analysis...... and correspondence analysis were used to identify groups of samples showing similar patterns with respect to biogeochemical variables and phospholipid fatty acid composition. The principal component analysis revealed that for the biogeochemical parameters the first principal component was linked to the pollution...... was used to allocate samples of phospholipid fatty acids into predefined classes. A large percentages of samples were classified correctly when discriminating samples into groups of dissolved organic carbon and specific conductivity, indicating that the biomass is highly influenced by the pollution...
Schoenwiese, C.D.
1990-01-01
Based on univariate correction and coherence analyses, including techniques moving in time, and taking account of the physical basis of the relationships, a simple multivariate concept is presented which correlates observational climatic time series simultaneously with solar, volcanic, ENSO (El Nino/Souther Oscillation) and anthropogenic greenhouse-gas forcing. The climatic elements considered are air temperature (near the ground and stratosphere), sea surface temperature, sea level and precipitation, and cover at least the period 1881-1980 (stratospheric temperature only since 1960). The climate signal assessments which may be hypothetically attributed to the observed CO 2 or equivalent CO 2 (implying additional greenhouse gases) increase are compared with those resulting from GCM experiments. In case of the Northern hemisphere air temperature these comparisons are performed not only in respect to hemispheric and global means, but also in respect to the regional and seasonal patterns. Autocorrelations and phase shifts of the climate response to natural and anthropogenic forcing complicate the statistical assessments
Kamal, Ghulam Mustafa; Wang, Xiaohua; Bin Yuan; Wang, Jie; Sun, Peng; Zhang, Xu; Liu, Maili
2016-09-01
Soy sauce a well known seasoning all over the world, especially in Asia, is available in global market in a wide range of types based on its purpose and the processing methods. Its composition varies with respect to the fermentation processes and addition of additives, preservatives and flavor enhancers. A comprehensive (1)H NMR based study regarding the metabonomic variations of soy sauce to differentiate among different types of soy sauce available on the global market has been limited due to the complexity of the mixture. In present study, (13)C NMR spectroscopy coupled with multivariate statistical data analysis like principle component analysis (PCA), and orthogonal partial least square-discriminant analysis (OPLS-DA) was applied to investigate metabonomic variations among different types of soy sauce, namely super light, super dark, red cooking and mushroom soy sauce. The main additives in soy sauce like glutamate, sucrose and glucose were easily distinguished and quantified using (13)C NMR spectroscopy which were otherwise difficult to be assigned and quantified due to serious signal overlaps in (1)H NMR spectra. The significantly higher concentration of sucrose in dark, red cooking and mushroom flavored soy sauce can directly be linked to the addition of caramel in soy sauce. Similarly, significantly higher level of glutamate in super light as compared to super dark and mushroom flavored soy sauce may come from the addition of monosodium glutamate. The study highlights the potentiality of (13)C NMR based metabonomics coupled with multivariate statistical data analysis in differentiating between the types of soy sauce on the basis of level of additives, raw materials and fermentation procedures. Copyright © 2016 Elsevier B.V. All rights reserved.
Moya, Claudio E; Raiber, Matthias; Taulis, Mauricio; Cox, Malcolm E
2015-03-01
The Galilee and Eromanga basins are sub-basins of the Great Artesian Basin (GAB). In this study, a multivariate statistical approach (hierarchical cluster analysis, principal component analysis and factor analysis) is carried out to identify hydrochemical patterns and assess the processes that control hydrochemical evolution within key aquifers of the GAB in these basins. The results of the hydrochemical assessment are integrated into a 3D geological model (previously developed) to support the analysis of spatial patterns of hydrochemistry, and to identify the hydrochemical and hydrological processes that control hydrochemical variability. In this area of the GAB, the hydrochemical evolution of groundwater is dominated by evapotranspiration near the recharge area resulting in a dominance of the Na-Cl water types. This is shown conceptually using two selected cross-sections which represent discrete groundwater flow paths from the recharge areas to the deeper parts of the basins. With increasing distance from the recharge area, a shift towards a dominance of carbonate (e.g. Na-HCO3 water type) has been observed. The assessment of hydrochemical changes along groundwater flow paths highlights how aquifers are separated in some areas, and how mixing between groundwater from different aquifers occurs elsewhere controlled by geological structures, including between GAB aquifers and coal bearing strata of the Galilee Basin. The results of this study suggest that distinct hydrochemical differences can be observed within the previously defined Early Cretaceous-Jurassic aquifer sequence of the GAB. A revision of the two previously recognised hydrochemical sequences is being proposed, resulting in three hydrochemical sequences based on systematic differences in hydrochemistry, salinity and dominant hydrochemical processes. The integrated approach presented in this study which combines different complementary multivariate statistical techniques with a detailed assessment of the
Jensen, Dan B; Hogeveen, Henk; De Vries, Albert
2016-09-01
Rapid detection of dairy cow mastitis is important so corrective action can be taken as soon as possible. Automatically collected sensor data used to monitor the performance and the health state of the cow could be useful for rapid detection of mastitis while reducing the labor needs for monitoring. The state of the art in combining sensor data to predict clinical mastitis still does not perform well enough to be applied in practice. Our objective was to combine a multivariate dynamic linear model (DLM) with a naïve Bayesian classifier (NBC) in a novel method using sensor and nonsensor data to detect clinical cases of mastitis. We also evaluated reductions in the number of sensors for detecting mastitis. With the DLM, we co-modeled 7 sources of sensor data (milk yield, fat, protein, lactose, conductivity, blood, body weight) collected at each milking for individual cows to produce one-step-ahead forecasts for each sensor. The observations were subsequently categorized according to the errors of the forecasted values and the estimated forecast variance. The categorized sensor data were combined with other data pertaining to the cow (week in milk, parity, mastitis history, somatic cell count category, and season) using Bayes' theorem, which produced a combined probability of the cow having clinical mastitis. If this probability was above a set threshold, the cow was classified as mastitis positive. To illustrate the performance of our method, we used sensor data from 1,003,207 milkings from the University of Florida Dairy Unit collected from 2008 to 2014. Of these, 2,907 milkings were associated with recorded cases of clinical mastitis. Using the DLM/NBC method, we reached an area under the receiver operating characteristic curve of 0.89, with a specificity of 0.81 when the sensitivity was set at 0.80. Specificities with omissions of sensor data ranged from 0.58 to 0.81. These results are comparable to other studies, but differences in data quality, definitions of
Catelani, Tiago A; Santos, João Rodrigo; Páscoa, Ricardo N M J; Pezza, Leonardo; Pezza, Helena R; Lopes, João A
2018-03-01
This work proposes the use of near infrared (NIR) spectroscopy in diffuse reflectance mode and multivariate statistical process control (MSPC) based on principal component analysis (PCA) for real-time monitoring of the coffee roasting process. The main objective was the development of a MSPC methodology able to early detect disturbances to the roasting process resourcing to real-time acquisition of NIR spectra. A total of fifteen roasting batches were defined according to an experimental design to develop the MSPC models. This methodology was tested on a set of five batches where disturbances of different nature were imposed to simulate real faulty situations. Some of these batches were used to optimize the model while the remaining was used to test the methodology. A modelling strategy based on a time sliding window provided the best results in terms of distinguishing batches with and without disturbances, resourcing to typical MSPC charts: Hotelling's T 2 and squared predicted error statistics. A PCA model encompassing a time window of four minutes with three principal components was able to efficiently detect all disturbances assayed. NIR spectroscopy combined with the MSPC approach proved to be an adequate auxiliary tool for coffee roasters to detect faults in a conventional roasting process in real-time. Copyright © 2017 Elsevier B.V. All rights reserved.
Silva, A F; Sarraguça, M C; Fonteyne, M; Vercruysse, J; De Leersnyder, F; Vanhoorne, V; Bostijn, N; Verstraeten, M; Vervaet, C; Remon, J P; De Beer, T; Lopes, J A
2017-08-07
A multivariate statistical process control (MSPC) strategy was developed for the monitoring of the ConsiGma™-25 continuous tablet manufacturing line. Thirty-five logged variables encompassing three major units, being a twin screw high shear granulator, a fluid bed dryer and a product control unit, were used to monitor the process. The MSPC strategy was based on principal component analysis of data acquired under normal operating conditions using a series of four process runs. Runs with imposed disturbances in the dryer air flow and temperature, in the granulator barrel temperature, speed and liquid mass flow and in the powder dosing unit mass flow were utilized to evaluate the model's monitoring performance. The impact of the imposed deviations to the process continuity was also evaluated using Hotelling's T 2 and Q residuals statistics control charts. The influence of the individual process variables was assessed by analyzing contribution plots at specific time points. Results show that the imposed disturbances were all detected in both control charts. Overall, the MSPC strategy was successfully developed and applied. Additionally, deviations not associated with the imposed changes were detected, mainly in the granulator barrel temperature control. Copyright © 2017 Elsevier B.V. All rights reserved.
Muhammad, Said; Tahir Shah, M; Khan, Sardar
2010-10-01
The present study was conducted in Kohistan region, where mafic and ultramafic rocks (Kohistan island arc and Indus suture zone) and metasedimentary rocks (Indian plate) are exposed. Water samples were collected from the springs, streams and Indus river and analyzed for physical parameters, anions, cations and arsenic (As(3+), As(5+) and arsenic total). The water quality in Kohistan region was evaluated by comparing the physio-chemical parameters with permissible limits set by Pakistan environmental protection agency and world health organization. Most of the studied parameters were found within their respective permissible limits. However in some samples, the iron and arsenic concentrations exceeded their permissible limits. For health risk assessment of arsenic, the average daily dose, hazards quotient (HQ) and cancer risk were calculated by using statistical formulas. The values of HQ were found >1 in the samples collected from Jabba, Dubair, while HQ values were pollution load was also calculated by using multivariate statistical techniques like one-way ANOVA, correlation analysis, regression analysis, cluster analysis and principle component analysis. Copyright © 2010 Elsevier Ltd. All rights reserved.
Wojcik, Pawel Jerzy; Pereira, Luís; Martins, Rodrigo; Fortunato, Elvira
2014-01-13
An efficient mathematical strategy in the field of solution processed electrochromic (EC) films is outlined as a combination of an experimental work, modeling, and information extraction from massive computational data via statistical software. Design of Experiment (DOE) was used for statistical multivariate analysis and prediction of mixtures through a multiple regression model, as well as the optimization of a five-component sol-gel precursor subjected to complex constraints. This approach significantly reduces the number of experiments to be realized, from 162 in the full factorial (L=3) and 72 in the extreme vertices (D=2) approach down to only 30 runs, while still maintaining a high accuracy of the analysis. By carrying out a finite number of experiments, the empirical modeling in this study shows reasonably good prediction ability in terms of the overall EC performance. An optimized ink formulation was employed in a prototype of a passive EC matrix fabricated in order to test and trial this optically active material system together with a solid-state electrolyte for the prospective application in EC displays. Coupling of DOE with chromogenic material formulation shows the potential to maximize the capabilities of these systems and ensures increased productivity in many potential solution-processed electrochemical applications.
Non-linear scaling of a musculoskeletal model of the lower limb using statistical shape models.
Nolte, Daniel; Tsang, Chui Kit; Zhang, Kai Yu; Ding, Ziyun; Kedgley, Angela E; Bull, Anthony M J
2016-10-03
Accurate muscle geometry for musculoskeletal models is important to enable accurate subject-specific simulations. Commonly, linear scaling is used to obtain individualised muscle geometry. More advanced methods include non-linear scaling using segmented bone surfaces and manual or semi-automatic digitisation of muscle paths from medical images. In this study, a new scaling method combining non-linear scaling with reconstructions of bone surfaces using statistical shape modelling is presented. Statistical Shape Models (SSMs) of femur and tibia/fibula were used to reconstruct bone surfaces of nine subjects. Reference models were created by morphing manually digitised muscle paths to mean shapes of the SSMs using non-linear transformations and inter-subject variability was calculated. Subject-specific models of muscle attachment and via points were created from three reference models. The accuracy was evaluated by calculating the differences between the scaled and manually digitised models. The points defining the muscle paths showed large inter-subject variability at the thigh and shank - up to 26mm; this was found to limit the accuracy of all studied scaling methods. Errors for the subject-specific muscle point reconstructions of the thigh could be decreased by 9% to 20% by using the non-linear scaling compared to a typical linear scaling method. We conclude that the proposed non-linear scaling method is more accurate than linear scaling methods. Thus, when combined with the ability to reconstruct bone surfaces from incomplete or scattered geometry data using statistical shape models our proposed method is an alternative to linear scaling methods. Copyright © 2016 The Author. Published by Elsevier Ltd.. All rights reserved.
Kolluri, Srinivas Sahan; Esfahani, Iman Janghorban; Garikiparthy, Prithvi Sai Nadh; Yoo, Chang Kyoo
2015-01-01
Our aim was to analyze, monitor, and predict the outcomes of processes in a full-scale seawater reverse osmosis (SWRO) desalination plant using multivariate statistical techniques. Multivariate analysis of variance (MANOVA) was used to investigate the performance and efficiencies of two SWRO processes, namely, pore controllable fiber filterreverse osmosis (PCF-SWRO) and sand filtration-ultra filtration-reverse osmosis (SF-UF-SWRO). Principal component analysis (PCA) was applied to monitor the two SWRO processes. PCA monitoring revealed that the SF-UF-SWRO process could be analyzed reliably with a low number of outliers and disturbances. Partial least squares (PLS) analysis was then conducted to predict which of the seven input parameters of feed flow rate, PCF/SF-UF filtrate flow rate, temperature of feed water, turbidity feed, pH, reverse osmosis (RO)flow rate, and pressure had a significant effect on the outcome variables of permeate flow rate and concentration. Root mean squared errors (RMSEs) of the PLS models for permeate flow rates were 31.5 and 28.6 for the PCF-SWRO process and SF-UF-SWRO process, respectively, while RMSEs of permeate concentrations were 350.44 and 289.4, respectively. These results indicate that the SF-UF-SWRO process can be modeled more accurately than the PCF-SWRO process, because the RMSE values of permeate flowrate and concentration obtained using a PLS regression model of the SF-UF-SWRO process were lower than those obtained for the PCF-SWRO process.
Kolluri, Srinivas Sahan; Esfahani, Iman Janghorban; Garikiparthy, Prithvi Sai Nadh; Yoo, Chang Kyoo [Kyung Hee University, Yongin (Korea, Republic of)
2015-08-15
Our aim was to analyze, monitor, and predict the outcomes of processes in a full-scale seawater reverse osmosis (SWRO) desalination plant using multivariate statistical techniques. Multivariate analysis of variance (MANOVA) was used to investigate the performance and efficiencies of two SWRO processes, namely, pore controllable fiber filterreverse osmosis (PCF-SWRO) and sand filtration-ultra filtration-reverse osmosis (SF-UF-SWRO). Principal component analysis (PCA) was applied to monitor the two SWRO processes. PCA monitoring revealed that the SF-UF-SWRO process could be analyzed reliably with a low number of outliers and disturbances. Partial least squares (PLS) analysis was then conducted to predict which of the seven input parameters of feed flow rate, PCF/SF-UF filtrate flow rate, temperature of feed water, turbidity feed, pH, reverse osmosis (RO)flow rate, and pressure had a significant effect on the outcome variables of permeate flow rate and concentration. Root mean squared errors (RMSEs) of the PLS models for permeate flow rates were 31.5 and 28.6 for the PCF-SWRO process and SF-UF-SWRO process, respectively, while RMSEs of permeate concentrations were 350.44 and 289.4, respectively. These results indicate that the SF-UF-SWRO process can be modeled more accurately than the PCF-SWRO process, because the RMSE values of permeate flowrate and concentration obtained using a PLS regression model of the SF-UF-SWRO process were lower than those obtained for the PCF-SWRO process.
Linear mixed-effects models for central statistical monitoring of multicenter clinical trials
Desmet, L.; Venet, D.; Doffagne, E.; Timmermans, C.; BURZYKOWSKI, Tomasz; LEGRAND, Catherine; BUYSE, Marc
2014-01-01
Multicenter studies are widely used to meet accrual targets in clinical trials. Clinical data monitoring is required to ensure the quality and validity of the data gathered across centers. One approach to this end is central statistical monitoring, which aims at detecting atypical patterns in the data by means of statistical methods. In this context, we consider the simple case of a continuous variable, and we propose a detection procedure based on a linear mixed-effects model to detect locat...
Response statistics of rotating shaft with non-linear elastic restoring forces by path integration
Gaidai, Oleg; Naess, Arvid; Dimentberg, Michael
2017-07-01
Extreme statistics of random vibrations is studied for a Jeffcott rotor under uniaxial white noise excitation. Restoring force is modelled as elastic non-linear; comparison is done with linearized restoring force to see the force non-linearity effect on the response statistics. While for the linear model analytical solutions and stability conditions are available, it is not generally the case for non-linear system except for some special cases. The statistics of non-linear case is studied by applying path integration (PI) method, which is based on the Markov property of the coupled dynamic system. The Jeffcott rotor response statistics can be obtained by solving the Fokker-Planck (FP) equation of the 4D dynamic system. An efficient implementation of PI algorithm is applied, namely fast Fourier transform (FFT) is used to simulate dynamic system additive noise. The latter allows significantly reduce computational time, compared to the classical PI. Excitation is modelled as Gaussian white noise, however any kind distributed white noise can be implemented with the same PI technique. Also multidirectional Markov noise can be modelled with PI in the same way as unidirectional. PI is accelerated by using Monte Carlo (MC) estimated joint probability density function (PDF) as initial input. Symmetry of dynamic system was utilized to afford higher mesh resolution. Both internal (rotating) and external damping are included in mechanical model of the rotor. The main advantage of using PI rather than MC is that PI offers high accuracy in the probability distribution tail. The latter is of critical importance for e.g. extreme value statistics, system reliability, and first passage probability.
Varekar, Vikas; Karmakar, Subhankar; Jha, Ramakar
2016-02-01
The design of surface water quality sampling location is a crucial decision-making process for rationalization of monitoring network. The quantity, quality, and types of available dataset (watershed characteristics and water quality data) may affect the selection of appropriate design methodology. The modified Sanders approach and multivariate statistical techniques [particularly factor analysis (FA)/principal component analysis (PCA)] are well-accepted and widely used techniques for design of sampling locations. However, their performance may vary significantly with quantity, quality, and types of available dataset. In this paper, an attempt has been made to evaluate performance of these techniques by accounting the effect of seasonal variation, under a situation of limited water quality data but extensive watershed characteristics information, as continuous and consistent river water quality data is usually difficult to obtain, whereas watershed information may be made available through application of geospatial techniques. A case study of Kali River, Western Uttar Pradesh, India, is selected for the analysis. The monitoring was carried out at 16 sampling locations. The discrete and diffuse pollution loads at different sampling sites were estimated and accounted using modified Sanders approach, whereas the monitored physical and chemical water quality parameters were utilized as inputs for FA/PCA. The designed optimum number of sampling locations for monsoon and non-monsoon seasons by modified Sanders approach are eight and seven while that for FA/PCA are eleven and nine, respectively. Less variation in the number and locations of designed sampling sites were obtained by both techniques, which shows stability of results. A geospatial analysis has also been carried out to check the significance of designed sampling location with respect to river basin characteristics and land use of the study area. Both methods are equally efficient; however, modified Sanders
MELEK ACAR BOYACIOĞLU
2013-06-01
Full Text Available Financial strength rating indicates the fundamental financial strength of a bank. The aim of financial strength rating is to measure a bank’s fundamental financial strength excluding the external factors. External factors can stem from the working environment or can be linked with the outside protective support mechanisms. With the evaluation, the rating of a bank free from outside supportive factors is being sought. Also the financial fundamental, franchise value, the variety of assets and working environment of a bank are being evaluated in this context. In this study, a model has been developed in order to predict the financial strength rating of Turkish banks. The methodology of this study is as follows: Selecting variables to be used in the model, creating a data set, choosing the techniques to be used and the evaluation of classification success of the techniques. It is concluded that the artificial neural network system shows a better performance in terms of classification of financial strength rating in comparison to multivariate statistical methods in the raining set. On the other hand, there is no meaningful difference could be found in the validation set in which the prediction performances of the employed techniques are tested.
Takayama, Motoharu; Terui, Keita; Oiwa, Yoshitsugu
2012-10-01
Chronic subdural hematoma is common in elderly individuals and surgical procedures are simple. The recurrence rate of chronic subdural hematoma, however, varies from 9.2 to 26.5% after surgery. The authors studied factors of the recurrence using univariate and multivariate analyses in patients with chronic subdural hematoma We retrospectively reviewed 239 consecutive cases of chronic subdural hematoma who received burr-hole surgery with irrigation and closed-system drainage. We analyzed the relationships between recurrence of chronic subdural hematoma and factors such as sex, age, laterality, bleeding tendency, other complicated diseases, density on CT, volume of the hematoma, residual air in the hematoma cavity, use of artificial cerebrospinal fluid. Twenty-one patients (8.8%) experienced a recurrence of chronic subdural hematoma. Multiple logistic regression found that the recurrence rate was higher in patients with a large volume of the residual air, and was lower in patients using artificial cerebrospinal fluid. No statistical differences were found in bleeding tendency. Techniques to reduce the air in the hematoma cavity are important for good outcome in surgery of chronic subdural hematoma. Also, the use of artificial cerebrospinal fluid reduces recurrence of chronic subdural hematoma. The surgical procedures can be the same for patients with bleeding tendencies.
Jiabo Chen
2016-10-01
Full Text Available Source apportionment of river water pollution is critical in water resource management and aquatic conservation. Comprehensive application of various GIS-based multivariate statistical methods was performed to analyze datasets (2009–2011 on water quality in the Liao River system (China. Cluster analysis (CA classified the 12 months of the year into three groups (May–October, February–April and November–January and the 66 sampling sites into three groups (groups A, B and C based on similarities in water quality characteristics. Discriminant analysis (DA determined that temperature, dissolved oxygen (DO, pH, chemical oxygen demand (CODMn, 5-day biochemical oxygen demand (BOD5, NH4+–N, total phosphorus (TP and volatile phenols were significant variables affecting temporal variations, with 81.2% correct assignments. Principal component analysis (PCA and positive matrix factorization (PMF identified eight potential pollution factors for each part of the data structure, explaining more than 61% of the total variance. Oxygen-consuming organics from cropland and woodland runoff were the main latent pollution factor for group A. For group B, the main pollutants were oxygen-consuming organics, oil, nutrients and fecal matter. For group C, the evaluated pollutants primarily included oxygen-consuming organics, oil and toxic organics.
Ebqa'ai, Mohammad; Ibrahim, Bashar
2017-12-01
This study aims to analyse the heavy metal pollutants in Jeddah, the second largest city in the Gulf Cooperation Council with a population exceeding 3.5 million, and many vehicles. Ninety-eight street dust samples were collected seasonally from the six major roads as well as the Jeddah Beach, and subsequently digested using modified Leeds Public Analyst method. The heavy metals (Fe, Zn, Mn, Cu, Cd, and Pb) were extracted from the ash using methyl isobutyl ketone as solvent extraction and eventually analysed by atomic absorption spectroscopy. Multivariate statistical techniques, principal component analysis (PCA), and hierarchical cluster analysis were applied to these data. Heavy metal concentrations were ranked according to the following descending order: Fe > Zn > Mn > Cu > Pb > Cd. In order to study the pollution and health risk from these heavy metals as well as estimating their effect on the environment, pollution indices, integrated pollution index, enrichment factor, daily dose average, hazard quotient, and hazard index were all analysed. The PCA showed high levels of Zn, Fe, and Cd in Al Kurnish road, while these elements were consistently detected on King Abdulaziz and Al Madina roads. The study indicates that high levels of Zn and Pb pollution were recorded for major roads in Jeddah. Six out of seven roads had high pollution indices. This study is the first step towards further investigations into current health problems in Jeddah, such as anaemia and asthma.
Chounlamany, Vanseng; Tanchuling, Maria Antonia; Inoue, Takanobu
2017-09-01
Payatas landfill in Quezon City, Philippines, releases leachate to the Marikina River through a creek. Multivariate statistical techniques were applied to study temporal and spatial variations in water quality of a segment of the Marikina River. The data set included 12 physico-chemical parameters for five monitoring stations over a year. Cluster analysis grouped the monitoring stations into four clusters and identified January-May as dry season and June-September as wet season. Principal components analysis showed that three latent factors are responsible for the data set explaining 83% of its total variance. The chemical oxygen demand, biochemical oxygen demand, total dissolved solids, Cl - and PO 4 3- are influenced by anthropogenic impact/eutrophication pollution from point sources. Total suspended solids, turbidity and SO 4 2- are influenced by rain and soil erosion. The highest state of pollution is at the Payatas creek outfall from March to May, whereas at downstream stations it is in May. The current study indicates that the river monitoring requires only four stations, nine water quality parameters and testing over three specific months of the year. The findings of this study imply that Payatas landfill requires a proper leachate collection and treatment system to reduce its impact on the Marikina River.
Busico, Gianluigi; Cuoco, Emilio; Kazakis, Nerantzis; Colombani, Nicolò; Mastrocicco, Micòl; Tedesco, Dario; Voudouris, Konstantinos
2018-03-01
Shallow aquifers are the most accessible reservoirs of potable groundwater; nevertheless, they are also prone to various sources of pollution and it is usually difficult to distinguish between human and natural sources at the watershed scale. The area chosen for this study (the Campania Plain) is characterized by high spatial heterogeneities both in geochemical features and in hydraulic properties. Groundwater mineralization is driven by many processes such as, geothermal activity, weathering of volcanic products and intense human activities. In such a landscape, multivariate statistical analysis has been used to differentiate among the main hydrochemical processes occurring in the area, using three different approaches of factor analysis: (i) major elements, (ii) trace elements, (iii) both major and trace elements. The elaboration of the factor analysis approaches has revealed seven distinct hydrogeochemical processes: i) Salinization (Cl - , Na + ); ii) Carbonate rocks dissolution; iii) Anthropogenic inputs (NO 3 - , SO 4 2- , U, V); iv) Reducing conditions (Fe 2+ , Mn 2+ ); v) Heavy metals contamination (Cr and Ni); vi) Geothermal fluids influence (Li + ); and vii) Volcanic products contribution (As, Rb). Results from this study highlight the need to separately apply factor analysis when a large data set of trace elements is available. In fact, the impact of geothermal fluids in the shallow aquifer was identified from the application of the factor analysis using only trace elements. This study also reveals that the factor analysis of major and trace elements can differentiate between anthropogenic and geogenic sources of pollution in intensively exploited aquifers. Copyright © 2017 Elsevier Ltd. All rights reserved.
Yunhui Zhang
2018-01-01
Full Text Available The utilization for water resource has been of great concern to human life. To assess the natural water system in Kangding County, the integrated methods of hydrochemical analysis, multivariate statistics and geochemical modelling were conducted on surface water, groundwater, and thermal water samples. Surface water and groundwater were dominated by Ca-HCO3 type, while thermal water belonged to Ca-HCO3 and Na-Cl-SO4 types. The analyzing results concluded the driving factors that affect hydrochemical components. Following the results of the combined assessments, hydrochemical process was controlled by the dissolution of carbonate and silicate minerals with slight influence from anthropogenic activity. The mixing model of groundwater and thermal water was calculated using silica-enthalpy method, yielding cold-water fraction of 0.56–0.79 and an estimated reservoir temperature of 130–199 °C, respectively. δD and δ18O isotopes suggested that surface water, groundwater and thermal springs were of meteoric origin. Thermal water should have deep circulation through the Xianshuihe fault zone, while groundwater flows through secondary fractures where it recharges with thermal water. Those analytical results were used to construct a hydrological conceptual model, providing a better understanding of the natural water system in Kangding County.
Thuy, Tran Thi; Tengstrand, Erik; Aberg, Magnus; Thorsén, Gunnar
2012-11-01
Optimal glycosylation with respect to the efficacy, serum half-life time, and immunogenic properties is essential in the generation of therapeutic antibodies. The glycosylation pattern can be affected by several different parameters during the manufacture of antibodies and may change significantly over cultivation time. Fast and robust methods for determination of the glycosylation patterns of therapeutic antibodies are therefore needed. We have recently presented an efficient method for the determination of glycans on therapeutic antibodies using a microfluidic CD platform for sample preparation prior to matrix-assisted laser-desorption mass spectrometry analysis. In the present work, this method is applied to analyse the glycosylation patterns of three commercially available therapeutic antibodies and one intended for therapeutic use. Two of the antibodies produced in mouse myeloma cell line (SP2/0) and one produced in Chinese hamster ovary (CHO) cells exhibited similar glycosylation patterns but could still be readily differentiated from each other using multivariate statistical methods. The two antibodies with most similar glycosylation patterns were also studied in an assessment of the method's applicability for quality control of therapeutic antibodies. The method presented in this paper is highly automated and rapid. It can therefore efficiently generate data that helps to keep a production process within the desired design space or assess that an identical product is being produced after changes to the process. Copyright © 2012 Elsevier B.V. All rights reserved.
Freitas, Renato; Rabello, Angela; Lima, Tania
2011-01-01
Full text: In this work it was characterized the elemental composition of 102 fragments of Marajoara pubic covers, belonging to the National Museum collection, using EDXRF and multivariate statistics analysis. The objective was to identify possible groups of samples that presented similar characteristics. This information will be useful in the development of a systematic classification of these artifacts. Provenance studies of ancient ceramics are based on the assumption that pottery produced from a specific clay will present a similar chemical composition, which will distinguish them from pottery produced from a different clay. In this way, the pottery is assigned to particular production groups, which are then correlated with their respective origins. EDXRF measurements were carried out with a portable system, developed in the Nuclear Instrumentation Laboratory, consisting of an X-ray tube Oxford TF3005 with tungsten (W) anode, operating at 25 kV and 100 μA, and a Si-PIN XR-100CR detector from Amptek. In each one of the 102 fragments, six points were analyzed (three in the front part and three in the reverse) with an acquisition time of 600 s and a beam collimation of 2 mm. The spectra were processed and analyzed using the software QXAS-AXIL from IAEA. PCA was applied to the XRF results revealing a clear cluster separation to the samples. (author)
Khan, Firdos; Pilz, Jürgen
2016-04-01
South Asia is under the severe impacts of changing climate and global warming. The last two decades showed that climate change or global warming is happening and the first decade of 21st century is considered as the warmest decade over Pakistan ever in history where temperature reached 53 0C in 2010. Consequently, the spatio-temporal distribution and intensity of precipitation is badly effected and causes floods, cyclones and hurricanes in the region which further have impacts on agriculture, water, health etc. To cope with the situation, it is important to conduct impact assessment studies and take adaptation and mitigation remedies. For impact assessment studies, we need climate variables at higher resolution. Downscaling techniques are used to produce climate variables at higher resolution; these techniques are broadly divided into two types, statistical downscaling and dynamical downscaling. The target location of this study is the monsoon dominated region of Pakistan. One reason for choosing this area is because the contribution of monsoon rains in this area is more than 80 % of the total rainfall. This study evaluates a statistical downscaling technique which can be then used for downscaling climatic variables. Two statistical techniques i.e. quantile regression and copula modeling are combined in order to produce realistic results for climate variables in the area under-study. To reduce the dimension of input data and deal with multicollinearity problems, empirical orthogonal functions will be used. Advantages of this new method are: (1) it is more robust to outliers as compared to ordinary least squares estimates and other estimation methods based on central tendency and dispersion measures; (2) it preserves the dependence among variables and among sites and (3) it can be used to combine different types of distributions. This is important in our case because we are dealing with climatic variables having different distributions over different meteorological
Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti
2016-07-01
A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness.Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Rathi, Monika; Ahrenkiel, S P; Carapella, J J; Wanlass, M W
2013-02-01
Given an unknown multicomponent alloy, and a set of standard compounds or alloys of known composition, can one improve upon popular standards-based methods for energy dispersive X-ray (EDX) spectrometry to quantify the elemental composition of the unknown specimen? A method is presented here for determining elemental composition of alloys using transmission electron microscopy-based EDX with appropriate standards. The method begins with a discrete set of related reference standards of known composition, applies multivariate statistical analysis to those spectra, and evaluates the compositions with a linear matrix algebra method to relate the spectra to elemental composition. By using associated standards, only limited assumptions about the physical origins of the EDX spectra are needed. Spectral absorption corrections can be performed by providing an estimate of the foil thickness of one or more reference standards. The technique was applied to III-V multicomponent alloy thin films: composition and foil thickness were determined for various III-V alloys. The results were then validated by comparing with X-ray diffraction and photoluminescence analysis, demonstrating accuracy of approximately 1% in atomic fraction.
Cho, Hyun-Deok; Kim, Unyong; Suh, Joon Hyuk; Eom, Han Young; Kim, Junghyun; Lee, Seul Gi; Choi, Yong Seok; Han, Sang Beom
2016-04-01
Analytical methods using high-performance liquid chromatography with diode array and tandem mass spectrometry detection were developed for the discrimination of the rhizomes of four Atractylodes medicinal plants: A. japonica, A. macrocephala, A. chinensis, and A. lancea. A quantitative study was performed, selecting five bioactive components, including atractylenolide I, II, III, eudesma-4(14),7(11)-dien-8-one and atractylodin, on twenty-six Atractylodes samples of various origins. Sample extraction was optimized to sonication with 80% methanol for 40 min at room temperature. High-performance liquid chromatography with diode array detection was established using a C18 column with a water/acetonitrile gradient system at a flow rate of 1.0 mL/min, and the detection wavelength was set at 236 nm. Liquid chromatography with tandem mass spectrometry was applied to certify the reliability of the quantitative results. The developed methods were validated by ensuring specificity, linearity, limit of quantification, accuracy, precision, recovery, robustness, and stability. Results showed that cangzhu contained higher amounts of atractylenolide I and atractylodin than baizhu, and especially atractylodin contents showed the greatest variation between baizhu and cangzhu. Multivariate statistical analysis, such as principal component analysis and hierarchical cluster analysis, were also employed for further classification of the Atractylodes plants. The established method was suitable for quality control of the Atractylodes plants. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A comparison of linear and nonlinear statistical techniques in performance attribution.
Chan, N H; Genovese, C R
2001-01-01
Performance attribution is usually conducted under the linear framework of multifactor models. Although commonly used by practitioners in finance, linear multifactor models are known to be less than satisfactory in many situations. After a brief survey of nonlinear methods, nonlinear statistical techniques are applied to performance attribution of a portfolio constructed from a fixed universe of stocks using factors derived from some commonly used cross sectional linear multifactor models. By rebalancing this portfolio monthly, the cumulative returns for procedures based on standard linear multifactor model and three nonlinear techniques-model selection, additive models, and neural networks-are calculated and compared. It is found that the first two nonlinear techniques, especially in combination, outperform the standard linear model. The results in the neural-network case are inconclusive because of the great variety of possible models. Although these methods are more complicated and may require some tuning, toolboxes are developed and suggestions on calibration are proposed. This paper demonstrates the usefulness of modern nonlinear statistical techniques in performance attribution.
Matiatos, Ioannis; Alexopoulos, Apostolos; Godelitsas, Athanasios
2014-04-01
The present study involves an integration of the hydrogeological, hydrochemical and isotopic (both stable and radiogenic) data of the groundwater samples taken from aquifers occurring in the region of northeastern Peloponnesus. Special emphasis has been given to health-related ions and isotopes in relation to the WHO and USEPA guidelines, to highlight the concentrations of compounds (e.g., As and Ba) exceeding the drinking water thresholds. Multivariate statistical analyses, i.e. two principal component analyses (PCA) and one discriminant analysis (DA), combined with conventional hydrochemical methodologies, were applied, with the aim to interpret the spatial variations in the groundwater quality and to identify the main hydrogeochemical factors and human activities responsible for the high ion concentrations and isotopic content in the groundwater analysed. The first PCA resulted in a three component model, which explained approximately 82% of the total variance of the data sets and enabled the identification of the hydrogeological processes responsible for the isotopic content i.e., δ(18)Ο, tritium and (222)Rn. The second PCA, involving the trace element presence in the water samples, revealed a four component model, which explained approximately 89% of the total variance of the data sets, giving more insight into the geochemical and anthropogenic controls on the groundwater composition (e.g., water-rock interaction, hydrothermal activity and agricultural activities). Using discriminant analysis, a four parameter (δ(18)O, (Ca+Mg)/(HCO3+SO4), EC and Cl) discriminant function concerning the (222)Rn content was derived, which favoured a classification of the samples according to the concentration of (222)Rn as (222)Rn-safe (11 Bq·L(-1)). The selection of radon builds on the fact that this radiogenic isotope has been generally related to increased health risk when consumed. Copyright © 2014 Elsevier B.V. All rights reserved.
Zhang, Yaxin; Tian, Ye; Shen, Maocai; Zeng, Guangming
2018-03-03
Heavy metal contamination in soils/sediments and its impact on human health and ecological environment have aroused wide concerns. Our study investigated 30 samples of soils and sediments around Dongting Lake to analyze the concentration of As, Cd, Cr, Cu, Fe, Mn, Ni, Pb, and Zn in the samples and to distinguish the natural and anthropogenic sources. Also, the relationship between heavy metals and the physicochemical properties of samples was studied by multivariate statistical analysis. Concentration of Cd at most sampling sites were more than five times that of national environmental quality standard for soil in China (GB 15618-1995), and Pb and Zn levels exceeded one to two times. Moreover, Cr in the soil was higher than the national environmental quality standards for one to two times while in sediment was lower than the national standard. The investigation revealed that the accumulations of As, Cd, Mn, and Pb in the soils, and sediments were affected apparently by anthropogenic activities; however, Cr, Fe, and Ni levels were impacted by parent materials. Human activities around Dongting Lake mainly consisted of industrial activities, mining and smelting, sewage discharges, fossil fuel combustion, and agricultural chemicals. The spatial distribution of heavy metal in soil followed the rule of geographical gradient, whereas in sediments, it was significantly affected by the river basins and human activities. The result of principal component analysis (PCA) demonstrated that heavy metals in soils were associated with pH and total phosphorus (TP), while in sediments, As, Cr, Fe, and Ni were closely associated with cation exchange capacity (CEC) and pH, where Pb, Zn, and Cd were associated with total nitrogen (TN), TP, total carbon (TC), moisture content (MC), soil organic matter (SOM), and ignition lost (IL). Our research provides comprehensive approaches to better understand the potential sources and the fate of contaminants in lakeshore soils and sediments.
Weili Duan
2016-01-01
Full Text Available Multivariate statistical methods including cluster analysis (CA, discriminant analysis (DA and component analysis/factor analysis (PCA/FA, were applied to explore the surface water quality datasets including 14 parameters at 28 sites of the Eastern Poyang Lake Basin, Jiangxi Province of China, from January 2012 to April 2015, characterize spatiotemporal variation in pollution and identify potential pollution sources. The 28 sampling stations were divided into two periods (wet season and dry season and two regions (low pollution and high pollution, respectively, using hierarchical CA method. Four parameters (temperature, pH, ammonia-nitrogen (NH4-N, and total nitrogen (TN were identified using DA to distinguish temporal groups with close to 97.86% correct assignations. Again using DA, five parameters (pH, chemical oxygen demand (COD, TN, Fluoride (F, and Sulphide (S led to 93.75% correct assignations for distinguishing spatial groups. Five potential pollution sources including nutrients pollution, oxygen consuming organic pollution, fluorine chemical pollution, heavy metals pollution and natural pollution, were identified using PCA/FA techniques for both the low pollution region and the high pollution region. Heavy metals (Cuprum (Cu, chromium (Cr and Zinc (Zn, fluoride and sulfide are of particular concern in the study region because of many open-pit copper mines such as Dexing Copper Mine. Results obtained from this study offer a reasonable classification scheme for low-cost monitoring networks. The results also inform understanding of spatio-temporal variation in water quality as these topics relate to water resources management.
Rakotondrabe, Felaniaina; Ndam Ngoupayou, Jules Remy; Mfonka, Zakari; Rasolomanana, Eddy Harilala; Nyangono Abolo, Alexis Jacob; Ako Ako, Andrew
2018-01-01
The influence of gold mining activities on the water quality in the Mari catchment in Bétaré-Oya (East Cameroon) was assessed in this study. Sampling was performed within the period of one hydrological year (2015 to 2016), with 22 sampling sites consisting of groundwater (06) and surface water (16). In addition to measuring the physicochemical parameters, such as pH, electrical conductivity, alkalinity, turbidity, suspended solids and CN - , eleven major elements (Na + , K + , Ca 2+ , Mg 2+ , NH 4 + , Cl - , NO 3 - , HCO 3 - , SO 4 2- , PO 4 3- and F - ) and eight heavy metals (Pb, Zn, Cd, Fe, Cu, As, Mn and Cr) were also analyzed using conventional hydrochemical methods, Multivariate Statistical Analysis and the Heavy metal Pollution Index (HPI). The results showed that the water from Mari catchment and Lom River was acidic to basic (5.40water quality, except for nitrates in some wells, which was found at a concentration >50mg NO 3 - /L. This water was found as two main types: calcium magnesium bicarbonate (CaMg-HCO 3 ), which was the most represented, and sodium bicarbonate potassium (NaK-HCO 3 ). As for trace elements in surface water, the contents of Pb, Cd, Mn, Cr and Fe were higher than recommended by the WHO guidelines, and therefore, the surface water was unsuitable for human consumption. Three phenomena were responsible for controlling the quality of the water in the study area: hydrolysis of silicate minerals of plutono-metamorphic rocks, which constitute the geological basement of this area; vegetation and soil leaching; and mining activities. The high concentrations of TSS and trace elements found in this basin were mainly due to gold mining activities (exploration and exploitation) as well as digging of rivers beds, excavation and gold amalgamation. Copyright © 2017 Elsevier B.V. All rights reserved.
Keita, Souleymane; Zhonghua, Tang
2017-10-01
Sustainable management of groundwater resources is a major issue for developing countries, especially in Mali. The multiple uses of groundwater led countries to promote sound management policies for sustainable use of the groundwater resources. For this reason, each country needs data enabling it to monitor and predict the changes of the resources. Also given the importance of groundwater quality changes often marked by the recurrence of droughts; the potential impacts of regional and geological setting of groundwater resources requires careful study. Unfortunately, recent decades have seen a considerable reduction of national capacities to ensure the hydrogeological monitoring and production of qualit data for decision making. The purpose of this work is to use the groundwater data and translate into useful information that can improve water resources management capacity in Mali. In this paper, we used groundwater analytical data from accredited, laboratories in Mali to carry out a national scale assessment of the groundwater types and their distribution. We, adapted multivariate statistical methods to classify 2035 groundwater samples into seven main groundwater types and built a national scale map from the results. We used a two-level K-mean clustering technique to examine the hydro-geochemical records as percentages of the total concentrations of major ions, namely sodium (Na), magnesium (Mg), calcium (Ca), chloride (Cl), bicarbonate (HCO3), and sulphate (SO4). The first step of clustering formed 20 groups, and these groups were then re-clustered to produce the final seven groundwater types. The results were verified and confirmed using Principal Component Analysis (PCA) and RockWare (Aq.QA) software. We found that HCO3 was the most dominant anion throughout the country and that Cl and SO4 were only important in some local zones. The dominant cations were Na and Mg. Also, major ion ratios changed with geographical location and geological, and climatic
Statistical mechanical analysis of the linear vector channel in digital communication
Takeda, Koujin; Hatabu, Atsushi; Kabashima, Yoshiyuki
2007-01-01
A statistical mechanical framework to analyze linear vector channel models in digital wireless communication is proposed for a large system. The framework is a generalization of that proposed for code-division multiple-access systems in Takeda et al (2006 Europhys. Lett. 76 1193) and enables the analysis of the system in which the elements of the channel transfer matrix are statistically correlated with each other. The significance of the proposed scheme is demonstrated by assessing the performance of an existing model of multi-input multi-output communication systems
Zhang, Jing; Liang, Lichen; Anderson, Jon R; Gatewood, Lael; Rottenberg, David A; Strother, Stephen C
2008-01-01
As functional magnetic resonance imaging (fMRI) becomes widely used, the demands for evaluation of fMRI processing pipelines and validation of fMRI analysis results is increasing rapidly. The current NPAIRS package, an IDL-based fMRI processing pipeline evaluation framework, lacks system interoperability and the ability to evaluate general linear model (GLM)-based pipelines using prediction metrics. Thus, it can not fully evaluate fMRI analytical software modules such as FSL.FEAT and NPAIRS.GLM. In order to overcome these limitations, a Java-based fMRI processing pipeline evaluation system was developed. It integrated YALE (a machine learning environment) into Fiswidgets (a fMRI software environment) to obtain system interoperability and applied an algorithm to measure GLM prediction accuracy. The results demonstrated that the system can evaluate fMRI processing pipelines with univariate GLM and multivariate canonical variates analysis (CVA)-based models on real fMRI data based on prediction accuracy (classification accuracy) and statistical parametric image (SPI) reproducibility. In addition, a preliminary study was performed where four fMRI processing pipelines with GLM and CVA modules such as FSL.FEAT and NPAIRS.CVA were evaluated with the system. The results indicated that (1) the system can compare different fMRI processing pipelines with heterogeneous models (NPAIRS.GLM, NPAIRS.CVA and FSL.FEAT) and rank their performance by automatic performance scoring, and (2) the rank of pipeline performance is highly dependent on the preprocessing operations. These results suggest that the system will be of value for the comparison, validation, standardization and optimization of functional neuroimaging software packages and fMRI processing pipelines.
Budgor, A.B.; West, B.J.
1978-01-01
We employ the equivalence between Zwanzig's projection-operator formalism and perturbation theory to demonstrate that the approximate-solution technique of statistical linearization for nonlinear stochastic differential equations corresponds to the lowest-order β truncation in both the consolidated perturbation expansions and in the ''mass operator'' of a renormalized Green's function equation. Other consolidated equations can be obtained by selectively modifying this mass operator. We particularize the results of this paper to the Duffing anharmonic oscillator equation
On the analysis of clonogenic survival data: Statistical alternatives to the linear-quadratic model
Unkel, Steffen; Belka, Claus; Lauber, Kirsten
2016-01-01
the extraction of scores of radioresistance, which displayed significant correlations with the estimated parameters of the regression models. Undoubtedly, LQ regression is a robust method for the analysis of clonogenic survival data. Nevertheless, alternative approaches including non-linear regression and multivariate techniques such as cluster analysis and principal component analysis represent versatile tools for the extraction of parameters and/or scores of the cellular response towards ionizing irradiation with a more intuitive biological interpretation. The latter are highly informative for correlation analyses with other types of data, including functional genomics data that are increasingly beinggenerated
Litvinenko, Alexander
2017-12-10
Matrices began in the 2nd century BC with the Chinese. One can find traces, which go to the 4th century BC to the Babylonians. The text ``Nine Chapters of the Mathematical Art\\'\\' written during the Han Dynasty in China gave the first known example of matrix methods. They were used to solve simultaneous linear equations (more in http://math.nie.edu.sg/bwjyeo/it/MathsOnline_AM/livemath/the/IT3AMMatricesHistory.html). The first ideas of the maximum likelihood estimation (MLE) was introduces by Laplace (1749-1827), by Gauss (1777-1855), the Likelihood was defined by Thiele Thorvald (1838-1910). Why we still use matrices? The matrix data format is more than 2200 years old. Our world is multi-dimensional! Why not to introduce a more appropriate data format and why not to reformulate the MLE method for it? In this work we are utilizing the low-rank tensor formats for multi-dimansional functions, which appear in spatial statistics.
Chen, Hao [School of Tourism and Environment, Shaanxi Normal University, Xi' an 710062 (China); Lu, Xinwei, E-mail: luxinwei@snnu.edu.cn [School of Tourism and Environment, Shaanxi Normal University, Xi' an 710062 (China); Li, Loretta Y., E-mail: lli@civil.ubc.ca [Department of Civil Engineering, University of British Columbia, Vancouver V6T 1Z4 (Canada); Gao, Tianning; Chang, Yuyu [School of Tourism and Environment, Shaanxi Normal University, Xi' an 710062 (China)
2014-06-01
The concentrations of As, Ba, Co, Cr, Cu, Mn, Ni, Pb, V and Zn in campus dust from kindergartens, elementary schools, middle schools and universities of Xi'an, China were determined by X-ray fluorescence spectrometry. Correlation coefficient analysis, principal component analysis (PCA) and cluster analysis (CA) were used to analyze the data and to identify possible sources of these metals in the dust. The spatial distributions of metals in urban dust of Xi'an were analyzed based on the metal concentrations in campus dusts using the geostatistics method. The results indicate that dust samples from campuses have elevated metal concentrations, especially for Pb, Zn, Co, Cu, Cr and Ba, with the mean values of 7.1, 5.6, 3.7, 2.9, 2.5 and 1.9 times the background values for Shaanxi soil, respectively. The enrichment factor results indicate that Mn, Ni, V, As and Ba in the campus dust were deficiently to minimally enriched, mainly affected by nature and partly by anthropogenic sources, while Co, Cr, Cu, Pb and Zn in the campus dust and especially Pb and Zn were mostly affected by human activities. As and Cu, Mn and Ni, Ba and V, and Pb and Zn had similar distribution patterns. The southwest high-tech industrial area and south commercial and residential areas have relatively high levels of most metals. Three main sources were identified based on correlation coefficient analysis, PCA, CA, as well as spatial distribution characteristics. As, Ni, Cu, Mn, Pb, Zn and Cr have mixed sources — nature, traffic, as well as fossil fuel combustion and weathering of materials. Ba and V are mainly derived from nature, but partly also from industrial emissions, as well as construction sources, while Co principally originates from construction. - Highlights: • Metal content in dust from schools was determined by XRF. • Spatial distribution of metals in urban dust was focused on campus samples. • Multivariate statistic and spatial distribution were used to identify metal
Chen, Hao; Lu, Xinwei; Li, Loretta Y.; Gao, Tianning; Chang, Yuyu
2014-01-01
The concentrations of As, Ba, Co, Cr, Cu, Mn, Ni, Pb, V and Zn in campus dust from kindergartens, elementary schools, middle schools and universities of Xi'an, China were determined by X-ray fluorescence spectrometry. Correlation coefficient analysis, principal component analysis (PCA) and cluster analysis (CA) were used to analyze the data and to identify possible sources of these metals in the dust. The spatial distributions of metals in urban dust of Xi'an were analyzed based on the metal concentrations in campus dusts using the geostatistics method. The results indicate that dust samples from campuses have elevated metal concentrations, especially for Pb, Zn, Co, Cu, Cr and Ba, with the mean values of 7.1, 5.6, 3.7, 2.9, 2.5 and 1.9 times the background values for Shaanxi soil, respectively. The enrichment factor results indicate that Mn, Ni, V, As and Ba in the campus dust were deficiently to minimally enriched, mainly affected by nature and partly by anthropogenic sources, while Co, Cr, Cu, Pb and Zn in the campus dust and especially Pb and Zn were mostly affected by human activities. As and Cu, Mn and Ni, Ba and V, and Pb and Zn had similar distribution patterns. The southwest high-tech industrial area and south commercial and residential areas have relatively high levels of most metals. Three main sources were identified based on correlation coefficient analysis, PCA, CA, as well as spatial distribution characteristics. As, Ni, Cu, Mn, Pb, Zn and Cr have mixed sources — nature, traffic, as well as fossil fuel combustion and weathering of materials. Ba and V are mainly derived from nature, but partly also from industrial emissions, as well as construction sources, while Co principally originates from construction. - Highlights: • Metal content in dust from schools was determined by XRF. • Spatial distribution of metals in urban dust was focused on campus samples. • Multivariate statistic and spatial distribution were used to identify metal sources.
Normality of raw data in general linear models: The most widespread myth in statistics
Kery, Marc; Hatfield, Jeff S.
2003-01-01
In years of statistical consulting for ecologists and wildlife biologists, by far the most common misconception we have come across has been the one about normality in general linear models. These comprise a very large part of the statistical models used in ecology and include t tests, simple and multiple linear regression, polynomial regression, and analysis of variance (ANOVA) and covariance (ANCOVA). There is a widely held belief that the normality assumption pertains to the raw data rather than to the model residuals. We suspect that this error may also occur in countless published studies, whenever the normality assumption is tested prior to analysis. This may lead to the use of nonparametric alternatives (if there are any), when parametric tests would indeed be appropriate, or to use of transformations of raw data, which may introduce hidden assumptions such as multiplicative effects on the natural scale in the case of log-transformed data. Our aim here is to dispel this myth. We very briefly describe relevant theory for two cases of general linear models to show that the residuals need to be normally distributed if tests requiring normality are to be used, such as t and F tests. We then give two examples demonstrating that the distribution of the response variable may be nonnormal, and yet the residuals are well behaved. We do not go into the issue of how to test normality; instead we display the distributions of response variables and residuals graphically.
Statistical study of the non-linear propagation of a partially coherent laser beam
Ayanides, J.P.
2001-01-01
This research thesis is related to the LMJ project (Laser MegaJoule) and thus to the study and development of thermonuclear fusion. It reports the study of the propagation of a partially-coherent laser beam by using a statistical modelling in order to obtain mean values for the field, and thus bypassing a complex and costly calculation of deterministic quantities. Random fluctuations of the propagated field are supposed to comply with a Gaussian statistics; the laser central wavelength is supposed to be small with respect with fluctuation magnitude; a scale factor is introduced to clearly distinguish the scale of the random and fast variations of the field fluctuations, and the scale of the slow deterministic variations of the field envelopes. The author reports the study of propagation through a purely linear media and through a non-dispersive media, and then through slow non-dispersive and non-linear media (in which the reaction time is large with respect to grain correlation duration, but small with respect to the variation scale of the field macroscopic envelope), and thirdly through an instantaneous dispersive and non linear media (which instantaneously reacts to the field) [fr
Chattopadhyay, Anirban; Khondekar, Mofazzal Hossain; Bhattacharjee, Anup Kumar
2017-09-01
In this paper initiative has been taken to search the periodicities of linear speed of Coronal Mass Ejection in solar cycle 23. Double exponential smoothing and Discrete Wavelet Transform are being used for detrending and filtering of the CME linear speed time series. To choose the appropriate statistical methodology for the said purpose, Smoothed Pseudo Wigner-Ville distribution (SPWVD) has been used beforehand to confirm the non-stationarity of the time series. The Time-Frequency representation tool like Hilbert Huang Transform and Empirical Mode decomposition has been implemented to unearth the underneath periodicities in the non-stationary time series of the linear speed of CME. Of all the periodicities having more than 95% Confidence Level, the relevant periodicities have been segregated out using Integral peak detection algorithm. The periodicities observed are of low scale ranging from 2-159 days with some relevant periods like 4 days, 10 days, 11 days, 12 days, 13.7 days, 14.5 and 21.6 days. These short range periodicities indicate the probable origin of the CME is the active longitude and the magnetic flux network of the sun. The results also insinuate about the probable mutual influence and causality with other solar activities (like solar radio emission, Ap index, solar wind speed, etc.) owing to the similitude between their periods and CME linear speed periods. The periodicities of 4 days and 10 days indicate the possible existence of the Rossby-type waves or planetary waves in Sun.
Role of statistical linearization in the solution of nonlinear stochastic equations
Budgor, A.B.
1977-01-01
The solution of a generalized Langevin equation is referred to as a stochastic process. If the external forcing function is Gaussian white noise, the forward Kolmogarov equation yields the transition probability density function. Nonlinear problems must be handled by approximation procedures e.g., perturbation theories, eigenfunction expansions, and nonlinear optimization procedures. After some comments on the first two of these, attention is directed to the third, and the method of statistical linearization is used to demonstrate a relation to the former two. Nonlinear stochastic systems exhibiting sustained or forced oscillations and the centered nonlinear Schroedinger equation in the presence of Gaussian white noise excitation are considered as examples. 5 figures, 2 tables
Girum Admasu Nadew; Zebene Lakew Tefera
2013-01-01
Multivariate statistical analysis is very important to classify waters of different hydrochemical groups. Statistical techniques, such as cluster analysis, can provide a powerful tool for analyzing water chemistry data. This method is used to test water quality data and determine if samples can be grouped into distinct populations that may be significant in the geologic context, as well as from a statistical point of view. Multivariate statistical analysis method is applied to the geochemical data in combination with δ 18 O and δ 2 H isotopes with the objective to understand the dynamics of groundwater using hierarchical clustering and isotope analyses. The geochemical and isotope data of the central and southern rift valley lakes have been collected and analyzed from different works. Isotope analysis shows that most springs and boreholes are recharged by July and August rainfalls. The different hydrochemical groups that resulted from the multivariate analysis are described and correlated with the geology of the area and whether it has any interaction with a system or not. (author)
Linear regression models and k-means clustering for statistical analysis of fNIRS data.
Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro
2015-02-01
We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.
Canary, Jana D; Blizzard, Leigh; Barry, Ronald P; Hosmer, David W; Quinn, Stephen J
2016-05-01
Generalized linear models (GLM) with a canonical logit link function are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness-of-fit (GOF) statistics exist for logistic GLM. Their properties make the development of GOF statistics relatively straightforward, but it can be more difficult under noncanonical links. Although GOF tests for logistic GLM with continuous covariates (GLMCC) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of link function chosen. We generalize the Tsiatis GOF statistic originally developed for logistic GLMCCs, (TG), so that it can be applied under any link function. Further, we show that the algebraically related Hosmer-Lemeshow (HL) and Pigeon-Heyse (J(2) ) statistics can be applied directly. In a simulation study, TG, HL, and J(2) were used to evaluate the fit of probit, log-log, complementary log-log, and log models, all calculated with a common grouping method. The TG statistic consistently maintained Type I error rates, while those of HL and J(2) were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC. In this case, TG had more power than HL or J(2) . © 2015 John Wiley & Sons Ltd/London School of Economics.
Liu, Yan; Cai, Wensheng; Shao, Xueguang
2016-12-01
Calibration transfer is essential for practical applications of near infrared (NIR) spectroscopy because the measurements of the spectra may be performed on different instruments and the difference between the instruments must be corrected. For most of calibration transfer methods, standard samples are necessary to construct the transfer model using the spectra of the samples measured on two instruments, named as master and slave instrument, respectively. In this work, a method named as linear model correction (LMC) is proposed for calibration transfer without standard samples. The method is based on the fact that, for the samples with similar physical and chemical properties, the spectra measured on different instruments are linearly correlated. The fact makes the coefficients of the linear models constructed by the spectra measured on different instruments are similar in profile. Therefore, by using the constrained optimization method, the coefficients of the master model can be transferred into that of the slave model with a few spectra measured on slave instrument. Two NIR datasets of corn and plant leaf samples measured with different instruments are used to test the performance of the method. The results show that, for both the datasets, the spectra can be correctly predicted using the transferred partial least squares (PLS) models. Because standard samples are not necessary in the method, it may be more useful in practical uses.
Heyen, H. [GKSS-Forschungszentrum Geesthacht GmbH (Germany). Inst. fuer Gewaesserphysik
1998-12-31
A multivariate statistical approach is presented that allows a systematic search for relationships between the interannual variability in climate records and ecological time series. Statistical models are built between climatological predictor fields and the variables of interest. Relationships are sought on different temporal scales and for different seasons and time lags. The possibilities and limitations of this approach are discussed in four case studies dealing with salinity in the German Bight, abundance of zooplankton at Helgoland Roads, macrofauna communities off Norderney and the arrival of migratory birds on Helgoland. (orig.) [Deutsch] Ein statistisches, multivariates Modell wird vorgestellt, das eine systematische Suche nach potentiellen Zusammenhaengen zwischen Variabilitaet in Klima- und oekologischen Zeitserien erlaubt. Anhand von vier Anwendungsbeispielen wird der Klimaeinfluss auf den Salzgehalt in der Deutschen Bucht, Zooplankton vor Helgoland, Makrofauna vor Norderney, und die Ankunft von Zugvoegeln auf Helgoland untersucht. (orig.)
Cláudio Roberto Rosário
2012-07-01
Full Text Available The purpose of this research is to improve the practice on customer satisfaction analysis The article presents an analysis model to analyze the answers of a customer satisfaction evaluation in a systematic way with the aid of multivariate statistical techniques, specifically, exploratory analysis with PCA – Partial Components Analysis with HCA - Hierarchical Cluster Analysis. It was tried to evaluate the applicability of the model to be used by the issue company as a tool to assist itself on identifying the value chain perceived by the customer when applied the questionnaire of customer satisfaction. It was found with the assistance of multivariate statistical analysis that it was observed similar behavior among customers. It also allowed the company to conduct reviews on questions of the questionnaires, using analysis of the degree of correlation between the questions that was not a company’s practice before this research.
Alfi, V.; Cristelli, M.; Pietronero, L.; Zaccaria, A.
2009-02-01
We present a detailed study of the statistical properties of the Agent Based Model introduced in paper I [Eur. Phys. J. B, DOI: 10.1140/epjb/e2009-00028-4] and of its generalization to the multiplicative dynamics. The aim of the model is to consider the minimal elements for the understanding of the origin of the stylized facts and their self-organization. The key elements are fundamentalist agents, chartist agents, herding dynamics and price behavior. The first two elements correspond to the competition between stability and instability tendencies in the market. The herding behavior governs the possibility of the agents to change strategy and it is a crucial element of this class of models. We consider a linear approximation for the price dynamics which permits a simple interpretation of the model dynamics and, for many properties, it is possible to derive analytical results. The generalized non linear dynamics results to be extremely more sensible to the parameter space and much more difficult to analyze and control. The main results for the nature and self-organization of the stylized facts are, however, very similar in the two cases. The main peculiarity of the non linear dynamics is an enhancement of the fluctuations and a more marked evidence of the stylized facts. We will also discuss some modifications of the model to introduce more realistic elements with respect to the real markets.
Vetrimurugan Elumalai; K. Brindha; Elango Lakshmanan
2017-01-01
Heavy metals in surface and groundwater were analysed and their sources were identified using multivariate statistical tools for two towns in South Africa. Human exposure risk through the drinking water pathway was also assessed. Electrical conductivity values showed that groundwater is desirable to permissible for drinking except for six locations. Concentration of aluminium, lead and nickel were above the permissible limit for drinking at all locations. Boron, cadmium, iron and manganese ex...
Jens G. Balchen
1984-10-01
Full Text Available The problem of systematic derivation of a quasi-dynamic optimal control strategy for a non-linear dynamic process based upon a non-quadratic objective function is investigated. The wellknown LQG-control algorithm does not lead to an optimal solution when the process disturbances have non-zero mean. The relationships between the proposed control algorithm and LQG-control are presented. The problem of how to constrain process variables by means of 'penalty' - terms in the objective function is dealt with separately.
Yuan, Ke-Hai
2008-01-01
In the literature of mean and covariance structure analysis, noncentral chi-square distribution is commonly used to describe the behavior of the likelihood ratio (LR) statistic under alternative hypothesis. Due to the inaccessibility of the rather technical literature for the distribution of the LR statistic, it is widely believed that the…
Yoneoka, Daisuke; Henmi, Masayuki
2017-06-01
Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not been developed in a commensurate trend. One of the difficulties hindering the development is the disparity in sets of covariates among literature models. If the sets of covariates differ across models, interpretation of coefficients will differ, thereby making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often have problems because covariance matrix of coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study, therefore, proposes a brief explanation regarding a method to synthesize linear regression models under different covariate sets by using a generalized least squares method involving bias correction terms. Especially, we also propose an approach to recover (at most) threecorrelations of covariates, which is required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Masuda, Takanori; Nakaura, Takeshi; Funama, Yoshinori; Higaki, Toru; Kiguchi, Masao; Imada, Naoyuki; Sato, Tomoyasu; Awai, Kazuo
We evaluated the effect of the age, sex, total body weight (TBW), height (HT) and cardiac output (CO) of patients on aortic and hepatic contrast enhancement during hepatic-arterial phase (HAP) and portal venous phase (PVP) computed tomography (CT) scanning. This prospective study received institutional review board approval; prior informed consent to participate was obtained from all 168 patients. All were examined using our routine protocol; the contrast material was 600 mg/kg iodine. Cardiac output was measured with a portable electrical velocimeter within 5 minutes of starting the CT scan. We calculated contrast enhancement (per gram of iodine: [INCREMENT]HU/gI) of the abdominal aorta during the HAP and of the liver parenchyma during the PVP. We performed univariate and multivariate linear regression analysis between all patient characteristics and the [INCREMENT]HU/gI of aortic- and liver parenchymal enhancement. Univariate linear regression analysis demonstrated statistically significant correlations between the [INCREMENT]HU/gI and the age, sex, TBW, HT, and CO (all P linear regression analysis showed that only the TBW and CO were of independent predictive value (P linear regression analysis only the TBW and CO were significantly correlated with aortic and liver parenchymal enhancement; the age, sex, and HT were not. The CO was the only independent factor affecting aortic and liver parenchymal enhancement at hepatic CT when the protocol was adjusted for the TBW.
Giannantonio, Tommaso; Porciani, Cristiano
2010-01-01
We study structure formation in the presence of primordial non-Gaussianity of the local type with parameters f NL and g NL . We show that the distribution of dark-matter halos is naturally described by a multivariate bias scheme where the halo overdensity depends not only on the underlying matter density fluctuation δ but also on the Gaussian part of the primordial gravitational potential φ. This corresponds to a non-local bias scheme in terms of δ only. We derive the coefficients of the bias expansion as a function of the halo mass by applying the peak-background split to common parametrizations for the halo mass function in the non-Gaussian scenario. We then compute the halo power spectrum and halo-matter cross spectrum in the framework of Eulerian perturbation theory up to third order. Comparing our results against N-body simulations, we find that our model accurately describes the numerical data for wave numbers k≤0.1-0.3h Mpc -1 depending on redshift and halo mass. In our multivariate approach, perturbations in the halo counts trace φ on large scales, and this explains why the halo and matter power spectra show different asymptotic trends for k→0. This strongly scale-dependent bias originates from terms at leading order in our expansion. This is different from what happens using the standard univariate local bias where the scale-dependent terms come from badly behaved higher-order corrections. On the other hand, our biasing scheme reduces to the usual local bias on smaller scales, where |φ| is typically much smaller than the density perturbations. We finally discuss the halo bispectrum in the context of multivariate biasing and show that, due to its strong scale and shape dependence, it is a powerful tool for the detection of primordial non-Gaussianity from future galaxy surveys.
Non-linear statistical downscaling of present and LGM precipitation and temperatures over Europe
M. Vrac
2007-12-01
Full Text Available Local-scale climate information is increasingly needed for the study of past, present and future climate changes. In this study we develop a non-linear statistical downscaling method to generate local temperatures and precipitation values from large-scale variables of a Earth System Model of Intermediate Complexity (here CLIMBER. Our statistical downscaling scheme is based on the concept of Generalized Additive Models (GAMs, capturing non-linearities via non-parametric techniques. Our GAMs are calibrated on the present Western Europe climate. For this region, annual GAMs (i.e. models based on 12 monthly values per location are fitted by combining two types of large-scale explanatory variables: geographical (e.g. topographical information and physical (i.e. entirely simulated by the CLIMBER model.
To evaluate the adequacy of the non-linear transfer functions fitted on the present Western European climate, they are applied to different spatial and temporal large-scale conditions. Local projections for present North America and Northern Europe climates are obtained and compared to local observations. This partially addresses the issue of spatial robustness of our transfer functions by answering the question "does our statistical model remain valid when applied to large-scale climate conditions from a region different from the one used for calibration?". To asses their temporal performances, local projections for the Last Glacial Maximum period are derived and compared to local reconstructions and General Circulation Model outputs.
Our downscaling methodology performs adequately for the Western Europe climate. Concerning the spatial and temporal evaluations, it does not behave as well for Northern America and Northern Europe climates because the calibration domain may be too different from the targeted regions. The physical explanatory variables alone are not capable of downscaling realistic values. However, the inclusion of
Stygar, Anna Helena; Krogh, Mogens Agerbo; Kristensen, Troels
2017-01-01
Evolutionary operations is a method to exploit the association of often small changes in process variables, planned during systematic experimentation and occurring during the normal production flow, to production characteristics to find a way to alter the production process to be more efficient....... The objective of this study was to construct a tool to assess the intervention effect on milk production in an evolutionary operations setup. The method used for this purpose was a dynamic linear model (DLM) with Kalman filtering. The DLM consisted of parameters describing milk yield in a herd, individual cows...... bulk tank records. The presented model proved to be a flexible and dynamic tool, and it was successfully applied for systematic experimentation in dairy herds. The model can serve as a decision support tool for on-farm process optimization exploiting planned changes in process variables...
2017-12-08
STATISTICAL LINEAR TIME-VARYING SYSTEM MODEL OF HIGH GRAZING ANGLE SEA CLUTTER FOR COMPUTING INTERFERENCE POWER 1. INTRODUCTION Statistical linear time...beam. We can approximate one of the sinc factors using the Dirichlet kernel to facilitate computation of the integral in (6) as follows: ∣∣∣∣sinc(WB...plotted in Figure 4. The resultant autocorrelation can then be found by substituting (18) into (28). The Python code used to generate Figures 1-4 is found
Lunt, Mark
2015-07-01
In the first article in this series we explored the use of linear regression to predict an outcome variable from a number of predictive factors. It assumed that the predictive factors were measured on an interval scale. However, this article shows how categorical variables can also be included in a linear regression model, enabling predictions to be made separately for different groups and allowing for testing the hypothesis that the outcome differs between groups. The use of interaction terms to measure whether the effect of a particular predictor variable differs between groups is also explained. An alternative approach to testing the difference between groups of the effect of a given predictor, which consists of measuring the effect in each group separately and seeing whether the statistical significance differs between the groups, is shown to be misleading. © The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
A statistical theory of cell killing by radiation of varying linear energy transfer
Hawkins, R.B.
1994-01-01
A theory is presented that provides an explanation for the observed features of the survival of cultured cells after exposure to densely ionizing high-linear energy transfer (LET) radiation. It starts from a phenomenological postulate based on the linear-quadratic form of cell survival observed for low-LET radiation and uses principles of statistics and fluctuation theory to demonstrate that the effect of varying LET on cell survival can be attributed to random variation of dose to small volumes contained within the nucleus. A simple relation is presented for surviving fraction of cells after exposure to radiation of varying LET that depends on the α and β parameters for the same cells in the limit of low-LET radiation. This relation implies that the value of β is independent of LET. Agreement of the theory with selected observations of cell survival from the literature is demonstrated. A relation is presented that gives relative biological effectiveness (RBE) as a function of the α and β parameters for low-LET radiation. Measurements from microdosimetry are used to estimate the size of the subnuclear volume to which the fluctuation pertains. 11 refs., 4 figs., 2 tabs
Role of Statistical Random-Effects Linear Models in Personalized Medicine.
Diaz, Francisco J; Yeh, Hung-Wen; de Leon, Jose
2012-03-01
Some empirical studies and recent developments in pharmacokinetic theory suggest that statistical random-effects linear models are valuable tools that allow describing simultaneously patient populations as a whole and patients as individuals. This remarkable characteristic indicates that these models may be useful in the development of personalized medicine, which aims at finding treatment regimes that are appropriate for particular patients, not just appropriate for the average patient. In fact, published developments show that random-effects linear models may provide a solid theoretical framework for drug dosage individualization in chronic diseases. In particular, individualized dosages computed with these models by means of an empirical Bayesian approach may produce better results than dosages computed with some methods routinely used in therapeutic drug monitoring. This is further supported by published empirical and theoretical findings that show that random effects linear models may provide accurate representations of phase III and IV steady-state pharmacokinetic data, and may be useful for dosage computations. These models have applications in the design of clinical algorithms for drug dosage individualization in chronic diseases; in the computation of dose correction factors; computation of the minimum number of blood samples from a patient that are necessary for calculating an optimal individualized drug dosage in therapeutic drug monitoring; measure of the clinical importance of clinical, demographic, environmental or genetic covariates; study of drug-drug interactions in clinical settings; the implementation of computational tools for web-site-based evidence farming; design of pharmacogenomic studies; and in the development of a pharmacological theory of dosage individualization.
Affum, Andrews Obeng; Osae, Shiloh Dede; Nyarko, Benjamin Jabez Botwe; Afful, Samuel; Fianko, Joseph Richmond; Akiti, Tetteh Thomas; Adomako, Dickson; Acquaah, Samuel Osafo; Dorleku, Micheal; Antoh, Emmanuel; Barnes, Felix; Affum, Enoch Acheampong
2015-02-01
In recent times, surface water resource in the Western Region of Ghana has been found to be inadequate in supply and polluted by various anthropogenic activities. As a result of these problems, the demand for groundwater by the human populations in the peri-urban communities for domestic, municipal and irrigation purposes has increased without prior knowledge of its water quality. Water samples were collected from 14 public hand-dug wells during the rainy season in 2013 and investigated for total coliforms, Escherichia coli, mercury (Hg), arsenic (As), cadmium (Cd) and physicochemical parameters. Multivariate statistical analysis of the dataset and a linear stoichiometric plot of major ions were applied to group the water samples and to identify the main factors and sources of contamination. Hierarchal cluster analysis revealed four clusters from the hydrochemical variables (R-mode) and three clusters in the case of water samples (Q-mode) after z score standardization. Principal component analysis after a varimax rotation of the dataset indicated that the four factors extracted explained 93.3 % of the total variance, which highlighted salinity, toxic elements and hardness pollution as the dominant factors affecting groundwater quality. Cation exchange, mineral dissolution and silicate weathering influenced groundwater quality. The ranking order of major ions was Na(+) > Ca(2+) > K(+) > Mg(2+) and Cl(-) > SO4 (2-) > HCO3 (-). Based on piper plot and the hydrogeology of the study area, sodium chloride (86 %), sodium hydrogen carbonate and sodium carbonate (14 %) water types were identified. Although E. coli were absent in the water samples, 36 % of the wells contained total coliforms (Enterobacter species) which exceeded the WHO guidelines limit of zero colony-forming unit (CFU)/100 mL of drinking water. With the exception of Hg, the concentration of As and Cd in 79 and 43 % of the water samples exceeded the WHO guideline limits of 10 and 3
Takabe, Satoshi; Hukushima, Koji
2016-05-01
Typical behavior of the linear programming (LP) problem is studied as a relaxation of the minimum vertex cover (min-VC), a type of integer programming (IP) problem. A lattice-gas model on the Erdös-Rényi random graphs of α-uniform hyperedges is proposed to express both the LP and IP problems of the min-VC in the common statistical mechanical model with a one-parameter family. Statistical mechanical analyses reveal for α=2 that the LP optimal solution is typically equal to that given by the IP below the critical average degree c=e in the thermodynamic limit. The critical threshold for good accuracy of the relaxation extends the mathematical result c=1 and coincides with the replica symmetry-breaking threshold of the IP. The LP relaxation for the minimum hitting sets with α≥3, minimum vertex covers on α-uniform random graphs, is also studied. Analytic and numerical results strongly suggest that the LP relaxation fails to estimate optimal values above the critical average degree c=e/(α-1) where the replica symmetry is broken.
Takabe, Satoshi; Hukushima, Koji
2016-05-01
Typical behavior of the linear programming (LP) problem is studied as a relaxation of the minimum vertex cover (min-VC), a type of integer programming (IP) problem. A lattice-gas model on the Erdös-Rényi random graphs of α -uniform hyperedges is proposed to express both the LP and IP problems of the min-VC in the common statistical mechanical model with a one-parameter family. Statistical mechanical analyses reveal for α =2 that the LP optimal solution is typically equal to that given by the IP below the critical average degree c =e in the thermodynamic limit. The critical threshold for good accuracy of the relaxation extends the mathematical result c =1 and coincides with the replica symmetry-breaking threshold of the IP. The LP relaxation for the minimum hitting sets with α ≥3 , minimum vertex covers on α -uniform random graphs, is also studied. Analytic and numerical results strongly suggest that the LP relaxation fails to estimate optimal values above the critical average degree c =e /(α -1 ) where the replica symmetry is broken.
Herojeet, Rajkumar; Rishi, Madhuri S.; Lata, Renu; Dolma, Konchok
2017-09-01
Sirsa River flows through the central part of the Nalagarh valley, belongs to the rapid industrial belt of Baddi, Barotiwala and Nalagarh (BBN). The appraisal of surface water quality to ascertain its utility in such ecologically sensitive areas is need of the hour. The present study envisages the application of multivariate analysis, water utility class and conventional graphical representation to reveal the hidden factor responsible for deterioration of water quality and determine the hydrochemical facies and its evolution processes of water types in Nalagarh valley, India. The quality assessment is made by estimating pH, electrical conductivity (EC), total dissolved solids (TDS), total hardness, major ions (Na+, K+, Ca2+, Mg2+, HCO3 -, Cl-, SO4 2-, NO3 - and PO4 3-), dissolved oxygen (DO), biological oxygen demand (BOD) and total coliform (TC) to determine its suitability for drinking and domestic purposes. The parameters like pH, TDS, TH, Ca2+, HCO3 -, Cl-, SO4 2-, NO3 - are within the desirable limit as per Bureau of Indian Standards (Indian Standard Drinking Water Specification (Second Edition) IS:10500. Indian Standard Institute, New Delhi, pp 1-18, 2012). Mg2+, Na+ and K+ ions for pre monsoon and EC during pre and post monsoon at few sites and approx 40% samples of BOD and TC for both seasons exceeds the permissible limits indicate organic contamination from human activities. Water quality classification for designated use indicates that maximum surface water samples are not suitable for drinking water source without conventional treatment. The result of piper trillinear and Chadha's diagram classified majority of surface water samples for both seasons fall in the fields of Ca2+-Mg2+-HCO3 - water type indicating temporary hardness. PCA and CA reveal that the surface water chemistry is influenced by natural factors such as weathering of minerals, ion exchange processes and anthropogenic factors. Thus, the present paper illustrates the importance of
Seayed Jaber Yousefi
2012-04-01
Full Text Available The study area is located in southeastern Iran, about 110 km southwest of Kerman. Geologically, the area is composed of ophiolitic rocks, volcanic rocks, intrusive bodies and sedimentary rocks. Vein mineralization within andesite, andesitic basalt, andesitic tuffs occurred along the Chahar Gonbad fault. Sulfide mineralization in the ore deposit has taken place as dissemination, veins and veinlets in which pyrite and chalcopyrite are the most important ores. In this area, argillic, phyllic and propylitic alteration types are observed. Such elements as Au, Bi, Cu, S and Se are more enriched than others and the enrichment factors for these elements in comparison with background concentration are 321, 503, 393, 703 and 208, and with respect to Clark concentration are 401, 222, 532, 101 and 156, respectively. According to multivariate analysis, three major mineralization phases are recognized in the deposit. During the first phase, hydrothermal calcite veins are enriched in As, Cd, Pb, Zn and Ca, the second phase is manifested by the enrichment of sulfide veins in Cu, Au, Ag, Bi, Fe and S and the third phase mineralization includes Ni, Mn, Se and Sb as an intermediate level between the two previous phases.
Song, Seung Yeob; Lee, Young Koung; Kim, In-Jung
2016-01-01
A high-throughput screening system for Citrus lines were established with higher sugar and acid contents using Fourier transform infrared (FT-IR) spectroscopy in combination with multivariate analysis. FT-IR spectra confirmed typical spectral differences between the frequency regions of 950-1100 cm(-1), 1300-1500 cm(-1), and 1500-1700 cm(-1). Principal component analysis (PCA) and subsequent partial least square-discriminant analysis (PLS-DA) were able to discriminate five Citrus lines into three separate clusters corresponding to their taxonomic relationships. The quantitative predictive modeling of sugar and acid contents from Citrus fruits was established using partial least square regression algorithms from FT-IR spectra. The regression coefficients (R(2)) between predicted values and estimated sugar and acid content values were 0.99. These results demonstrate that by using FT-IR spectra and applying quantitative prediction modeling to Citrus sugar and acid contents, excellent Citrus lines can be early detected with greater accuracy. Copyright © 2015 Elsevier Ltd. All rights reserved.
Maxon, C.L.; Barnett, A.M.; Diener, D.R.
1997-01-01
This study attempts to predict biological toxicity and benthic community impact in sediments collected from two southern California sites. Contaminant concentrations and grain size were evaluated as predictors using a two-step multivariate approach. The first step used principal component analysis (PCA) to describe contamination type and magnitude present at each site. Four dominant PC vectors, explaining 88% of the total variance, each corresponded to a unique physical and/or chemical signature. The four PC vectors, in decreasing order of importance, were: (1) high molecular weight polynuclear aromatic hydrocarbons (PAH), most likely from combusted or weathered petroleum; (2) low molecular weight alkylated PAH, primarily from weathered fuel product; (3) low molecular weight nonalkylated PAH, indicating a fresh petroleum-related origin; and (4) fine-grained sediments and metals. The second step used stepwise regression analysis to predict individual biological effects (dependent) variables using the four PC vectors as independent variables. Results showed that sediment grain size alone was the best predictor of amphipod mortality. Contaminant vectors showed discrete depositional areas independent of grain size. Neither contaminant concentrations nor PCA vectors were good predictors of biological effects, most likely due to the low concentrations in sediments
Shashank Vyas
2016-01-01
Full Text Available Integration of solar photovoltaic (PV generation with power distribution networks leads to many operational challenges and complexities. Unintentional islanding is one of them which is of rising concern given the steady increase in grid-connected PV power. This paper builds up on an exploratory study of unintentional islanding on a modeled radial feeder having large PV penetration. Dynamic simulations, also run in real time, resulted in exploration of unique potential causes of creation of accidental islands. The resulting voltage and current data underwent dimensionality reduction using principal component analysis (PCA which formed the basis for the application of Q statistic control charts for detecting the anomalous currents that could island the system. For reducing the false alarm rate of anomaly detection, Kullback-Leibler (K-L divergence was applied on the principal component projections which concluded that Q statistic based approach alone is not reliable for detection of the symptoms liable to cause unintentional islanding. The obtained data was labeled and a K-nearest neighbor (K-NN binomial classifier was then trained for identification and classification of potential islanding precursors from other power system transients. The three-phase short-circuit fault case was successfully identified as statistically different from islanding symptoms.
Chang, Pao-Erh Paul; Yang, Jen-Chih Rena; Den, Walter; Wu, Chang-Fu
2014-09-01
Emissions of volatile organic compounds (VOCs) are most frequent environmental nuisance complaints in urban areas, especially where industrial districts are nearby. Unfortunately, identifying the responsible emission sources of VOCs is essentially a difficult task. In this study, we proposed a dynamic approach to gradually confine the location of potential VOC emission sources in an industrial complex, by combining multi-path open-path Fourier transform infrared spectrometry (OP-FTIR) measurement and the statistical method of principal component analysis (PCA). Close-cell FTIR was further used to verify the VOC emission source by measuring emitted VOCs from selected exhaust stacks at factories in the confined areas. Multiple open-path monitoring lines were deployed during a 3-month monitoring campaign in a complex industrial district. The emission patterns were identified and locations of emissions were confined by the wind data collected simultaneously. N,N-Dimethyl formamide (DMF), 2-butanone, toluene, and ethyl acetate with mean concentrations of 80.0 ± 1.8, 34.5 ± 0.8, 103.7 ± 2.8, and 26.6 ± 0.7 ppbv, respectively, were identified as the major VOC mixture at all times of the day around the receptor site. As the toxic air pollutant, the concentrations of DMF in air samples were found exceeding the ambient standard despite the path-average effect of OP-FTIR upon concentration levels. The PCA data identified three major emission sources, including PU coating, chemical packaging, and lithographic printing industries. Applying instrumental measurement and statistical modeling, this study has established a systematic approach for locating emission sources. Statistical modeling (PCA) plays an important role in reducing dimensionality of a large measured dataset and identifying underlying emission sources. Instrumental measurement, however, helps verify the outcomes of the statistical modeling. The field study has demonstrated the feasibility of
Alves, Luana F.N.; Sarkis, Jorge E.S.; Bordon, Isabela C.A.C.
2015-01-01
Analysis of industrial lubricants is widely used for monitoring and predicting maintenance requirements in a broad range of mechanical systems. Laser induced breakdown spectroscopy has been used to evaluate the potentiality of the technique for the determination of metals in lubricating oils. Prior to quantitative analysis, the LIBS system was calibrated using standard samples containing the elements investigated (Cu, Cr, Fe, Pb, Mo and Mg). This study presents the usefulness of multivariate statistical techniques for evaluation and interpretation of large complex data sets in order to get more information about concentration of metals in oils lubricants is related to engine wear. (author)
Bakraji, E. H.
2007-01-01
Radioisotopic x-ray fluorescence (XRF) analysis has been utilized to determine the elemental composition of 55 archaeological pottery samples by the determination of 17 chemical elements. Fifty-four of them came from the Tel-Alramad Site in Katana town, near Damascus city, Syria, and one sample came from Brazil. The XRF results have been processed using two multivariate statistical methods, cluster and factor analysis, in order to determine similarities and correlation between the selected samples based on their elemental composition. The methodology successfully separates the samples where four distinct chemical groups were identified. (author)
Statistical distributions of earthquakes and related non-linear features in seismic waves
Apostol, B.-F.
2006-01-01
A few basic facts in the science of the earthquakes are briefly reviewed. An accumulation, or growth, model is put forward for the focal mechanisms and the critical focal zone of the earthquakes, which relates the earthquake average recurrence time to the released seismic energy. The temporal statistical distribution for average recurrence time is introduced for earthquakes, and, on this basis, the Omori-type distribution in energy is derived, as well as the distribution in magnitude, by making use of the semi-empirical Gutenberg-Richter law relating seismic energy to earthquake magnitude. On geometric grounds, the accumulation model suggests the value r = 1/3 for the Omori parameter in the power-law of energy distribution, which leads to β = 1,17 for the coefficient in the Gutenberg-Richter recurrence law, in fair agreement with the statistical analysis of the empirical data. Making use of this value, the empirical Bath's law is discussed for the average magnitude of the aftershocks (which is 1.2 less than the magnitude of the main seismic shock), by assuming that the aftershocks are relaxation events of the seismic zone. The time distribution of the earthquakes with a fixed average recurrence time is also derived, the earthquake occurrence prediction is discussed by means of the average recurrence time and the seismicity rate, and application of this discussion to the seismic region Vrancea, Romania, is outlined. Finally, a special effect of non-linear behaviour of the seismic waves is discussed, by describing an exact solution derived recently for the elastic waves equation with cubic anharmonicities, its relevance, and its connection to the approximate quasi-plane waves picture. The properties of the seismic activity accompanying a main seismic shock, both like foreshocks and aftershocks, are relegated to forthcoming publications. (author)
Statistical properties of the linear σ model used in dynamical simulations of DCC formation
Randrup, J.
1997-01-01
The present work develops a simple approximate framework for initializing and interpreting dynamical simulations with the linear σ model exploring the formation of disoriented chiral condensates in high-energy collisions. By enclosing the system in a rectangular box with periodic boundary conditions, it is possible to decompose uniquely the chiral field into its spatial average (the order parameter) and its fluctuations (the quasiparticles) which can be treated in the Hartree approximation. The quasiparticle modes are then described approximately by Klein-Gordon dispersion relations containing an effective mass depending on both the temperature and the magnitude of the order parameter; their fluctuations are instrumental in shaping the effective potential governing the order parameter, and the emerging statistical description is thermodynamicially consistent. The temperature dependence of the statistical distribution of the order parameter is discussed, as is the behavior of the associated effective masses; as the system is cooled, the field fluctuations subside, causing a smooth change from the high-temperature phase in which chiral symmetry is approximately restored towards the normal phase. Of practical interest is the fact that the equilibrium field configurations can be sampled in a simple manner, thus providing a convenient means for specifying the initial conditions in dynamical simulations of the nonequilibrium relaxation of the chiral field; in particular, the correlation function is much more realistic than those emerging in previous initialization methods. It is illustrated how such samples remain approximately invariant under propagation by the unapproximated equation of motion over times that are long on the scale of interest, thereby suggesting that the treatment is sufficiently accurate to be of practical utility. copyright 1997 The American Physical Society
Zhang, Kai; Shardt, Yuri A W; Chen, Zhiwen; Peng, Kaixiang
2017-03-01
Using the expected detection delay (EDD) index to measure the performance of multivariate statistical process monitoring (MSPM) methods for constant additive faults have been recently developed. This paper, based on a statistical investigation of the T 2 - and Q-test statistics, extends the EDD index to the multiplicative and drift fault cases. As well, it is used to assess the performance of common MSPM methods that adopt these two test statistics. Based on how to use the measurement space, these methods can be divided into two groups, those which consider the complete measurement space, for example, principal component analysis-based methods, and those which only consider some subspace that reflects changes in key performance indicators, such as partial least squares-based methods. Furthermore, a generic form for them to use T 2 - and Q-test statistics are given. With the extended EDD index, the performance of these methods to detect drift and multiplicative faults is assessed using both numerical simulations and the Tennessee Eastman process. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
Downlink Linear Precoders Based on Statistical CSI for Multicell MIMO-OFDM
Ebrahim Baktash
2017-01-01
Full Text Available With 5G communication systems on the horizon, efficient interference management in heterogeneous multicell networks is more vital than ever. This paper investigates the linear precoder design for downlink multicell multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM systems, where base stations (BSs coordinate to reduce the interference across space and frequency. In order to minimize the overall feedback overhead in next-generation systems, we consider precoding schemes that require statistical channel state information (CSI only. We apply the random matrix theory to approximate the ergodic weighted sum rate of the system with a closed form expression. After formulating the approximation for general channels, we reduce the results to a more compact form using the Kronecker channel model for which several multicarrier concepts such as frequency selectivity, channel tap correlations, and intercarrier interference (ICI are rigorously represented. We find the local optimal solution for the maximization of the approximate rate using a gradient method that requires only the covariance structure of the MIMO-OFDM channels. Within this covariance structure are the channel tap correlations and ICI information, both of which are taken into consideration in the precoder design. Simulation results show that the rate approximation is very accurate even for very small MIMO-OFDM systems and the proposed method converges rapidly to a near-optimal solution that competes with networked MIMO and precoders based on instantaneous full CSI.
Development of statistical linear regression model for metals from transportation land uses.
Maniquiz, Marla C; Lee, Soyoung; Lee, Eunju; Kim, Lee-Hyung
2009-01-01
The transportation landuses possessing impervious surfaces such as highways, parking lots, roads, and bridges were recognized as the highly polluted non-point sources (NPSs) in the urban areas. Lots of pollutants from urban transportation are accumulating on the paved surfaces during dry periods and are washed-off during a storm. In Korea, the identification and monitoring of NPSs still represent a great challenge. Since 2004, the Ministry of Environment (MOE) has been engaged in several researches and monitoring to develop stormwater management policies and treatment systems for future implementation. The data over 131 storm events during May 2004 to September 2008 at eleven sites were analyzed to identify correlation relationships between particulates and metals, and to develop simple linear regression (SLR) model to estimate event mean concentration (EMC). Results indicate that there was no significant relationship between metals and TSS EMC. However, the SLR estimation models although not providing useful results are valuable indicators of high uncertainties that NPS pollution possess. Therefore, long term monitoring employing proper methods and precise statistical analysis of the data should be undertaken to eliminate these uncertainties.
El Alfy, Mohamed; Lashin, Aref; Abdalla, Fathy; Al-Bassam, Abdulaziz
2017-01-01
Rapid economic expansion poses serious problems for groundwater resources in arid areas, which typically have high rates of groundwater depletion. In this study, integration of hydrochemical investigations involving chemical and statistical analyses are conducted to assess the factors controlling hydrochemistry and potential pollution in an arid region. Fifty-four groundwater samples were collected from the Dhurma aquifer in Saudi Arabia, and twenty-one physicochemical variables were examined for each sample. Spatial patterns of salinity and nitrate were mapped using fitted variograms. The nitrate spatial distribution shows that nitrate pollution is a persistent problem affecting a wide area of the aquifer. The hydrochemical investigations and cluster analysis reveal four significant clusters of groundwater zones. Five main factors were extracted, which explain >77% of the total data variance. These factors indicated that the chemical characteristics of the groundwater were influenced by rock–water interactions and anthropogenic factors. The identified clusters and factors were validated with hydrochemical investigations. The geogenic factors include the dissolution of various minerals (calcite, aragonite, gypsum, anhydrite, halite and fluorite) and ion exchange processes. The anthropogenic factors include the impact of irrigation return flows and the application of potassium, nitrate, and phosphate fertilizers. Over time, these anthropogenic factors will most likely contribute to further declines in groundwater quality. - Highlights: • Hydrochemical investigations were carried out in Dhurma aquifer in Saudi Arabia. • The factors controlling potential groundwater pollution in an arid region were studied. • Chemical and statistical analyses are integrated to assess these factors. • Five main factors were extracted, which explain >77% of the total data variance. • The chemical characteristics of the groundwater were influenced by rock–water interactions
Liu, Ya-Juan; André, Silvère; Saint Cristau, Lydia; Lagresle, Sylvain; Hannas, Zahia; Calvosa, Éric; Devos, Olivier; Duponchel, Ludovic
2017-02-01
Multivariate statistical process control (MSPC) is increasingly popular as the challenge provided by large multivariate datasets from analytical instruments such as Raman spectroscopy for the monitoring of complex cell cultures in the biopharmaceutical industry. However, Raman spectroscopy for in-line monitoring often produces unsynchronized data sets, resulting in time-varying batches. Moreover, unsynchronized data sets are common for cell culture monitoring because spectroscopic measurements are generally recorded in an alternate way, with more than one optical probe parallelly connecting to the same spectrometer. Synchronized batches are prerequisite for the application of multivariate analysis such as multi-way principal component analysis (MPCA) for the MSPC monitoring. Correlation optimized warping (COW) is a popular method for data alignment with satisfactory performance; however, it has never been applied to synchronize acquisition time of spectroscopic datasets in MSPC application before. In this paper we propose, for the first time, to use the method of COW to synchronize batches with varying durations analyzed with Raman spectroscopy. In a second step, we developed MPCA models at different time intervals based on the normal operation condition (NOC) batches synchronized by COW. New batches are finally projected considering the corresponding MPCA model. We monitored the evolution of the batches using two multivariate control charts based on Hotelling's T 2 and Q. As illustrated with results, the MSPC model was able to identify abnormal operation condition including contaminated batches which is of prime importance in cell culture monitoring We proved that Raman-based MSPC monitoring can be used to diagnose batches deviating from the normal condition, with higher efficacy than traditional diagnosis, which would save time and money in the biopharmaceutical industry. Copyright © 2016 Elsevier B.V. All rights reserved.
A. Mrutu
2013-12-01
Full Text Available Mangrove wetlands are important biological systems that usually filter out organic and inorganic contaminants from the wastewaters before entering the ocean. Our previous work showed that sediments of the Msimbazi Creek wetland are contaminated with heavy metals and the amounts decreased with increasing depth. However, the hidden relationships between the heavy metals and clay particles were not fully understood based on the numerical data. Therefore this work used the data from literature and the Statistical Package for Social Sciences (SPSS software to study how significant the relationships are and predict the sources of heavy metals and clays. The results showed that Cd is the only metal that showed insignificant correlations with other heavy metals (with Pb and Zn while the rest of heavy metals exhibited significant positive correlation (except Pb vs. Ni. Cluster analysis classified the heavy metals based on the concentration and the first 50 cm cores (0-50 cm had higher heavy metals and % clay than the second 50 cm cores (51-100 cm. The results from the factor analysis suggests that Pb, Cd, Ni, and clay owe their source mostly from anthropogenic activities while Fe, Co, Cr, Zn and sand come from both anthropogenic and natural sources. These results support our previous suggestions that heavy metals and clays found in this wetland have mostly anthropogenic origin. However, we recommend isotopic tracing studies in order to accurately identify the origins of the heavy metals and clays in sediments of Msimbazi Creek mangrove wetland.
El Alfy, Mohamed; Lashin, Aref; Abdalla, Fathy; Al-Bassam, Abdulaziz
2017-10-01
Rapid economic expansion poses serious problems for groundwater resources in arid areas, which typically have high rates of groundwater depletion. In this study, integration of hydrochemical investigations involving chemical and statistical analyses are conducted to assess the factors controlling hydrochemistry and potential pollution in an arid region. Fifty-four groundwater samples were collected from the Dhurma aquifer in Saudi Arabia, and twenty-one physicochemical variables were examined for each sample. Spatial patterns of salinity and nitrate were mapped using fitted variograms. The nitrate spatial distribution shows that nitrate pollution is a persistent problem affecting a wide area of the aquifer. The hydrochemical investigations and cluster analysis reveal four significant clusters of groundwater zones. Five main factors were extracted, which explain >77% of the total data variance. These factors indicated that the chemical characteristics of the groundwater were influenced by rock-water interactions and anthropogenic factors. The identified clusters and factors were validated with hydrochemical investigations. The geogenic factors include the dissolution of various minerals (calcite, aragonite, gypsum, anhydrite, halite and fluorite) and ion exchange processes. The anthropogenic factors include the impact of irrigation return flows and the application of potassium, nitrate, and phosphate fertilizers. Over time, these anthropogenic factors will most likely contribute to further declines in groundwater quality. Copyright © 2017 Elsevier Ltd. All rights reserved.
Crucifix, Michel; Wilkinson, Richard; Carson, Jake; Preston, Simon; Alemeida, Carlos; Rougier, Jonathan
2013-04-01
The existence of an action of astronomical forcing on the Pleistocene climate is almost undisputed. However, quantifying this action is not straightforward. In particular, the phenomenon of deglaciation is generally interpreted as a manifestation of instability, which is typical of non-linear systems. As a consequence, explaining the Pleistocene climate record as the addition of an astronomical contribution and noise-as often done using harmonic analysis tools-is potentially deceptive. Rather, we advocate a methodology in which non-linear stochastic dynamical systems are calibrated on the Pleistocene climate record. The exercise, though, requires careful statistical reasoning and state-of-the-art techniques. In fact, the problem has been judged to be mathematically 'intractable and unsolved' and some pragmatism is justified. In order to illustrate the methodology we consider one dynamical system that potentially captures four dynamical features of the Pleistocene climate : the existence of a saddle-node bifurcation in at least one of its slow components, a time-scale separation between a slow and a fast component, the action of astronomical forcing, and the existence a stochastic contribution to the system dynamics. This model is obviously not the only possible representation of Pleistocene dynamics, but it encapsulates well enough both our theoretical and empirical knowledge into a very simple form to constitute a valid starting point. The purpose of this poster is to outline the practical challenges in calibrating such a model on paleoclimate observations. Just as in time series analysis, there is no one single and universal test or criteria that would demonstrate the validity of an approach. Several methods exist to calibrate the model and judgement develops by the confrontation of the results of the different methods. In particular, we consider here the Kalman filter variants, the Particle Monte-Carlo Markov Chain, and two other variants of Sequential Monte
Kauer, Agnes; Dorigo, Wouter; Bauer-Marschallinger, Bernhard
2017-04-01
Global warming is expected to change ocean-atmosphere oscillation patterns, e.g. the El Nino Southern Oscillation, and may thus have a substantial impact on water resources over land. Yet, the link between climate oscillations and terrestrial hydrology has large uncertainties. In particular, the climate in the Mediterranean basin is expected to be sensitive to global warming as it may increase insufficient and irregular water supply and lead to more frequent and intense droughts and heavy precipitation events. The ever increasing need for water in tourism and agriculture reinforce the problem. Therefore, the monitoring and better understanding of the hydrological cycle are crucial for this area. This study seeks to quantify the effect of regional climate modes, e.g. the Northern Atlantic Oscillation (NAO) on the hydrological cycle in the Mediterranean. We apply Empirical Orthogonal Functions (EOF) to a wide range of hydrological datasets to extract the major modes of variation over the study period. We use more than ten datasets describing precipitation, soil moisture, evapotranspiration, and changes in water mass with study periods ranging from one to three decades depending on the dataset. The resulting EOFs are then examined for correlations with regional climate modes using Spearman rank correlation analysis. This is done for the entire time span of the EOFs and for monthly and seasonally sampled data. We find relationships between the hydrological datasets and the climate modes NAO, Arctic Oscillation (AO), Eastern Atlantic (EA), and Tropical Northern Atlantic (TNA). Analyses of monthly and seasonally sampled data reveal high correlations especially in the winter months. However, the spatial extent of the data cube considered for the analyses have a large impact on the results. Our statistical analyses suggest an impact of regional climate modes on the hydrological cycle in the Mediterranean area and may provide valuable input for evaluating process
Lee, Celine Siu-lan; Li Xiangdong; Shi Wenzhong; Cheung, Sharon Ching-nga; Thornton, Iain
2006-01-01
The urban environment quality is of vital importance as the majority of people now live in cities. Due to the continuous urbanisation and industrialisation in many parts of the world, metals are continuously emitted into the terrestrial environment and pose a great threat on human health. An extensive survey was conducted in the highly urbanised and commercialised Hong Kong Island area (80.3 km 2 ) of Hong Kong using a systematic sampling strategy of five soil samples per km 2 in urban areas and two samples per km 2 in the suburban and country park sites (0-15 cm). The analytical results indicated that the surface soils in urban and suburban areas are enriched with metals, such as Cu, Pb, and Zn. The Pb concentration in the urban soils was found to exceed the Dutch target value. The statistical analyses using principal component analysis (PCA) and cluster analysis (CA) showed distinctly different associations among trace metals and the major elements (Al, Ca, Fe, Mg, Mn) in the urban, suburban, and country park soils. Soil pollution maps of trace metals (Cd, Co, Cr, Cu, Ni, Pb, and Zn) in the surface soils were produced based on geographical information system (GIS) technology. The hot-spot areas of metal contamination were mainly concentrated in the northern and western parts of Hong Kong Island, and closely related to high traffic conditions. The Pb isotopic composition of the urban, suburban, and country park soils showed that vehicular emissions were the major anthropogenic sources for Pb. The 206 Pb/ 207 Pb and 208 Pb/ 207 Pb ratios in soils decreased as Pb concentrations increased in a polynomial line (degree = 2)
Riley, P.; Richardson, I. G.
2012-01-01
In-situ measurements of interplanetary coronal mass ejections (ICMEs) display a wide range of properties. A distinct subset, "magnetic clouds" (MCs), are readily identifiable by a smooth rotation in an enhanced magnetic field, together with an unusually low solar wind proton temperature. In this study, we analyze Ulysses spacecraft measurements to systematically investigate five possible explanations for why some ICMEs are observed to be MCs and others are not: i) An observational selection effect; that is, all ICMEs do in fact contain MCs, but the trajectory of the spacecraft through the ICME determines whether the MC is actually encountered; ii) interactions of an erupting flux rope (PR) with itself or between neighboring FRs, which produce complex structures in which the coherent magnetic structure has been destroyed; iii) an evolutionary process, such as relaxation to a low plasma-beta state that leads to the formation of an MC; iv) the existence of two (or more) intrinsic initiation mechanisms, some of which produce MCs and some that do not; or v) MCs are just an easily identifiable limit in an otherwise corntinuous spectrum of structures. We apply quantitative statistical models to assess these ideas. In particular, we use the Akaike information criterion (AIC) to rank the candidate models and a Gaussian mixture model (GMM) to uncover any intrinsic clustering of the data. Using a logistic regression, we find that plasma-beta, CME width, and the ratio O(sup 7) / O(sup 6) are the most significant predictor variables for the presence of an MC. Moreover, the propensity for an event to be identified as an MC decreases with heliocentric distance. These results tend to refute ideas ii) and iii). GMM clustering analysis further identifies three distinct groups of ICMEs; two of which match (at the 86% level) with events independently identified as MCs, and a third that matches with non-MCs (68 % overlap), Thus, idea v) is not supported. Choosing between ideas i) and
Arveti Nagaraju
2016-10-01
Full Text Available Hydrogeochemical studies were carried out in and around Udayagiri area of Andhra Pradesh in order to assess the chemistry of the groundwater and to identify the dominant hydrogeochemical processes and mechanisms responsible for the evolution of the chemical composition of the groundwater. Descriptive statistics, correlation matrices, principal component analysis (PCA, together with cluster analysis (CA were used to gain an understanding of the hydrogeochemical processes in the study area. PCA has identified 4 main processes influencing the groundwater chemistry viz., mineral precipitation and dissolution, seawater intrusion, cation exchange, and carbonate balance. Further, three clusters C1, C2 and C3 were obtained. Samples from C1 contain high level of Cl− and may be due to the intensive evaporation and contamination from landfill leachate. Most of the samples from C2 are located closer to the sea and the high level of Na+ +K+ in these samples may be attributed to seawater intrusion. The geochemistry of water samples in C3 are more likely to originate from rock weathering. This has been supported by Gibbs diagram. The groundwater geochemistry in the study area is mostly of natural origin, but is influenced to some degree by human activity. Evaluación de la calidad del agua subterránea a través de técnicas estadísticas multivariadas en el área Udayagiri, distrito Nellore, Andhra Pradesh, en el sur de India Resumen Se realizaron estudios hidrogeoquímicos en y alrededor del área Udayagiri de Andhra Pradesh para evaluar la química del agua subterránea e identificar los procesos hidrogeoquímicos dominantes y los mecanismos responsables de la evolución en la composición química del agua subterránea. Se utilizaron estadísticas descriptivas, matrices de correlación, análisis de componentes principales, al igual que análisis de grupos, para obtener y entender los procesos hidrogeoquímicos en el área de estudio. Los an
Whist, A C; Liland, K H; Jonsson, M E; Sæbø, S; Sviland, S; Østerås, O; Norström, M; Hopp, P
2014-11-01
Surveillance programs for animal diseases are critical to early disease detection and risk estimation and to documenting a population's disease status at a given time. The aim of this study was to describe a risk-based surveillance program for detecting Mycobacterium avium ssp. paratuberculosis (MAP) infection in Norwegian dairy cattle. The included risk factors for detecting MAP were purchase of cattle, combined cattle and goat farming, and location of the cattle farm in counties containing goats with MAP. The risk indicators included production data [culling of animals >3 yr of age, carcass conformation of animals >3 yr of age, milk production decrease in older lactating cows (lactations 3, 4, and 5)], and clinical data (diarrhea, enteritis, or both, in animals >3 yr of age). Except for combined cattle and goat farming and cattle farm location, all data were collected at the cow level and summarized at the herd level. Predefined risk factors and risk indicators were extracted from different national databases and combined in a multivariate statistical process control to obtain a risk assessment for each herd. The ordinary Hotelling's T(2) statistic was applied as a multivariate, standardized measure of difference between the current observed state and the average state of the risk factors for a given herd. To make the analysis more robust and adapt it to the slowly developing nature of MAP, monthly risk calculations were based on data accumulated during a 24-mo period. Monitoring of these variables was performed to identify outliers that may indicate deviance in one or more of the underlying processes. The highest-ranked herds were scattered all over Norway and clustered in high-density dairy cattle farm areas. The resulting rankings of herds are being used in the national surveillance program for MAP in 2014 to increase the sensitivity of the ongoing surveillance program in which 5 fecal samples for bacteriological examination are collected from 25 dairy herds
Weighted A-statistical convergence for sequences of positive linear operators.
Mohiuddine, S A; Alotaibi, Abdullah; Hazarika, Bipan
2014-01-01
We introduce the notion of weighted A-statistical convergence of a sequence, where A represents the nonnegative regular matrix. We also prove the Korovkin approximation theorem by using the notion of weighted A-statistical convergence. Further, we give a rate of weighted A-statistical convergence and apply the classical Bernstein polynomial to construct an illustrative example in support of our result.
Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.
2017-05-01
The braking system is one of the most important and complex subsystems of railway vehicles, especially when it comes for safety. Therefore, installing efficient safe brakes on the modern railway vehicles is essential. Nowadays is devoted attention to solving problems connected with using high performance brake materials and its impact on thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, due to the interaction between the brake block and the wheel produce complex thermos-mechanical phenomena. In this work, the investigated subjects are the cast-iron brake shoes, which are still widely used on freight wagons. Therefore, the cast-iron brake shoes - with lamellar graphite and with a high content of phosphorus (0.8-1.1%) - need a special investigation. In order to establish the optimal condition for the cast-iron brake shoes we proposed a mathematical modelling study by using the statistical analysis and multiple regression equations. Multivariate research is important in areas of cast-iron brake shoes manufacturing, because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulties in comprehending many dimensions at one time. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. In order to settle the multiple correlation between the hardness of the cast-iron brake shoes, and the chemical compositions elements several model of regression equation types has been proposed. Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representation of the regression equations in order to explain interaction of the variables and locate the optimal level of each variable for
Molinaroli, E.; Pistolato, M.; Rampazzo, G.; Guerzoni, S.
1999-01-01
The chemical characteristics of the mineral fractions of aerosol and precipitation collected in Sardinia (NW Mediterranean) are highlighted by means of two multivariate statistical approaches. Two different combinations of classification and statistical methods for geochemical data are presented. It is shown that the application of cluster analysis subsequent to Q-Factor analysis better distinguishes among Saharan dust, background pollution (Europe-Mediterranean) and local aerosol from various source regions (Sardinia). Conversely, the application of simple cluster analysis was able to distinguish only between aerosols and precipitation particles, without assigning the sources (local or distant) to the aerosol. This method also highlighted the fact that crust-enriched precipitation is similar to desert-derived aerosol. Major elements (Al, Na) and trace metal (Pb) turn out to be the most discriminating elements of the analysed data set. Independent use of mineralogical, granulometric and meteorological data confirmed the results derived from the statistical methods employed. (Copyright (c) 1999 Elsevier Science B.V., Amsterdam. All rights reserved.)
Velasco-Tapia, Fernando
2014-01-01
Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view could be reached applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features for the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures). PMID:24737994
Fernando Velasco-Tapia
2014-01-01
Full Text Available Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view could be reached applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features for the Sierra de las Cruces (SC volcanic range (Mexican Volcanic Belt. In this locality, the volcanic activity (3.7 to 0.5 Ma was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward’s linkage rule was carried out to classify the SC volcanic rocks geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas in the comingled lavas (binary mixtures.
Mihajilov-Krstev, Tatjana M; Denić, Marija S; Zlatković, Bojan K; Stankov-Jovanović, Vesna P; Mitić, Violeta D; Stojanović, Gordana S; Radulović, Niko S
2015-04-01
In Serbia, delicatessen fruit alcoholic drinks are produced from autochthonous fruit-bearing species such as cornelian cherry, blackberry, elderberry, wild strawberry, European wild apple, European blueberry and blackthorn fruits. There are no chemical data on many of these and herein we analysed volatile minor constituents of these rare fruit distillates. Our second goal was to determine possible chemical markers of these distillates through a statistical/multivariate treatment of the herein obtained and previously reported data. Detailed chemical analyses revealed a complex volatile profile of all studied fruit distillates with 371 identified compounds. A number of constituents were recognised as marker compounds for a particular distillate. Moreover, 33 of them represent newly detected flavour constituents in alcoholic beverages or, in general, in foodstuffs. With the aid of multivariate analyses, these volatile profiles were successfully exploited to infer the origin of raw materials used in the production of these spirits. It was also shown that all fruit distillates possessed weak antimicrobial properties. It seems that the aroma of these highly esteemed wild-fruit spirits depends on the subtle balance of various minor volatile compounds, whereby some of them are specific to a certain type of fruit distillate and enable their mutual distinction. © 2014 Society of Chemical Industry.
Rasouli, Zolaikha; Ghavami, Raouf
2016-08-01
Vanillin (VA), vanillic acid (VAI) and syringaldehyde (SIA) are important food additives as flavor enhancers. The current study for the first time is devote to the application of partial least square (PLS-1), partial robust M-regression (PRM) and feed forward neural networks (FFNNs) as linear and nonlinear chemometric methods for the simultaneous detection of binary and ternary mixtures of VA, VAI and SIA using data extracted directly from UV-spectra with overlapped peaks of individual analytes. Under the optimum experimental conditions, for each compound a linear calibration was obtained in the concentration range of 0.61-20.99 [LOD = 0.12], 0.67-23.19 [LOD = 0.13] and 0.73-25.12 [LOD = 0.15] μg mL- 1 for VA, VAI and SIA, respectively. Four calibration sets of standard samples were designed by combination of a full and fractional factorial designs with the use of the seven and three levels for each factor for binary and ternary mixtures, respectively. The results of this study reveal that both the methods of PLS-1 and PRM are similar in terms of predict ability each binary mixtures. The resolution of ternary mixture has been accomplished by FFNNs. Multivariate curve resolution-alternating least squares (MCR-ALS) was applied for the description of spectra from the acid-base titration systems each individual compound, i.e. the resolution of the complex overlapping spectra as well as to interpret the extracted spectral and concentration profiles of any pure chemical species identified. Evolving factor analysis (EFA) and singular value decomposition (SVD) were used to distinguish the number of chemical species. Subsequently, their corresponding dissociation constants were derived. Finally, FFNNs has been used to detection active compounds in real and spiked water samples.
On Linear Hulls, Statistical Saturation Attacks, PRESENT and a Cryptanalysis of PUFFIN
Leander, Gregor
2011-01-01
which breaks the cipher for at least a quarter of the keys with a complexity less than 258. In the case of PRESENT we show that the design is sound. The design criteria are sufficient to ensure the resistance against linear attacks, taking into account the notion of linear hulls. Finally, we show...
Robel, Martin; Kristo, Michael J
2008-11-01
The problem of identifying the provenance of unknown nuclear material in the environment by multivariate statistical analysis of its uranium and/or plutonium isotopic composition is considered. Such material can be introduced into the environment as a result of nuclear accidents, inadvertent processing losses, illegal dumping of waste, or deliberate trafficking in nuclear materials. Various combinations of reactor type and fuel composition were analyzed using Principal Components Analysis (PCA) and Partial Least Squares Discriminant Analysis (PLSDA) of the concentrations of nine U and Pu isotopes in fuel as a function of burnup. Real-world variation in the concentrations of (234)U and (236)U in the fresh (unirradiated) fuel was incorporated. The U and Pu were also analyzed separately, with results that suggest that, even after reprocessing or environmental fractionation, Pu isotopes can be used to determine both the source reactor type and the initial fuel composition with good discrimination.
Cardot, J-M; Roudier, B; Schütz, H
2017-07-01
The f 2 test is generally used for comparing dissolution profiles. In cases of high variability, the f 2 test is not applicable, and the Multivariate Statistical Distance (MSD) test is frequently proposed as an alternative by the FDA and EMA. The guidelines provide only general recommendations. MSD tests can be performed either on raw data with or without time as a variable or on parameters of models. In addition, data can be limited-as in the case of the f 2 test-to dissolutions of up to 85% or to all available data. In the context of the present paper, the recommended calculation included all raw dissolution data up to the first point greater than 85% as a variable-without the various times as parameters. The proposed MSD overcomes several drawbacks found in other methods.
Aniela Balacescu; Marian Zaharia
2011-01-01
This paper aims to examine the causal relationship between GDP and final consumption. The authors used linear regression model in which GDP is considered variable results, and final consumption variable factor. In drafting article we used Excel software application that is a modern computing and statistical data analysis.
Phung, Dung; Huang, Cunrui; Rutherford, Shannon; Dwirahmadi, Febi; Chu, Cordia; Wang, Xiaoming; Nguyen, Minh; Nguyen, Nga Huy; Do, Cuong Manh; Nguyen, Trung Hieu; Dinh, Tuan Anh Diep
2015-05-01
The present study is an evaluation of temporal/spatial variations of surface water quality using multivariate statistical techniques, comprising cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA). Eleven water quality parameters were monitored at 38 different sites in Can Tho City, a Mekong Delta area of Vietnam from 2008 to 2012. Hierarchical cluster analysis grouped the 38 sampling sites into three clusters, representing mixed urban-rural areas, agricultural areas and industrial zone. FA/PCA resulted in three latent factors for the entire research location, three for cluster 1, four for cluster 2, and four for cluster 3 explaining 60, 60.2, 80.9, and 70% of the total variance in the respective water quality. The varifactors from FA indicated that the parameters responsible for water quality variations are related to erosion from disturbed land or inflow of effluent from sewage plants and industry, discharges from wastewater treatment plants and domestic wastewater, agricultural activities and industrial effluents, and contamination by sewage waste with faecal coliform bacteria through sewer and septic systems. Discriminant analysis (DA) revealed that nephelometric turbidity units (NTU), chemical oxygen demand (COD) and NH₃ are the discriminating parameters in space, affording 67% correct assignation in spatial analysis; pH and NO₂ are the discriminating parameters according to season, assigning approximately 60% of cases correctly. The findings suggest a possible revised sampling strategy that can reduce the number of sampling sites and the indicator parameters responsible for large variations in water quality. This study demonstrates the usefulness of multivariate statistical techniques for evaluation of temporal/spatial variations in water quality assessment and management.
Matiatos, Ioannis
2016-01-15
Nitrate (NO3) is one of the most common contaminants in aquatic environments and groundwater. Nitrate concentrations and environmental isotope data (δ(15)N-NO3 and δ(18)O-NO3) from groundwater of Asopos basin, which has different land-use types, i.e., a large number of industries (e.g., textile, metal processing, food, fertilizers, paint), urban and agricultural areas and livestock breeding facilities, were analyzed to identify the nitrate sources of water contamination and N-biogeochemical transformations. A Bayesian isotope mixing model (SIAR) and multivariate statistical analysis of hydrochemical data were used to estimate the proportional contribution of different NO3 sources and to identify the dominant factors controlling the nitrate content of the groundwater in the region. The comparison of SIAR and Principal Component Analysis showed that wastes originating from urban and industrial zones of the basin are mainly responsible for nitrate contamination of groundwater in these areas. Agricultural fertilizers and manure likely contribute to groundwater contamination away from urban fabric and industrial land-use areas. Soil contribution to nitrate contamination due to organic matter is higher in the south-western part of the area far from the industries and the urban settlements. The present study aims to highlight the use of environmental isotopes combined with multivariate statistical analysis in locating sources of nitrate contamination in groundwater leading to a more effective planning of environmental measures and remediation strategies in river basins and water bodies as defined by the European Water Frame Directive (Directive 2000/60/EC).
Matiatos, Ioannis
2016-01-01
Nitrate (NO_3) is one of the most common contaminants in aquatic environments and groundwater. Nitrate concentrations and environmental isotope data (δ"1"5N–NO_3 and δ"1"8O–NO_3) from groundwater of Asopos basin, which has different land-use types, i.e., a large number of industries (e.g., textile, metal processing, food, fertilizers, paint), urban and agricultural areas and livestock breeding facilities, were analyzed to identify the nitrate sources of water contamination and N-biogeochemical transformations. A Bayesian isotope mixing model (SIAR) and multivariate statistical analysis of hydrochemical data were used to estimate the proportional contribution of different NO_3 sources and to identify the dominant factors controlling the nitrate content of the groundwater in the region. The comparison of SIAR and Principal Component Analysis showed that wastes originating from urban and industrial zones of the basin are mainly responsible for nitrate contamination of groundwater in these areas. Agricultural fertilizers and manure likely contribute to groundwater contamination away from urban fabric and industrial land-use areas. Soil contribution to nitrate contamination due to organic matter is higher in the south-western part of the area far from the industries and the urban settlements. The present study aims to highlight the use of environmental isotopes combined with multivariate statistical analysis in locating sources of nitrate contamination in groundwater leading to a more effective planning of environmental measures and remediation strategies in river basins and water bodies as defined by the European Water Frame Directive (Directive 2000/60/EC). - Highlights: • More enriched N-isotope values were observed in the industrial/urban areas. • A Bayesian isotope mixing model was applied in a multiple land-use area. • A 3-component model explained the factors controlling nitrate content in groundwater. • Industrial/urban nitrogen source was
Kakuda, Hiroyuki; Okada, Tetsuo; Otsuka, Makoto; Katsumoto, Yukiteru; Hasegawa, Takeshi
2009-01-01
A multivariate analytical technique has been applied to the analysis of simultaneous measurement data from differential scanning calorimetry (DSC) and X-ray diffraction (XRD) in order to study thermal changes in crystalline structure of a linear poly(ethylene imine) (LPEI) film. A large number of XRD patterns generated from the simultaneous measurements were subjected to an augmented alternative least-squares (ALS) regression analysis, and the XRD patterns were readily decomposed into chemically independent XRD patterns and their thermal profiles were also obtained at the same time. The decomposed XRD patterns and the profiles were useful in discussing the minute peaks in the DSC. The analytical results revealed the following changes of polymorphisms in detail: An LPEI film prepared by casting an aqueous solution was composed of sesquihydrate and hemihydrate crystals. The sesquihydrate one was lost at an early stage of heating, and the film changed into an amorphous state. Once the sesquihydrate was lost by heating, it was not recovered even when it was cooled back to room temperature. When the sample was heated again, structural changes were found between the hemihydrate and the amorphous components. In this manner, the simultaneous DSC-XRD measurements combined with ALS analysis proved to be powerful for obtaining a better understanding of the thermally induced changes of the crystalline structure in a polymer film.
Bangdiwala, S I; Anzola-Pérez, E
1990-03-01
Injuries and accidents are acknowledged as leading causes of morbidity and mortality among children and adolescents in the developing countries of the world. The Pan American Health Organization sponsored a collaborative study in four selected countries in Latin America to study the extent of the problem as well as to examine the potential risk factors associated with selected non-fatal injuries in the countries. The study subjects were injured children and adolescents (0-19 years of age) presenting at the study hospitals in chosen urban centres, as well as injured that were surveyed in households in the catchment areas of the hospitals. Study methods and descriptive frequency results were presented earlier. In this paper, log-linear multivariate regression models are used to examine the potentiating effects within country of several measured variables on specific types of injuries. The significance of risk factors varied between countries; however, some general patterns emerged. Falls were more likely in younger children, and occurred at home. The main risk factor for home accidents was the age of the child. The education of the head of the household was an important risk factor for the type of injury suffered. The likelihood of traffic accident injury varied with time of day and day of the week, but also was more likely in higher educated households. The results found are consistent with those found in other studies in the developed world and suggest specific areas of concern for health planners to address.
Weighted A-Statistical Convergence for Sequences of Positive Linear Operators
S. A. Mohiuddine
2014-01-01
Full Text Available We introduce the notion of weighted A-statistical convergence of a sequence, where A represents the nonnegative regular matrix. We also prove the Korovkin approximation theorem by using the notion of weighted A-statistical convergence. Further, we give a rate of weighted A-statistical convergence and apply the classical Bernstein polynomial to construct an illustrative example in support of our result.
Sommer, Stefan Horst; Lauze, Francois Bernard; Hauberg, Søren
2010-01-01
, we present a comparison between the non-linear analog of Principal Component Analysis, Principal Geodesic Analysis, in its linearized form and its exact counterpart that uses true intrinsic distances. We give examples of datasets for which the linearized version provides good approximations...... and for which it does not. Indicators for the differences between the two versions are then developed and applied to two examples of manifold valued data: outlines of vertebrae from a study of vertebral fractures and spacial coordinates of human skeleton end-effectors acquired using a stereo camera and tracking...
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
J Olive, David
2017-01-01
This text presents methods that are robust to the assumption of a multivariate normal distribution or methods that are robust to certain types of outliers. Instead of using exact theory based on the multivariate normal distribution, the simpler and more applicable large sample theory is given. The text develops among the first practical robust regression and robust multivariate location and dispersion estimators backed by theory. The robust techniques are illustrated for methods such as principal component analysis, canonical correlation analysis, and factor analysis. A simple way to bootstrap confidence regions is also provided. Much of the research on robust multivariate analysis in this book is being published for the first time. The text is suitable for a first course in Multivariate Statistical Analysis or a first course in Robust Statistics. This graduate text is also useful for people who are familiar with the traditional multivariate topics, but want to know more about handling data sets with...
Bakraji, E.H.; Ahmad, M.; Salman, N.; Haloum, D.; Boutros, N.; Abboud, R.
2011-01-01
Thermoluminescence (TL) dating and Proton Induced X-ray Emission (PIXE) techniques have been utilized for the study of archaeological pottery fragment samples from Tell Saka Site, which is located at 25 km south east of Damascus city, Syria. Four samples were chosen randomly from the site, two from third level and two from fourth level for dating using TL technique and the results were in good agreement with the date assigned by archaeologists. Twenty-eight sherds were analyzed using PIXE technique in order to identify and characterize the elemental composition of pottery excavated from third and fourth levels, using 3 MV tandem accelerator in Damascus. The analysis provided almost 20 elements (Na, Mg, Al, Si, P, S, K, Ca, Ti, Mn, Fe, Co, Ni, Cu, Zn, Rb, Sr, Y, Zr, Nb). However, only 14 elements as follows: K, Ca, Ti, Mn, Fe, Co, Ni, Cu, Zn, Rb, Sr, Y, Zr, Nb were chosen for statistical analysis and have been processed using two multivariate statistical methods, Cluster and Factor analysis. The studied pottery were classify into two well defined groups. (author)
Kalegowda, Yogesh; Harmer, Sarah L
2012-03-20
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.
Zhang Changbo; Wu Longhua; Luo Yongming; Zhang Haibo; Christie, Peter
2008-01-01
Principal components analysis (PCA) and correlation analysis were used to estimate the contribution of four components related to pollutant sources on the total variation in concentrations of Cu, Zn, Pb, Cd, As, Se, Hg, Fe and Mn in surface soil samples from a valley in east China with numerous copper and zinc smelters. Results indicate that when carrying out source identification of inorganic pollutants their tendency to migrate in soils may result in differences between the pollutant composition of the source and the receptor soil, potentially leading to errors in the characterization of pollutants using multivariate statistics. The stability and potential migration or movement of pollutants in soils must therefore be taken into account. Soil physicochemical properties may offer additional useful information. Two different mechanisms have been hypothesized for correlations between soil heavy metal concentrations and soil organic matter content and these may be helpful in interpreting the statistical analysis. - Principal components analysis with Varimax rotation can help identify sources of soil inorganic pollutants but pollutant migration and soil properties can exert important effects
Gao, Yongnian; Gao, Junfeng; Yin, Hongbin; Liu, Chuansheng; Xia, Ting; Wang, Jing; Huang, Qi
2015-03-15
Remote sensing has been widely used for ater quality monitoring, but most of these monitoring studies have only focused on a few water quality variables, such as chlorophyll-a, turbidity, and total suspended solids, which have typically been considered optically active variables. Remote sensing presents a challenge in estimating the phosphorus concentration in water. The total phosphorus (TP) in lakes has been estimated from remotely sensed observations, primarily using the simple individual band ratio or their natural logarithm and the statistical regression method based on the field TP data and the spectral reflectance. In this study, we investigated the possibility of establishing a spatial modeling scheme to estimate the TP concentration of a large lake from multi-spectral satellite imagery using band combinations and regional multivariate statistical modeling techniques, and we tested the applicability of the spatial modeling scheme. The results showed that HJ-1A CCD multi-spectral satellite imagery can be used to estimate the TP concentration in a lake. The correlation and regression analysis showed a highly significant positive relationship between the TP concentration and certain remotely sensed combination variables. The proposed modeling scheme had a higher accuracy for the TP concentration estimation in the large lake compared with the traditional individual band ratio method and the whole-lake scale regression-modeling scheme. The TP concentration values showed a clear spatial variability and were high in western Lake Chaohu and relatively low in eastern Lake Chaohu. The northernmost portion, the northeastern coastal zone and the southeastern portion of western Lake Chaohu had the highest TP concentrations, and the other regions had the lowest TP concentration values, except for the coastal zone of eastern Lake Chaohu. These results strongly suggested that the proposed modeling scheme, i.e., the band combinations and the regional multivariate
Callén, M S; López, J M; Mastral, A M
2010-08-15
The estimation of benzo(a)pyrene (BaP) concentrations in ambient air is very important from an environmental point of view especially with the introduction of the Directive 2004/107/EC and due to the carcinogenic character of this pollutant. A sampling campaign of particulate matter less or equal than 10 microns (PM10) carried out during 2008-2009 in four locations of Spain was collected to determine experimentally BaP concentrations by gas chromatography mass-spectrometry mass-spectrometry (GC-MS-MS). Multivariate linear regression models (MLRM) were used to predict BaP air concentrations in two sampling places, taking PM10 and meteorological variables as possible predictors. The model obtained with data from two sampling sites (all sites model) (R(2)=0.817, PRESS/SSY=0.183) included the significant variables like PM10, temperature, solar radiation and wind speed and was internally and externally validated. The first validation was performed by cross validation and the last one by BaP concentrations from previous campaigns carried out in Zaragoza from 2001-2004. The proposed model constitutes a first approximation to estimate BaP concentrations in urban atmospheres with very good internal prediction (Q(CV)(2)=0.813, PRESS/SSY=0.187) and with the maximal external prediction for the 2001-2002 campaign (Q(ext)(2)=0.679 and PRESS/SSY=0.321) versus the 2001-2004 campaign (Q(ext)(2)=0.551, PRESS/SSY=0.449). Copyright 2010 Elsevier B.V. All rights reserved.
Callen, M.S.; Lopez, J.M.; Mastral, A.M.
2010-01-01
The estimation of benzo(a)pyrene (BaP) concentrations in ambient air is very important from an environmental point of view especially with the introduction of the Directive 2004/107/EC and due to the carcinogenic character of this pollutant. A sampling campaign of particulate matter less or equal than 10 microns (PM10) carried out during 2008-2009 in four locations of Spain was collected to determine experimentally BaP concentrations by gas chromatography mass-spectrometry mass-spectrometry (GC-MS-MS). Multivariate linear regression models (MLRM) were used to predict BaP air concentrations in two sampling places, taking PM10 and meteorological variables as possible predictors. The model obtained with data from two sampling sites (all sites model) (R 2 = 0.817, PRESS/SSY = 0.183) included the significant variables like PM10, temperature, solar radiation and wind speed and was internally and externally validated. The first validation was performed by cross validation and the last one by BaP concentrations from previous campaigns carried out in Zaragoza from 2001-2004. The proposed model constitutes a first approximation to estimate BaP concentrations in urban atmospheres with very good internal prediction (Q CV 2 =0.813, PRESS/SSY = 0.187) and with the maximal external prediction for the 2001-2002 campaign (Q ext 2 =0.679 and PRESS/SSY = 0.321) versus the 2001-2004 campaign (Q ext 2 =0.551, PRESS/SSY = 0.449).
Valid statistical approaches for analyzing sholl data: Mixed effects versus simple linear models.
Wilson, Machelle D; Sethi, Sunjay; Lein, Pamela J; Keil, Kimberly P
2017-03-01
The Sholl technique is widely used to quantify dendritic morphology. Data from such studies, which typically sample multiple neurons per animal, are often analyzed using simple linear models. However, simple linear models fail to account for intra-class correlation that occurs with clustered data, which can lead to faulty inferences. Mixed effects models account for intra-class correlation that occurs with clustered data; thus, these models more accurately estimate the standard deviation of the parameter estimate, which produces more accurate p-values. While mixed models are not new, their use in neuroscience has lagged behind their use in other disciplines. A review of the published literature illustrates common mistakes in analyses of Sholl data. Analysis of Sholl data collected from Golgi-stained pyramidal neurons in the hippocampus of male and female mice using both simple linear and mixed effects models demonstrates that the p-values and standard deviations obtained using the simple linear models are biased downwards and lead to erroneous rejection of the null hypothesis in some analyses. The mixed effects approach more accurately models the true variability in the data set, which leads to correct inference. Mixed effects models avoid faulty inference in Sholl analysis of data sampled from multiple neurons per animal by accounting for intra-class correlation. Given the widespread practice in neuroscience of obtaining multiple measurements per subject, there is a critical need to apply mixed effects models more widely. Copyright © 2017 Elsevier B.V. All rights reserved.
Matiatos, Ioannis, E-mail: i.matiatos@iaea.org
2016-01-15
Nitrate (NO{sub 3}) is one of the most common contaminants in aquatic environments and groundwater. Nitrate concentrations and environmental isotope data (δ{sup 15}N–NO{sub 3} and δ{sup 18}O–NO{sub 3}) from groundwater of Asopos basin, which has different land-use types, i.e., a large number of industries (e.g., textile, metal processing, food, fertilizers, paint), urban and agricultural areas and livestock breeding facilities, were analyzed to identify the nitrate sources of water contamination and N-biogeochemical transformations. A Bayesian isotope mixing model (SIAR) and multivariate statistical analysis of hydrochemical data were used to estimate the proportional contribution of different NO{sub 3} sources and to identify the dominant factors controlling the nitrate content of the groundwater in the region. The comparison of SIAR and Principal Component Analysis showed that wastes originating from urban and industrial zones of the basin are mainly responsible for nitrate contamination of groundwater in these areas. Agricultural fertilizers and manure likely contribute to groundwater contamination away from urban fabric and industrial land-use areas. Soil contribution to nitrate contamination due to organic matter is higher in the south-western part of the area far from the industries and the urban settlements. The present study aims to highlight the use of environmental isotopes combined with multivariate statistical analysis in locating sources of nitrate contamination in groundwater leading to a more effective planning of environmental measures and remediation strategies in river basins and water bodies as defined by the European Water Frame Directive (Directive 2000/60/EC). - Highlights: • More enriched N-isotope values were observed in the industrial/urban areas. • A Bayesian isotope mixing model was applied in a multiple land-use area. • A 3-component model explained the factors controlling nitrate content in groundwater. • Industrial
Micaletti, R. C.; Cakmak, A. S.; Nielsen, Søren R. K.
This paper contains an analyses of the error induced by applying the method of the equivalent statistical linearzation (ESL) to randomly-exited multi-degree-of-freedom (MDOF) geometrically nonlinear shear-frame structures as the number of degrees of freedom increases. The quantity that is analyzed...
Everson, Howard T.; And Others
This paper explores the feasibility of neural computing methods such as artificial neural networks (ANNs) and abductory induction mechanisms (AIM) for use in educational measurement. ANNs and AIMS methods are contrasted with more traditional statistical techniques, such as multiple regression and discriminant function analyses, for making…
Roy, Kevin; Undey, Cenk; Mistretta, Thomas; Naugle, Gregory; Sodhi, Manbir
2014-01-01
Multivariate statistical process monitoring (MSPM) is becoming increasingly utilized to further enhance process monitoring in the biopharmaceutical industry. MSPM can play a critical role when there are many measurements and these measurements are highly correlated, as is typical for many biopharmaceutical operations. Specifically, for processes such as cleaning-in-place (CIP) and steaming-in-place (SIP, also known as sterilization-in-place), control systems typically oversee the execution of the cycles, and verification of the outcome is based on offline assays. These offline assays add to delays and corrective actions may require additional setup times. Moreover, this conventional approach does not take interactive effects of process variables into account and cycle optimization opportunities as well as salient trends in the process may be missed. Therefore, more proactive and holistic online continued verification approaches are desirable. This article demonstrates the application of real-time MSPM to processes such as CIP and SIP with industrial examples. The proposed approach has significant potential for facilitating enhanced continuous verification, improved process understanding, abnormal situation detection, and predictive monitoring, as applied to CIP and SIP operations. © 2014 American Institute of Chemical Engineers.
Bakraji, E.H.
2012-01-01
X-ray fluorescence method and the technique of thermoluminescence (TL) dating have been utilized for the study of archaeological pottery fragment samples, fairly representative of Romanian period between 1 st century B.C. and 4th century A.D, from Judaidet Yabous site, which is located north-west of Damascus city, Syria. Four samples were chosen randomly among the forty six samples for dating using thermoluminescence technique and the results were in good agreement with the date assigned by archaeologists. The samples were irradiated for 1000 s live time twice, first using a Mo X-ray Tube and second using a 109 Cd radioactive source. Fifteen elements (K, Ca, Ti, Mn, Fe, Ni, Cu, Zn, Ga, Rb, Sr, Y, Zr, Nb, and Pb) were determined. The elemental concentrations have been processed using two multivariate statistical methods. The purpose of the study was to characterize by means of elements contents the pottery paste from Judaidet Yabous archaeological site and providing new data to the Syrian databases for future studies. From an archaeological point of view the results indicated that most of the potteries, were locally produced. (author)
Bakraji, E.H., E-mail: cscientificl@aec.org.sy [Archaeometry Laboratory, Chemistry Department, Atomic Energy Commission of Syria, P. O. Box 6091, Damascus (Syrian Arab Republic); Rihawy, M.S. [Archaeometry Laboratory, Chemistry Department, Atomic Energy Commission of Syria, P. O. Box 6091, Damascus (Syrian Arab Republic); Castel, C. [CNRS – Maison de l’Orient et de la Méditerranée, Laboratoire “Archéorient”, CNRS/Université Lumière-Lyon 2 (France); Abboud, R. [Archaeometry Laboratory, Chemistry Department, Atomic Energy Commission of Syria, P. O. Box 6091, Damascus (Syrian Arab Republic)
2015-03-15
Highlights: •PIXE and OSL methods were used to classify and date pottery from Tell Al-Rawda site. •Three groups were classified using PIXE, which suggest different sources of the clay. •OSL was used for dating the site and the date found was consistent with typology. -- Abstract: Particle Induced X-ray Emission (PIXE) technique has been utilised to study 48 Syrian ancient pottery fragments taken from excavations at Tell Al-Rawda site. Eighteen elements (Mg, Al, Si, P, S, K, Ca, Ti, Mn, Fe, Ni, Zn, As, Br, Rb, Sr, Y, and Pb) were determined. The elements concentrations have been processed using two multivariate statistical methods, to classify the pottery where one main group and other two small groups were defined. In addition, four samples from different places on the site were subjected to optically stimulated luminescence (OSL) dating. The average age obtained using a single aliquot regeneration (SAR) protocol was found to be 4350 ± 240 year.
Lei, Tianli; Chen, Shifeng; Wang, Kai; Zhang, Dandan; Dong, Lin; Lv, Chongning; Wang, Jing; Lu, Jincai
2018-02-01
Bupleuri Radix is a commonly used herb in clinic, and raw and vinegar-baked Bupleuri Radix are both documented in the Pharmacopoeia of People's Republic of China. According to the theories of traditional Chinese medicine, Bupleuri Radix possesses different therapeutic effects before and after processing. However, the chemical mechanism of this processing is still unknown. In this study, ultra-high-performance liquid chromatography with quadruple time-of-flight mass spectrometry coupled with multivariate statistical analysis including principal component analysis and orthogonal partial least square-discriminant analysis was developed to holistically compare the difference between raw and vinegar-baked Bupleuri Radix for the first time. As a result, 50 peaks in raw and processed Bupleuri Radix were detected, respectively, and a total of 49 peak chemical compounds were identified. Saikosaponin a, saikosaponin d, saikosaponin b 3 , saikosaponin e, saikosaponin c, saikosaponin b 2 , saikosaponin b 1 , 4''-O-acetyl-saikosaponin d, hyperoside and 3',4'-dimethoxy quercetin were explored as potential markers of raw and vinegar-baked Bupleuri Radix. This study has been successfully applied for global analysis of raw and vinegar-processed samples. Furthermore, the underlying hepatoprotective mechanism of Bupleuri Radix was predicted, which was related to the changes of chemical profiling. Copyright © 2017 John Wiley & Sons, Ltd.
Armin Saed-Moucheshi
2013-01-01
Full Text Available Multivariate statistical techniques were used to compare the relationship between yield and its related traits under noninoculated and inoculated cultivars with mycorrhizal fungus (Glomus intraradices; each one consisted of three wheat cultivars and four water regimes. Results showed that, under inoculation conditions, spike weight per plant and total chlorophyll content of the flag leaf were the most important variables contributing to wheat grain yield variation, while, under noninoculated condition, in addition to two mentioned traits, grain weight per spike and leaf area were also important variables accounting for wheat grain yield variation. Therefore, spike weight per plant and chlorophyll content of flag leaf can be used as selection criteria in breeding programs for both inoculated and noninoculated wheat cultivars under different water regimes, and also grain weight per spike and leaf area can be considered for noninoculated condition. Furthermore, inoculation of wheat cultivars showed higher value in the most measured traits, and the results indicated that inoculation treatment could change the relationship among morphological traits of wheat cultivars under drought stress. Also, it seems that the results of stepwise regression as a selecting method together with principal component and factor analysis are stronger methods to be applied in breeding programs for screening important traits.
Liu, Zechang; Wang, Liping; Liu, Yumei
2018-01-18
Hops impart flavor to beer, with the volatile components characterizing the various hop varieties and qualities. Fingerprinting, especially flavor fingerprinting, is often used to identify 'flavor products' because inconsistencies in the description of flavor may lead to an incorrect definition of beer quality. Compared to flavor fingerprinting, volatile fingerprinting is simpler and easier. We performed volatile fingerprinting using head space-solid phase micro-extraction gas chromatography-mass spectrometry combined with similarity analysis and principal component analysis (PCA) for evaluating and distinguishing between three major Chinese hops. Eighty-four volatiles were identified, which were classified into seven categories. Volatile fingerprinting based on similarity analysis did not yield any obvious result. By contrast, hop varieties and qualities were identified using volatile fingerprinting based on PCA. The potential variables explained the variance in the three hop varieties. In addition, the dendrogram and principal component score plot described the differences and classifications of hops. Volatile fingerprinting plus multivariate statistical analysis can rapidly differentiate between the different varieties and qualities of the three major Chinese hops. Furthermore, this method can be used as a reference in other fields. © 2018 Society of Chemical Industry. © 2018 Society of Chemical Industry.
Wang, Zaosheng; Li, Rui; Wu, Fengchang; Feng, Chenglian; Ye, Chun; Yan, Changzhou
2017-01-15
The occurrence and distribution of target estrogenic compounds in a highly urbanized industry-impacted coastal bay were investigated, and contamination profiles were evaluated by estimating total estradiol equivalents (∑EEQs) and risk quotients (RQs). Phenolic compounds were the most abundant xenoestrogens, but seldom showed contribution to the ∑EEQs. The diethylstilbestrol (DES) and 17α-ethinylestradiol (EE2) were the major contributors followed by 17β-estradiol (E2) in comparison with a slight contribution from estrone (E1) and estriol (E3). Both ∑EEQs and RQs indicated likely adverse effects posed on resident organisms. Further, multivariate statistical method comprehensively revealed pollution status by visualized factor scores and identified multiple "hotspots" of estrogenic sources, demonstrating the presence of complex pollution risk gradients inside and particularly outside of bay area. Overall, this study favors the integrative utilization of pollution indices and factor analysis as powerful tool to scientifically diagnose the pollution characterization of human-derived chemicals for better management decisions in aquatic environments. Copyright © 2016 Elsevier Ltd. All rights reserved.
Vetrimurugan Elumalai
2017-04-01
Full Text Available Heavy metals in surface and groundwater were analysed and their sources were identified using multivariate statistical tools for two towns in South Africa. Human exposure risk through the drinking water pathway was also assessed. Electrical conductivity values showed that groundwater is desirable to permissible for drinking except for six locations. Concentration of aluminium, lead and nickel were above the permissible limit for drinking at all locations. Boron, cadmium, iron and manganese exceeded the limit at few locations. Heavy metal pollution index based on ten heavy metals indicated that 85% of the area had good quality water, but 15% was unsuitable. Human exposure dose through the drinking water pathway indicated no risk due to boron, nickel and zinc, moderate risk due to cadmium and lithium and high risk due to silver, copper, manganese and lead. Hazard quotients were high in all sampling locations for humans of all age groups, indicating that groundwater is unsuitable for drinking purposes. Highly polluted areas were located near the coast, close to industrial operations and at a landfill site representing human-induced pollution. Factor analysis identified the four major pollution sources as: (1 industries; (2 mining and related activities; (3 mixed sources- geogenic and anthropogenic and (4 fertilizer application.
Marković, Snežana; Kerč, Janez; Horvat, Matej
2017-03-01
We are presenting a new approach of identifying sources of variability within a manufacturing process by NIR measurements of samples of intermediate material after each consecutive unit operation (interprocess NIR sampling technique). In addition, we summarize the development of a multivariate statistical process control (MSPC) model for the production of enteric-coated pellet product of the proton-pump inhibitor class. By developing provisional NIR calibration models, the identification of critical process points yields comparable results to the established MSPC modeling procedure. Both approaches are shown to lead to the same conclusion, identifying parameters of extrusion/spheronization and characteristics of lactose that have the greatest influence on the end-product's enteric coating performance. The proposed approach enables quicker and easier identification of variability sources during manufacturing process, especially in cases when historical process data is not straightforwardly available. In the presented case the changes of lactose characteristics are influencing the performance of the extrusion/spheronization process step. The pellet cores produced by using one (considered as less suitable) lactose source were on average larger and more fragile, leading to consequent breakage of the cores during subsequent fluid bed operations. These results were confirmed by additional experimental analyses illuminating the underlying mechanism of fracture of oblong pellets during the pellet coating process leading to compromised film coating.
Ayivor, J.E.; Nyarko, B.J.B.; Dampare, S.B.; Okine, L.K.
2010-01-01
k 0 instrumental neutron activation analysis and atomic absorption spectrometry were applied to determine multi elements in thirteen Ghanaian herbal medicines used for the management of various diseases. Concentrations of AI, Cu, Mg, Mn and Na were determined. As, Br, K, CI, and Na were determined by short and medium irradiations at a thermal neutron flux of 5x10ncm -2 s -1 . Fe, Cr, Pb, Co, Ni, Sn, Ca, Ba, Li and Sb were determined using atomic absorption spectrometry. Ba, Cu, Li and V were present at trace levels whereas AI, CI, Na, Ca were present at major levels. K, Br, Mg, Mn, Co, Ni, Fe and Sb were also present at minor levels. The precision and accuracy of the method using real samples and standard reference materials were within ±10% of the reported value. Multivariate analytical techniques, such as cluster analysis and principal component analysis (PCA)/factor analysis (FA), have been applied to evaluate the chemical variations in the herbal medicine dataset. All the 13 samples may be grouped into two statistically significant clusters, reflecting the different chemical compositions. The concentrations of elements were within the recommended daily allowances or maximum permissible levels posing no adverse effects on human health.
Gogna, Navdeep; Hamid, Neda; Dorai, Kavita
2015-11-10
Extracts from the Carica papaya L. plant are widely reported to contain metabolites with antibacterial, antioxidant and anticancer activity. This study aims to analyze the metabolic profiles of papaya leaves and seeds in order to gain insights into their phytomedicinal constituents. We performed metabolite fingerprinting using 1D and 2D 1H NMR experiments and used multivariate statistical analysis to identify those plant parts that contain the most concentrations of metabolites of phytomedicinal value. Secondary metabolites such as phenyl propanoids, including flavonoids, were found in greater concentrations in the leaves as compared to the seeds. UPLC-ESI-MS verified the presence of significant metabolites in the papaya extracts suggested by the NMR analysis. Interestingly, the concentration of eleven secondary metabolites namely caffeic, cinnamic, chlorogenic, quinic, coumaric, vanillic, and protocatechuic acids, naringenin, hesperidin, rutin, and kaempferol, were higher in young as compared to old papaya leaves. The results of the NMR analysis were corroborated by estimating the total phenolic and flavonoid content of the extracts. Estimation of antioxidant activity in leaves and seed extracts by DPPH and ABTS in-vitro assays and antioxidant capacity in C2C12 cell line also showed that papaya extracts exhibit high antioxidant activity. Copyright © 2015 Elsevier B.V. All rights reserved.
Bakraji, E.H.; Rihawy, M.S.; Castel, C.; Abboud, R.
2015-01-01
Highlights: •PIXE and OSL methods were used to classify and date pottery from Tell Al-Rawda site. •Three groups were classified using PIXE, which suggest different sources of the clay. •OSL was used for dating the site and the date found was consistent with typology. -- Abstract: Particle Induced X-ray Emission (PIXE) technique has been utilised to study 48 Syrian ancient pottery fragments taken from excavations at Tell Al-Rawda site. Eighteen elements (Mg, Al, Si, P, S, K, Ca, Ti, Mn, Fe, Ni, Zn, As, Br, Rb, Sr, Y, and Pb) were determined. The elements concentrations have been processed using two multivariate statistical methods, to classify the pottery where one main group and other two small groups were defined. In addition, four samples from different places on the site were subjected to optically stimulated luminescence (OSL) dating. The average age obtained using a single aliquot regeneration (SAR) protocol was found to be 4350 ± 240 year
Hayslett, H T
1991-01-01
Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the
Deconinck, E; Zhang, M H; Petitet, F; Dubus, E; Ijjaali, I; Coomans, D; Vander Heyden, Y
2008-02-18
The use of some unconventional non-linear modeling techniques, i.e. classification and regression trees and multivariate adaptive regression splines-based methods, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structural and pharmacological diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It could be concluded that the use of combinations of a linear with a non-linear modeling technique results in some improved properties compared to the individual linear and non-linear models and that, when the use of such a combination is appropriate, combinations using MARS as non-linear technique should be preferred over those with BRT, due to some serious drawbacks of the BRT approaches.
Learning Statistics - in a WEB-based and non-linear way
Rootzen, Helle
2007-01-01
different from one another. They have different prior knowledge and different learning styles so it is a challenging task to teach them all in the same way. Furthermore the world of statistics has become so huge that it is impossible to cover everything. The structure imposed by the Bologna agreement gives...... can design the course – or a part of the course – so that it fits their individual learning style and their prior knowledge. Some prefer to look at examples first and afterwards look at which theories it is based on. Others want to do it the opposite way. Some wants to work with the problem themselves...
Lecarpentier, Yves; Claes, Victor; Hébert, Jean-Louis; Krokidis, Xénophon; Blanc, François-Xavier; Michel, Francine; Timbely, Oumar
2015-01-01
All near-equilibrium systems under linear regime evolve to stationary states in which there is constant entropy production rate. In an open chemical system that exchanges matter and energy with the exterior, we can identify both the energy and entropy flows associated with the exchange of matter and energy. This can be achieved by applying statistical mechanics (SM), which links the microscopic properties of a system to its bulk properties. In the case of contractile tissues such as human placenta, Huxley's equations offer a phenomenological formalism for applying SM. SM was investigated in human placental stem villi (PSV) (n = 40). PSV were stimulated by means of KCl exposure (n = 20) and tetanic electrical stimulation (n = 20). This made it possible to determine statistical entropy (S), internal energy (E), affinity (A), thermodynamic force (A / T) (T: temperature), thermodynamic flow (v) and entropy production rate (A / T x v). We found that PSV operated near equilibrium, i.e., A ≺≺ 2500 J/mol and in a stationary linear regime, i.e., (A / T) varied linearly with v. As v was dramatically low, entropy production rate which quantified irreversibility of chemical processes appeared to be the lowest ever observed in any contractile system.
Ju, Sang Gyu; Huh, Seung Jae; Han, Young Yih
2005-01-01
To improve the management of a medical linear accelerator, the records of operational failures of a Varian CL2100C over a ten year period were retrospectively analyzed. The failures were classified according to the involved functional subunits, with each class rated into one of three levels depending on the operational conditions. The relationships between the failure rate and working ratio and between the failure rate and outside temperature were investigated. In addition, the average life time of the main part and the operating efficiency over the last 4 years were analyzed. Among the recorded failures (total 587 failures), the most frequent failure was observed in the parts related with the collimation system, including the monitor chamber, which accounted for 20% of all failures. With regard to the operational conditions, 2nd level of failures, which temporally interrupted treatments, were the most frequent. Third level of failures, which interrupted treatment for more than several hours, were mostly caused by the accelerating subunit. The number of failures was increased with number of treatments and operating time. The average life-times of the Klystron and Thyratron became shorter as the working ratio increased, and were 42 and 83% of the expected values, respectively. The operating efficiency was maintained at 95% or higher, but this value slightly decreased. There was no significant correlation between the number of failures and the outside temperature. The maintenance of detailed equipment problems and failures records over a long period of time can provide good knowledge of equipment function as well as the capability of predicting future failure. More rigorous equipment maintenance is required for old medical linear accelerators for the advanced avoidance of serious failure and to improve the quality of patient treatment
Uhlemann, C.; Feix, M.; Codis, S.; Pichon, C.; Bernardeau, F.; L'Huillier, B.; Kim, J.; Hong, S. E.; Laigle, C.; Park, C.; Shin, J.; Pogosyan, D.
2018-02-01
Starting from a very accurate model for density-in-cells statistics of dark matter based on large deviation theory, a bias model for the tracer density in spheres is formulated. It adopts a mean bias relation based on a quadratic bias model to relate the log-densities of dark matter to those of mass-weighted dark haloes in real and redshift space. The validity of the parametrized bias model is established using a parametrization-independent extraction of the bias function. This average bias model is then combined with the dark matter PDF, neglecting any scatter around it: it nevertheless yields an excellent model for densities-in-cells statistics of mass tracers that is parametrized in terms of the underlying dark matter variance and three bias parameters. The procedure is validated on measurements of both the one- and two-point statistics of subhalo densities in the state-of-the-art Horizon Run 4 simulation showing excellent agreement for measured dark matter variance and bias parameters. Finally, it is demonstrated that this formalism allows for a joint estimation of the non-linear dark matter variance and the bias parameters using solely the statistics of subhaloes. Having verified that galaxy counts in hydrodynamical simulations sampled on a scale of 10 Mpc h-1 closely resemble those of subhaloes, this work provides important steps towards making theoretical predictions for density-in-cells statistics applicable to upcoming galaxy surveys like Euclid or WFIRST.
Siddiqi, Ariba; Arjunan, Sridhar P; Kumar, Dinesh K
2016-08-01
Age-associated changes in the surface electromyogram (sEMG) of Tibialis Anterior (TA) muscle can be attributable to neuromuscular alterations that precede strength loss. We have used our sEMG model of the Tibialis Anterior to interpret the age-related changes and compared with the experimental sEMG. Eighteen young (20-30 years) and 18 older (60-85 years) performed isometric dorsiflexion at 6 different percentage levels of maximum voluntary contractions (MVC), and their sEMG from the TA muscle was recorded. Six different age-related changes in the neuromuscular system were simulated using the sEMG model at the same MVCs as the experiment. The maximal power of the spectrum, Gaussianity and Linearity Test Statistics were computed from the simulated and experimental sEMG. A correlation analysis at α=0.05 was performed between the simulated and experimental age-related change in the sEMG features. The results show the loss in motor units was distinguished by the Gaussianity and Linearity test statistics; while the maximal power of the PSD distinguished between the muscular factors. The simulated condition of 40% loss of motor units with halved the number of fast fibers best correlated with the age-related change observed in the experimental sEMG higher order statistical features. The simulated aging condition found by this study corresponds with the moderate motor unit remodelling and negligible strength loss reported in literature for the cohorts aged 60-70 years.
Hansen, Michael Adsetts Edberg
Interest in statistical methodology is increasing so rapidly in the astronomical community that accessible introductory material in this area is long overdue. This book fills the gap by providing a presentation of the most useful techniques in multivariate statistics. A wide-ranging annotated set...
Gohil, B. S.; Hariharan, T. A.; Sharma, A. K.; Pandey, P. C.
1982-01-01
The 19.35 GHz and 22.235 GHz passive microwave radiometers (SAMIR) on board the Indian satellite Bhaskara have provided very useful data. From these data has been demonstrated the feasibility of deriving atmospheric and ocean surface parameters such as water vapor content, liquid water content, rainfall rate and ocean surface winds. Different approaches have been tried for deriving the atmospheric water content. The statistical and empirical methods have been used by others for the analysis of the Nimbus data. A simulation technique has been attempted for the first time for 19.35 GHz and 22.235 GHz radiometer data. The results obtained from three different methods are compared with radiosonde data. A case study of a tropical depression has been undertaken to demonstrate the capability of Bhaskara SAMIR data to show the variation of total water vapor and liquid water contents.
Huppert, Theodore J
2016-01-01
Functional near-infrared spectroscopy (fNIRS) is a noninvasive neuroimaging technique that uses low levels of light to measure changes in cerebral blood oxygenation levels. In the majority of NIRS functional brain studies, analysis of this data is based on a statistical comparison of hemodynamic levels between a baseline and task or between multiple task conditions by means of a linear regression model: the so-called general linear model. Although these methods are similar to their implementation in other fields, particularly for functional magnetic resonance imaging, the specific application of these methods in fNIRS research differs in several key ways related to the sources of noise and artifacts unique to fNIRS. In this brief communication, we discuss the application of linear regression models in fNIRS and the modifications needed to generalize these models in order to deal with structured (colored) noise due to systemic physiology and noise heteroscedasticity due to motion artifacts. The objective of this work is to present an overview of these noise properties in the context of the linear model as it applies to fNIRS data. This work is aimed at explaining these mathematical issues to the general fNIRS experimental researcher but is not intended to be a complete mathematical treatment of these concepts.
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Liu, Pu; Hoth, Nils; Drebenstedt, Carsten; Sun, Yajun; Xu, Zhimin
2017-12-01
Groundwater is an important drinking water resource that requires protection in North China. Coal mining industry in the area may influence the water quality evolution. To provide primary characterization of the hydrogeochemical processes and paths that control the water quality evolution, a complex multi-layer groundwater system in a coal mining area is investigated. Multivariate statistical methods involving hierarchical cluster analysis (HCA) and principal component analysis (PCA) are applied, 6 zones and 3 new principal components are classified as major reaction zones and reaction factors. By integrating HCA and PCA with hydrogeochemical correlations analysis, potential phases, reactions and connections between various zones are presented. Carbonates minerals, gypsum, clay minerals as well as atmosphere gases - CO 2 , H 2 O and NH 3 are recognized as major reactants. Mixtures, evaporation, dissolution/precipitation of minerals and cation exchange are potential reactions. Inverse modeling is finally used, and it verifies the detailed processes and diverse paths. Consequently, 4 major paths are found controlling the variations of groundwater chemical properties. Shallow and deep groundwater is connected primarily by the flow of deep groundwater up through fractures and faults into the shallow aquifers. Mining does not impact the underlying aquifers that represent the most critical groundwater resource. But controls should be taken to block the mixing processes from highly polluted mine water. The paper highlights the complex hydrogeochemical evolution of a multi-layer groundwater system under mining impact, which could be applied to further groundwater quality management in the study area, as well as most of the other coalfields in North China. Copyright © 2017 Elsevier B.V. All rights reserved.
Shujuan Xue
2018-02-01
Full Text Available Radix Rehmanniae (RR is a kind of herb which is widely used in the clinical and food processing industry. There are four forms of RR used in traditional Chinese medicine practice, which include fresh RR (FRR, raw RR (RRR, processed RR (PRR, and another processed RR (APRR, in which the APRR was processed by nine cycles of repeated steaming and drying. There are a large number of saccharides in RR. However, the differences in content were shown by different processing methods. In this study, an effective method using high-performance liquid chromatography (HPLC and high-performance liquid chromatography-mass spectrometry (LC-MS coupled with multivariate statistical analysis to rapidly distinguish different RR samples and validate the proposed chemical conversion mechanism. The datasets of the content of saccharides were subjected to principal component analysis (PCA and one-way analysis of variance. The results showed that there different changes occurred in the contents of saccharides corresponding to the different processing methods, in which the contents of monosaccharides—namely arabinose, glucose, mannose, and galactose—had an increasing trend or remained relatively stable. However, the contents of fructose and oligosaccharides, including manninotriose, melibiose, sucrose, and raffinose, first increased and then reduced, or gradually decreased, yet the content of stachyose gradually decreased. The MSn data indicated that manninotriose, melibiose, and some monosaccharides were produced by the hydrolysis of oligosaccharides. In addition, the fragmentation pathways of 1-phenyl-3-methyl-5-pyrazolone (PMP derivatization of monosaccharides were also found that its glycosidic bond was first broken and subsequently its inside ring broke, and the characteristic fragment ions were produced at m/z 511.22, 493.20, 373.16, and 175.08 in the PMP derivatization of monosaccharides. In conclusion, this study illustrates the change and chemical conversion
Campanya, J. L.; Ogaya, X.; Jones, A. G.; Rath, V.; McConnell, B.; Haughton, P.; Prada, M.
2016-12-01
The Science Foundation Ireland funded project IRECCSEM project (www.ireccsem.ie) aims to evaluate Ireland's potential for onshore carbon sequestration in saline aquifers by integrating new electromagnetic geophysical data with existing geophysical and geological data. One of the objectives of this component of IRECCSEM is to characterise the subsurface beneath the Loop Head Peninsula (part of Clare Basin, Co. Clare, Ireland), and identify major electrical resistivity structures that can guide an interpretation of the carbon sequestration potential of this area. During the summer of 2014, a magnetotelluric (MT) survey was carried out on the Loop Head Peninsula, and data from a total of 140 sites were acquired, including audio-magnetotelluric (AMT), and broadband magnetotelluric (BBMT). The dataset was used to generate shallow three-dimensional (3-D) electrical resistivity models constraining the subsurface to depths of up to 3.5 km. The three-dimensional (3-D) joint inversions were performed using three different types of electromagnetic data: MT impedance tensor (Z), geomagnetic transfer functions (T), and inter-station horizontal magnetic transfer-functions (H). The interpretation of the results was complemented with second-derivative models of the resulting electrical resistivity models, and a quantitative comparison with borehole data using multivariate statistical methods. Second-derivative models were used to define the main interfaces between the geoelectrical structures, facilitating superior comparison with geological and seismic results, and also reducing the influence of the colour scale when interpreting the results. Specific analysis was performed to compare the extant borehole data with the electrical resistivity model, identifying those structures that are better characterised by the resistivity model. Finally, the electrical resistivity model was also used to propagate some of the physical properties measured in the borehole, when a good relation was
Sjögren, M; Li, H; Banner, C; Rafter, J; Westerholm, R; Rannug, U
1996-01-01
The emission of diesel exhaust particulates is associated with potentially severe biological effects, e.g., cancer. The aim of the present study was to apply multivariate statistical methods to identify factors that affect the biological potency of these exhausts. Ten diesel fuels were analyzed regarding physical and chemical characteristics. Particulate exhaust emissions were sampled after combustion of these fuels on two makes of heavy duty diesel engines. Particle extracts were chemically analyzed and tested for mutagenicity in the Ames test. Also, the potency of the extracts to competitively inhibit the binding of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) to the Ah receptor was assessed. Relationships between fuel characteristics and biological effects of the extracts were studied, using partial least squares regression (PLS). The most influential chemical fuel parameters included the contents of sulfur, certain polycyclic aromatic compounds (PAC), and naphthenes. Density and flash point were positively correlated with genotoxic potency. Cetane number and upper distillation curve points were negatively correlated with both mutagenicity and Ah receptor affinity. Between 61% and 70% of the biological response data could be explained by the measured chemical and physical factors of the fuels. By PLS modeling of extract data versus the biological response data, 66% of the genotoxicity could be explained, by 41% of the chemical variation. The most important variables, associated with both mutagenicity and Ah receptor affinity, included 1-nitropyrene, particle bound nitrate, indeno[1,2,3-cd]pyrene, and emitted mass of particles. S9-requiring mutagenicity was highly correlated with certain PAC, whereas S9-independent mutagenicity was better correlated with nitrates and 1-nitropyrene. The emission of sulfates also showed a correlation both with the emission of particles and with the biological effects. The results indicate that fuels with biologically less hazardous
Kamtchueng, Brice T; Fantong, Wilson Y; Wirmvem, Mengnjo J; Tiodjio, Rosine E; Takounjou, Alain F; Ndam Ngoupayou, Jules R; Kusakabe, Minoru; Zhang, Jing; Ohba, Takeshi; Tanyileke, Gregory; Hell, Joseph V; Ueda, Akira
2016-09-01
With the use of conventional hydrogeochemical techniques, multivariate statistical analysis, and stable isotope approaches, this paper investigates for the first time surface water and groundwater from the surrounding areas of Lake Monoun (LM), West Cameroon. The results reveal that waters are generally slightly acidic to neutral. The relative abundance of major dissolved species are Ca(2+) > Mg(2+) > Na(+) > K(+) for cations and HCO3 (-) ≫ NO3 (-) > Cl(-) > SO4 (2-) for anions. The main water type is Ca-Mg-HCO3. Observed salinity is related to water-rock interaction, ion exchange process, and anthropogenic activities. Nitrate and chloride have been identified as the most common pollutants. These pollutants are attributed to the chlorination of wells and leaching from pit latrines and refuse dumps. The stable isotopic compositions in the investigated water sources suggest evidence of evaporation before recharge. Four major groups of waters were identified by salinity and NO3 concentrations using the Q-mode hierarchical cluster analysis (HCA). Consistent with the isotopic results, group 1 represents fresh unpolluted water occurring near the recharge zone in the general flow regime; groups 2 and 3 are mixed water whose composition is controlled by both weathering of rock-forming minerals and anthropogenic activities; group 4 represents water under high vulnerability of anthropogenic pollution. Moreover, the isotopic results and the HCA showed that the CO2-rich bottom water of LM belongs to an isolated hydrological system within the Foumbot plain. Except for some springs, groundwater water in the area is inappropriate for drinking and domestic purposes but good to excellent for irrigation.
Zafar, I.; Edirisinghe, E. A.; Acar, S.; Bez, H. E.
2007-02-01
Automatic vehicle Make and Model Recognition (MMR) systems provide useful performance enhancements to vehicle recognitions systems that are solely based on Automatic License Plate Recognition (ALPR) systems. Several car MMR systems have been proposed in literature. However these approaches are based on feature detection algorithms that can perform sub-optimally under adverse lighting and/or occlusion conditions. In this paper we propose a real time, appearance based, car MMR approach using Two Dimensional Linear Discriminant Analysis that is capable of addressing this limitation. We provide experimental results to analyse the proposed algorithm's robustness under varying illumination and occlusions conditions. We have shown that the best performance with the proposed 2D-LDA based car MMR approach is obtained when the eigenvectors of lower significance are ignored. For the given database of 200 car images of 25 different make-model classifications, a best accuracy of 91% was obtained with the 2D-LDA approach. We use a direct Principle Component Analysis (PCA) based approach as a benchmark to compare and contrast the performance of the proposed 2D-LDA approach to car MMR. We conclude that in general the 2D-LDA based algorithm supersedes the performance of the PCA based approach.
Amin Hossein Morshedy
2017-07-01
Full Text Available Introduction Nowadays, exploration of rare earth element (REE resources is considered as one of the strategic priorities, which has a special position in the advanced and intelligent industries (Castor and Hedrick, 2006. Significant resources of REEs are found in a wide range of geological settings, including primary deposits associated with igneous and hydrothermal processes (e.g. carbonatite, (per alkaline-igneous rocks, iron-oxide breccia complexes, scarns, fluorapatite veins and pegmatites, and secondary deposits concentrated by sedimentary processes and weathering (e.g. heavy-mineral sand deposits, fluviatile sandstones, unconformity-related uranium deposits, and lignites (Jaireth et al., 2014. Recent studies on various parts of Iran led to the identification of promising potential of these elements, including Central Iran, alkaline rocks in the Eslami Peninsula, iron and apatite in the Hormuz Island, Kahnouj titanium deposit, granitoid bodies in Yazd, Azerbaijan, and Mashhad and associated dikes, and finally placers related to the Shemshak formation in Marvast, Kharanagh, and Ardekan indicate high concentration of REE in magmatogenic iron–apatite deposits in Central Iran and placers in Marvast area in Yazd (Ghorbani, 2013. Materials and methods In the present study, the geochemical behavior of rare earth elements is modeled by using multivariate statistical methods in the eastern part of the Marvast placer. Marvast is located 185 km south of the city of Yazd in central Iran between Yazd and Mehriz. This area lies within the southeastern part of the Sanandaj-Sirjan Zone (Alipour-Asll et al., 2012. The samples of 53 wells were analyzed for Whole-rock trace-element concentrations (including REE by inductively coupled plasma-mass spectrometry (ICP-MS (GSI, 2004. The clustering techniques such as multivariate statistical analysis technique can be employed to find appropriate groups in data sets. One of the main objectives of data clustering
der, R.
1987-01-01
The various approaches to nonequilibrium statistical mechanics may be subdivided into convolution and convolutionless (time-local) ones. While the former, put forward by Zwanzig, Mori, and others, are used most commonly, the latter are less well developed, but have proven very useful in recent applications. The aim of the present series of papers is to develop the time-local picture (TLP) of nonequilibrium statistical mechanics on a new footing and to consider its physical implications for topics such as the formulation of irreversible thermodynamics. The most natural approach to TLP is seen to derive from the Fourier-Laplace transformwidetilde{C}(z)) of pertinent time correlation functions, which on the physical sheet typically displays an essential singularity at z=∞ and a number of macroscopic and microscopic poles in the lower half-plane corresponding to long- and short-lived modes, respectively, the former giving rise to the autonomous macrodynamics, whereas the latter are interpreted as doorway modes mediating the transfer of information from relevant to irrelevant channels. Possible implications of this doorway mode concept for socalled extended irreversible thermodynamics are briefly discussed. The pole structure is used for deriving new kinds of generalized Green-Kubo relations expressing macroscopic quantities, transport coefficients, e.g., by contour integrals over current-current correlation functions obeying Hamiltonian dynamics, the contour integration replacing projection. The conventional Green-Kubo relations valid for conserved quantities only are rederived for illustration. Moreover,widetilde{C}(z) may be expressed by a Laurent series expansion in positive and negative powers of z, from which a rigorous, general, and straightforward method is developed for extracting all macroscopic quantities from so-called secularly divergent expansions ofwidetilde{C}(z) as obtained from the application of conventional many-body techniques to the calculation
Mayvan, Ali D.; Aghaeinia, Hassan; Kazemi, Mohammad
2017-12-01
This paper focuses on robust transceiver design for throughput enhancement on the interference channel (IC), under imperfect channel state information (CSI). In this paper, two algorithms are proposed to improve the throughput of the multi-input multi-output (MIMO) IC. Each transmitter and receiver has, respectively, M and N antennas and IC operates in a time division duplex mode. In the first proposed algorithm, each transceiver adjusts its filter to maximize the expected value of signal-to-interference-plus-noise ratio (SINR). On the other hand, the second algorithm tries to minimize the variances of the SINRs to hedge against the variability due to CSI error. Taylor expansion is exploited to approximate the effect of CSI imperfection on mean and variance. The proposed robust algorithms utilize the reciprocity of wireless networks to optimize the estimated statistical properties in two different working modes. Monte Carlo simulations are employed to investigate sum rate performance of the proposed algorithms and the advantage of incorporating variation minimization into the transceiver design.
2005-01-01
For the years 2004 and 2005 the figures shown in the tables of Energy Review are partly preliminary. The annual statistics published in Energy Review are presented in more detail in a publication called Energy Statistics that comes out yearly. Energy Statistics also includes historical time-series over a longer period of time (see e.g. Energy Statistics, Statistics Finland, Helsinki 2004.) The applied energy units and conversion coefficients are shown in the back cover of the Review. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossile fuels use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord pool power exchange, Total energy consumption by source and CO 2 -emissions, Supplies and total consumption of electricity GWh, Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes, precautionary stock fees and oil pollution fees
Geert Heidema, A.; Thissen, U.; Boer, J.M.A.; Bouwman, F.G.; Feskens, E.J.M.; Mariman, E.C.M.
2009-01-01
In this study, we applied the multivariate statistical tool Partial Least Squares (PLS) to analyze the relative importance of 83 plasma proteins in relation to coronary heart disease (CHD) mortality and the intermediate end points body mass index, HDL-cholesterol and total cholesterol. From a Dutch
Mohammed M. Kamal
2012-09-01
Full Text Available MODerate Resolution Imaging Spectroradiometer (MODIS aerosol retrievals over the North Atlantic spanning seven hurricane seasons are combined with the Statistical Hurricane Intensity Prediction Scheme (SHIPS parameters. The difference between the current and future intensity changes were selected as response variables. For 24 major hurricanes (category 3, 4 and 5 between 2003 and 2009, eight lead time response variables were determined to be between 6 and 48 h. By combining MODIS and SHIPS data, 56 variables were compiled and selected as predictors for this study. Variable reduction from 56 to 31 was performed in two steps; the first step was via correlation coefficients (cc followed by Principal Component Analysis (PCA extraction techniques. The PCA reduced 31 variables to 20. Five categories were established based on the PCA group variables exhibiting similar physical phenomena. Average aerosol retrievals from MODIS Level 2 data in the vicinity of UTC 1,200 and 1,800 h were mapped to the SHIPS parameters to perform Multiple Linear Regression (MLR between each response variable against six sets of predictors of 31, 30, 28, 27, 23 and 20 variables. The deviation among the predictors Root Mean Square Error (RMSE varied between 0.01 through 0.05 and, therefore, implied that reducing the number of variables did not change the core physical information. Even when the parameters are reduced from 56 to 20, the correlation values exhibit a stronger relationship between the response and predictors. Therefore, the same phenomena can be explained by the reduction of variables.
2001-01-01
For the year 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail from the publication Energiatilastot - Energy Statistics issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions from the use of fossil fuels, Total energy consumption by source and CO 2 -emissions, Electricity supply, Energy imports by country of origin in 2000, Energy exports by recipient country in 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products
2000-01-01
For the year 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail from the publication Energiatilastot - Energy Statistics issued annually, which also includes historical time series over a longer period (see e.g., Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO 2 -emissions, Electricity supply, Energy imports by country of origin in January-March 2000, Energy exports by recipient country in January-March 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products
1999-01-01
For the year 1998 and the year 1999, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail from the publication Energiatilastot - Energy Statistics issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO 2 -emissions, Electricity supply, Energy imports by country of origin in January-June 1999, Energy exports by recipient country in January-June 1999, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products
2003-01-01
For the year 2002, part of the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics of the Energy Review also includes historical time-series over a longer period (see e.g. Energiatilastot 2001, Statistics Finland, Helsinki 2002). The applied energy units and conversion coefficients are shown in the inside back cover of the Review. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossile fuels use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord pool power exchange, Total energy consumption by source and CO 2 -emissions, Supply and total consumption of electricity GWh, Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Excise taxes, precautionary stock fees on oil pollution fees on energy products
2004-01-01
For the year 2003 and 2004, the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics of the Energy Review also includes historical time-series over a longer period (see e.g. Energiatilastot, Statistics Finland, Helsinki 2003, ISSN 0785-3165). The applied energy units and conversion coefficients are shown in the inside back cover of the Review. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossile fuels use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord pool power exchange, Total energy consumption by source and CO 2 -emissions, Supplies and total consumption of electricity GWh, Energy imports by country of origin in January-March 2004, Energy exports by recipient country in January-March 2004, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Excise taxes, precautionary stock fees on oil pollution fees
2000-01-01
For the year 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy also includes historical time series over a longer period (see e.g., Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO 2 -emissions, Electricity supply, Energy imports by country of origin in January-June 2000, Energy exports by recipient country in January-June 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products
Multivariate analysis methods in physics
Wolter, M.
2007-01-01
A review of multivariate methods based on statistical training is given. Several multivariate methods useful in high-energy physics analysis are discussed. Selected examples from current research in particle physics are discussed, both from the on-line trigger selection and from the off-line analysis. Also statistical training methods are presented and some new application are suggested [ru
Köylüoglu, H. U.; Nielsen, Søren R. K.; Cakmak, A. S.
Geometrically non-linear multi-degree-of-freedom (MDOF) systems subject to random excitation are considered. New semi-analytical approximate forward difference equations for the lower order non-stationary statistical moments of the response are derived from the stochastic differential equations...... of motion, and, the accuracy of these equations is numerically investigated. For stationary excitations, the proposed method computes the stationary statistical moments of the response from the solution of non-linear algebraic equations....
Venkatramanan Senapathi
2013-07-01
Full Text Available Ground water samples collected at different locations in and around the Nagapattinam district were analyzed for their physicochemical characteristics. The ground water samples were collected from fifty two dug and deep wells during the monsoon and summer seasons in June and December, 2011. The present investigation is focused on the determination of physico-chemical parameters such as pH, EC, TDS, Ca, Mg, Na, K, HCO3, SO4 and Cl. Ground water suitability for drinking, domestic and agri- cultural purposes was examined by using WHO standards. Correlation, factor and cluster analyses were applied to classify the ground water qualities and to categorize the geochemical processes controlling ground water geochemistry. Factor analysis indicates that seawater intrusion and agriculture runoff are dominant factors controlling the hydrogeochemistry of ground water in the study area. Cluster analysis was helpful for the classification on the basis of contamination characteristics of ground water quality. This study also elucidates that multivariate statistical analyses can be used to improve the understanding of ground water status and assessment of ground water quality. Resumen En este estudio se analizan las características fisicoquímicas de muestras de aguas subterráneas tomadas en diferentes locaciones en y alrededor del distrito de Nagapattinam. Las muestras se recolectaron en 52 pozos cavados y perforaciones profundas durante las subestaciones del monzón y el verano, en los meses de junio y diciembre (2011. La presente investigación está enfocada en la determinación de parametros fisicoquímicos como pH, EC, TDS, Ca, Mg, Na, K, HCO3, SO4 y Cl. Se examinó la pertinencia de estas aguas para consumo y para irrigación a la luz de los estándares de la Organización Mundial de la Salud. Se aplicaron análisis de correlación, factores y cúmulos para clasificar las muestras y categorizar los procesos geoquímicos que controlan las aguas subterr
Karunathilaka, Sanjeewa R; Kia, Ali-Reza Fardin; Srigley, Cynthia; Chung, Jin Kyu; Mossoba, Magdi M
2016-10-01
A rapid tool for evaluating authenticity was developed and applied to the screening of extra virgin olive oil (EVOO) retail products by using Fourier-transform near infrared (FT-NIR) spectroscopy in combination with univariate and multivariate data analysis methods. Using disposable glass tubes, spectra for 62 reference EVOO, 10 edible oil adulterants, 20 blends consisting of EVOO spiked with adulterants, 88 retail EVOO products and other test samples were rapidly measured in the transmission mode without any sample preparation. The univariate conformity index (CI) and the multivariate supervised soft independent modeling of class analogy (SIMCA) classification tool were used to analyze the various olive oil products which were tested for authenticity against a library of reference EVOO. Better discrimination between the authentic EVOO and some commercial EVOO products was observed with SIMCA than with CI analysis. Approximately 61% of all EVOO commercial products were flagged by SIMCA analysis, suggesting that further analysis be performed to identify quality issues and/or potential adulterants. Due to its simplicity and speed, FT-NIR spectroscopy in combination with multivariate data analysis can be used as a complementary tool to conventional official methods of analysis to rapidly flag EVOO products that may not belong to the class of authentic EVOO. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
Aslan, B.; Zech, G.
2005-01-01
We introduce the novel concept of statistical energy as a statistical tool. We define statistical energy of statistical distributions in a similar way as for electric charge distributions. Charges of opposite sign are in a state of minimum energy if they are equally distributed. This property is used to check whether two samples belong to the same parent distribution, to define goodness-of-fit tests and to unfold distributions distorted by measurement. The approach is binning-free and especially powerful in multidimensional applications
Lloyd-Jones, Luke R; Robinson, Matthew R; Yang, Jian; Visscher, Peter M
2018-04-01
Genome-wide association studies (GWAS) have identified thousands of loci that are robustly associated with complex diseases. The use of linear mixed model (LMM) methodology for GWAS is becoming more prevalent due to its ability to control for population structure and cryptic relatedness and to increase power. The odds ratio (OR) is a common measure of the association of a disease with an exposure ( e.g. , a genetic variant) and is readably available from logistic regression. However, when the LMM is applied to all-or-none traits it provides estimates of genetic effects on the observed 0-1 scale, a different scale to that in logistic regression. This limits the comparability of results across studies, for example in a meta-analysis, and makes the interpretation of the magnitude of an effect from an LMM GWAS difficult. In this study, we derived transformations from the genetic effects estimated under the LMM to the OR that only rely on summary statistics. To test the proposed transformations, we used real genotypes from two large, publicly available data sets to simulate all-or-none phenotypes for a set of scenarios that differ in underlying model, disease prevalence, and heritability. Furthermore, we applied these transformations to GWAS summary statistics for type 2 diabetes generated from 108,042 individuals in the UK Biobank. In both simulation and real-data application, we observed very high concordance between the transformed OR from the LMM and either the simulated truth or estimates from logistic regression. The transformations derived and validated in this study improve the comparability of results from prospective and already performed LMM GWAS on complex diseases by providing a reliable transformation to a common comparative scale for the genetic effects. Copyright © 2018 by the Genetics Society of America.
Schenone, Agustina V; Culzoni, María J; Marsili, Nilda R; Goicoechea, Héctor C
2013-06-01
The performance of MCR-ALS was studied in the modeling of non-linear kinetic-spectrophotometric data acquired by a stopped-flow system for the quantitation of tartrazine in the presence of brilliant blue and sunset yellow FCF as possible interferents. In the present work, MCR-ALS and U-PCA/RBL were firstly applied to remove the contribution of unexpected components not included in the calibration set. Secondly, a polynomial function was used to model the non-linear data obtained by the implementation of the algorithms. MCR-ALS was the only strategy that allowed the determination of tartrazine in test samples accurately. Therefore, it was applied for the analysis of tartrazine in beverage samples with minimum sample preparation and short analysis time. The proposed method was validated by comparison with a chromatographic procedure published in the literature. Mean recovery values between 98% and 100% and relative errors of prediction values between 4% and 9% were indicative of the good performance of the method. Copyright © 2012 Elsevier Ltd. All rights reserved.
Pan, Guangming; Wang, Shaochen; Zhou, Wang
2017-10-01
In this paper, we consider the asymptotic behavior of Xfn (n )≔∑i=1 nfn(xi ) , where xi,i =1 ,…,n form orthogonal polynomial ensembles and fn is a real-valued, bounded measurable function. Under the condition that Var Xfn (n )→∞ , the Berry-Esseen (BE) bound and Cramér type moderate deviation principle (MDP) for Xfn (n ) are obtained by using the method of cumulants. As two applications, we establish the BE bound and Cramér type MDP for linear spectrum statistics of Wigner matrix and sample covariance matrix in the complex cases. These results show that in the edge case (which means fn has a particular form f (x ) I (x ≥θn ) where θn is close to the right edge of equilibrium measure and f is a smooth function), Xfn (n ) behaves like the eigenvalues counting function of the corresponding Wigner matrix and sample covariance matrix, respectively.
Lasso and probabilistic inequalities for multivariate point processes
Hansen, Niels Richard; Reynaud-Bouret, Patricia; Rivoirard, Vincent
2015-01-01
Due to its low computational cost, Lasso is an attractive regularization method for high-dimensional statistical settings. In this paper, we consider multivariate counting processes depending on an unknown function parameter to be estimated by linear combinations of a fixed dictionary. To select...... for multivariate Hawkes processes are proven, which allows us to check these assumptions by considering general dictionaries based on histograms, Fourier or wavelet bases. Motivated by problems of neuronal activity inference, we finally carry out a simulation study for multivariate Hawkes processes and compare our...... methodology with the adaptive Lasso procedure proposed by Zou in (J. Amer. Statist. Assoc. 101 (2006) 1418–1429). We observe an excellent behavior of our procedure. We rely on theoretical aspects for the essential question of tuning our methodology. Unlike adaptive Lasso of (J. Amer. Statist. Assoc. 101 (2006...
Introduction to Bayesian statistics
Bolstad, William M
2017-01-01
There is a strong upsurge in the use of Bayesian methods in applied statistical analysis, yet most introductory statistics texts only present frequentist methods. Bayesian statistics has many important advantages that students should learn about if they are going into fields where statistics will be used. In this Third Edition, four newly-added chapters address topics that reflect the rapid advances in the field of Bayesian staistics. The author continues to provide a Bayesian treatment of introductory statistical topics, such as scientific data gathering, discrete random variables, robust Bayesian methods, and Bayesian approaches to inferenfe cfor discrete random variables, bionomial proprotion, Poisson, normal mean, and simple linear regression. In addition, newly-developing topics in the field are presented in four new chapters: Bayesian inference with unknown mean and variance; Bayesian inference for Multivariate Normal mean vector; Bayesian inference for Multiple Linear RegressionModel; and Computati...
An Exact Confidence Region in Multivariate Calibration
Mathew, Thomas; Kasala, Subramanyam
1994-01-01
In the multivariate calibration problem using a multivariate linear model, an exact confidence region is constructed. It is shown that the region is always nonempty and is invariant under nonsingular transformations.
Multivariate analysis: models and method
Sanz Perucha, J.
1990-01-01
Data treatment techniques are increasingly used since computer methods result of wider access. Multivariate analysis consists of a group of statistic methods that are applied to study objects or samples characterized by multiple values. A final goal is decision making. The paper describes the models and methods of multivariate analysis
Multivariate pattern dependence.
Stefano Anzellotti
2017-11-01
Full Text Available When we perform a cognitive task, multiple brain regions are engaged. Understanding how these regions interact is a fundamental step to uncover the neural bases of behavior. Most research on the interactions between brain regions has focused on the univariate responses in the regions. However, fine grained patterns of response encode important information, as shown by multivariate pattern analysis. In the present article, we introduce and apply multivariate pattern dependence (MVPD: a technique to study the statistical dependence between brain regions in humans in terms of the multivariate relations between their patterns of responses. MVPD characterizes the responses in each brain region as trajectories in region-specific multidimensional spaces, and models the multivariate relationship between these trajectories. We applied MVPD to the posterior superior temporal sulcus (pSTS and to the fusiform face area (FFA, using a searchlight approach to reveal interactions between these seed regions and the rest of the brain. Across two different experiments, MVPD identified significant statistical dependence not detected by standard functional connectivity. Additionally, MVPD outperformed univariate connectivity in its ability to explain independent variance in the responses of individual voxels. In the end, MVPD uncovered different connectivity profiles associated with different representational subspaces of FFA: the first principal component of FFA shows differential connectivity with occipital and parietal regions implicated in the processing of low-level properties of faces, while the second and third components show differential connectivity with anterior temporal regions implicated in the processing of invariant representations of face identity.
Su, Shiliang; Zhi, Junjun; Lou, Liping; Huang, Fang; Chen, Xia; Wu, Jiaping
Characterizing the spatio-temporal patterns and apportioning the pollution sources of water bodies are important for the management and protection of water resources. The main objective of this study is to describe the dynamics of water quality and provide references for improving river pollution control practices. Comprehensive application of neural-based modeling and different multivariate methods was used to evaluate the spatio-temporal patterns and source apportionment of pollution in Qiantang River, China. Measurement data were obtained and pretreated for 13 variables from 41 monitoring sites for the period of 2001-2004. A self-organizing map classified the 41 monitoring sites into three groups (Group A, B and C), representing different pollution characteristics. Four significant parameters (dissolved oxygen, biochemical oxygen demand, total phosphorus and total lead) were identified by discriminant analysis for distinguishing variations of different years, with about 80% correct assignment for temporal variation. Rotated principal component analysis (PCA) identified four potential pollution sources for Group A (domestic sewage and agricultural pollution, industrial wastewater pollution, mineral weathering, vehicle exhaust and sand mining), five for Group B (heavy metal pollution, agricultural runoff, vehicle exhaust and sand mining, mineral weathering, chemical plants discharge) and another five for Group C (vehicle exhaust and sand mining, chemical plants discharge, soil weathering, biochemical pollution, mineral weathering). The identified potential pollution sources explained 75.6% of the total variances for Group A, 75.0% for Group B and 80.0% for Group C, respectively. Receptor-based source apportionment was applied to further estimate source contributions for each pollution variable in the three groups, which facilitated and supported the PCA results. These results could assist managers to develop optimal strategies and determine priorities for river
Risager, Morten S.; Rudnick, Zeev
We study a variant of a problem considered by Dinaburg and Sinai on the statistics of the minimal solution to a linear Diophantine equation. We show that the signed ratio between the Euclidean norms of the minimal solution and the coefficient vector is uniformly distributed modulo one. We reduce ...
Pakhnutov I.A.
2017-04-01
Full Text Available the paper deals with iterative interpolation methods in forms of similar recursive procedures defined by a sort of simple functions (interpolation basis not necessarily real valued. These basic functions are kind of arbitrary type being defined just by wish and considerations of user. The studied interpolant construction shows virtue of versatility: it may be used in a wide range of vector spaces endowed with scalar product, no dimension restrictions, both in Euclidean and Hilbert spaces. The choice of basic interpolation functions is as wide as possible since it is subdued nonessential restrictions. The interpolation method considered in particular coincides with traditional polynomial interpolation (mimic of Lagrange method in real unidimensional case or rational, exponential etc. in other cases. The interpolation as iterative process, in fact, is fairly flexible and allows one procedure to change the type of interpolation, depending on the node number in a given set. Linear interpolation basis options (perhaps some nonlinear ones allow to interpolate in noncommutative spaces, such as spaces of nondegenerate matrices, interpolated data can also be relevant elements of vector spaces over arbitrary numeric field. By way of illustration, the author gives the examples of interpolation on the real plane, in the separable Hilbert space and the space of square matrices with vektorvalued source data.
Multivariate calculus and geometry
Dineen, Seán
2014-01-01
Multivariate calculus can be understood best by combining geometric insight, intuitive arguments, detailed explanations and mathematical reasoning. This textbook has successfully followed this programme. It additionally provides a solid description of the basic concepts, via familiar examples, which are then tested in technically demanding situations. In this new edition the introductory chapter and two of the chapters on the geometry of surfaces have been revised. Some exercises have been replaced and others provided with expanded solutions. Familiarity with partial derivatives and a course in linear algebra are essential prerequisites for readers of this book. Multivariate Calculus and Geometry is aimed primarily at higher level undergraduates in the mathematical sciences. The inclusion of many practical examples involving problems of several variables will appeal to mathematics, science and engineering students.
Almazan T, M. G.; Jimenez R, M.; Monroy G, F.; Tenorio, D.; Rodriguez G, N. L.
2009-01-01
The elementary composition of archaeological ceramic fragments obtained during the explorations in San Miguel Ixtapan, Mexico State, was determined by the neutron activation analysis technique. The samples irradiation was realized in the research reactor TRIGA Mark III with a neutrons flow of 1·10 13 n·cm -2 ·s -1 . The irradiation time was of 2 hours. Previous to the acquisition of the gamma rays spectrum the samples were allowed to decay from 12 to 14 days. The analyzed elements were: Nd, Ce, Lu, Eu, Yb, Pa(Th), Tb, La, Cr, Hf, Sc, Co, Fe, Cs, Rb. The statistical treatment of the data, consistent in the group analysis and the main components analysis allowed to identify three different origins of the archaeological ceramic, designated as: local, foreign and regional. (Author)
Butaciu, Sinziana; Senila, Marin; Sarbu, Costel; Ponta, Michaela; Tanaselia, Claudiu; Cadar, Oana; Roman, Marius; Radu, Emil; Sima, Mihaela; Frentiu, Tiberiu
2017-04-01
The study proposes a combined model based on diagrams (Gibbs, Piper, Stuyfzand Hydrogeochemical Classification System) and unsupervised statistical approaches (Cluster Analysis, Principal Component Analysis, Fuzzy Principal Component Analysis, Fuzzy Hierarchical Cross-Clustering) to describe natural enrichment of inorganic arsenic and co-occurring species in groundwater in the Banat Plain, southwestern Romania. Speciation of inorganic As (arsenite, arsenate), ion concentrations (Na + , K + , Ca 2+ , Mg 2+ , HCO 3 - , Cl - , F - , SO 4 2- , PO 4 3- , NO 3 - ), pH, redox potential, conductivity and total dissolved substances were performed. Classical diagrams provided the hydrochemical characterization, while statistical approaches were helpful to establish (i) the mechanism of naturally occurring of As and F - species and the anthropogenic one for NO 3 - , SO 4 2- , PO 4 3- and K + and (ii) classification of groundwater based on content of arsenic species. The HCO 3 - type of local groundwater and alkaline pH (8.31-8.49) were found to be responsible for the enrichment of arsenic species and occurrence of F - but by different paths. The PO 4 3- -AsO 4 3- ion exchange, water-rock interaction (silicates hydrolysis and desorption from clay) were associated to arsenate enrichment in the oxidizing aquifer. Fuzzy Hierarchical Cross-Clustering was the strongest tool for the rapid simultaneous classification of groundwaters as a function of arsenic content and hydrogeochemical characteristics. The approach indicated the Na + -F - -pH cluster as marker for groundwater with naturally elevated As and highlighted which parameters need to be monitored. A chemical conceptual model illustrating the natural and anthropogenic paths and enrichment of As and co-occurring species in the local groundwater supported by mineralogical analysis of rocks was established. Copyright © 2016 Elsevier Ltd. All rights reserved.
Wang, Shengnan; Hua, Yujiao; Zou, Lisi; Liu, Xunhong; Yan, Ying; Zhao, Hui; Luo, Yiyuan; Liu, Juanxiu
2018-02-01
Scrophulariae Radix is one of the most popular traditional Chinese medicines (TCMs). Primary processing of Scrophulariae Radix is an important link which closely related to the quality of products in this TCM. The aim of this study is to explore the influence of different processing methods on chemical constituents in Scrophulariae Radix. The difference of chemical constituents in Scrophulariae Radix processed by different methods was analyzed by using ultra fast liquid chromatography-triple quadrupole-time of flight mass spectrometry coupled with principal component analysis and orthogonal partial least squares discriminant analysis. Furthermore, the contents of 12 index differential constituents in Scrophulariae Radix processed by different methods were simultaneously determined by using ultra fast liquid chromatography coupled with triple quadrupole-linear ion trap mass spectrometry. Gray relational analysis was performed to evaluate the different processed samples according to the contents of 12 constituents. All of the results demonstrated that the quality of Scrophulariae Radix processed by "sweating" method was better. This study will provide the basic information for revealing the change law of chemical constituents in Scrophulariae Radix processed by different methods and facilitating selection of the suitable processing method of this TCM. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Heidema, A Geert; Thissen, Uwe; Boer, Jolanda M A; Bouwman, Freek G; Feskens, Edith J M; Mariman, Edwin C M
2009-06-01
In this study, we applied the multivariate statistical tool Partial Least Squares (PLS) to analyze the relative importance of 83 plasma proteins in relation to coronary heart disease (CHD) mortality and the intermediate end points body mass index, HDL-cholesterol and total cholesterol. From a Dutch monitoring project for cardiovascular disease risk factors, men who died of CHD between initial participation (1987-1991) and end of follow-up (January 1, 2000) (N = 44) and matched controls (N = 44) were selected. Baseline plasma concentrations of proteins were measured by a multiplex immunoassay. With the use of PLS, we identified 15 proteins with prognostic value for CHD mortality and sets of proteins associated with the intermediate end points. Subsequently, sets of proteins and intermediate end points were analyzed together by Principal Components Analysis, indicating that proteins involved in inflammation explained most of the variance, followed by proteins involved in metabolism and proteins associated with total-C. This study is one of the first in which the association of a large number of plasma proteins with CHD mortality and intermediate end points is investigated by applying multivariate statistics, providing insight in the relationships among proteins, intermediate end points and CHD mortality, and a set of proteins with prognostic value.
Lasso and probabilistic inequalities for multivariate point processes
Hansen, Niels Richard; Reynaud-Bouret, Patricia; Rivoirard, Vincent
2012-01-01
Due to its low computational cost, Lasso is an attractive regularization method for high-dimensional statistical settings. In this paper, we consider multivariate counting processes depending on an unknown function parameter to be estimated by linear combinations of a fixed dictionary. To select coefficients, we propose an adaptive $\\ell_{1}$-penalization methodology, where data-driven weights of the penalty are derived from new Bernstein type inequalities for martingales. Oracle inequalities...
Garcia, M.H.; Rabaute, A.; Yven, B.; Guillemot, D.
2010-01-01
relevant information about the spatial continuity of rock properties as measured on cores in laboratory. To do so, multivariate statistical analysis methods, including principal component analysis based on linear or rank (Spearman) correlations, were carried out. They show that well-log compressive velocity ( V p) is well correlated to static Young modulus and compressive strength measured on cores, and that downhole bulk density and Total CMR porosity are well correlated to dynamic Young modulus, dynamic shear modulus and compressive velocity on cores. Studying the spatial continuity and trends of properties in argillaceous units was a primary objective of the study. To do so, the spatial analysis was first conducted on the well-log properties that proved to be well correlated to properties measured on cores, lab properties remaining the reference physical properties. Lateral and vertical spatial trends were observed and interpreted on the selected well-log properties. In order to confirm that these spatial trends were effective and could apply to physical properties measured on cores, the spatial continuity of some correlated lab properties was studied. Similar trends were found that validated the approach of using log properties for characterizing the spatial continuity of core physical properties. (authors)
American Society for Testing and Materials. Philadelphia
2010-01-01
1.1 This practice covers only S-N and ε-N relationships that may be reasonably approximated by a straight line (on appropriate coordinates) for a specific interval of stress or strain. It presents elementary procedures that presently reflect good practice in modeling and analysis. However, because the actual S-N or ε-N relationship is approximated by a straight line only within a specific interval of stress or strain, and because the actual fatigue life distribution is unknown, it is not recommended that (a) the S-N or ε-N curve be extrapolated outside the interval of testing, or (b) the fatigue life at a specific stress or strain amplitude be estimated below approximately the fifth percentile (P ≃ 0.05). As alternative fatigue models and statistical analyses are continually being developed, later revisions of this practice may subsequently present analyses that permit more complete interpretation of S-N and ε-N data.
Multivariate Regression Analysis and Slaughter Livestock,
AGRICULTURE, *ECONOMICS), (*MEAT, PRODUCTION), MULTIVARIATE ANALYSIS, REGRESSION ANALYSIS , ANIMALS, WEIGHT, COSTS, PREDICTIONS, STABILITY, MATHEMATICAL MODELS, STORAGE, BEEF, PORK, FOOD, STATISTICAL DATA, ACCURACY
Multivariate Matrix-Exponential Distributions
Bladt, Mogens; Nielsen, Bo Friis
2010-01-01
be written as linear combinations of the elements in the exponential of a matrix. For this reason we shall refer to multivariate distributions with rational Laplace transform as multivariate matrix-exponential distributions (MVME). The marginal distributions of an MVME are univariate matrix......-exponential distributions. We prove a characterization that states that a distribution is an MVME distribution if and only if all non-negative, non-null linear combinations of the coordinates have a univariate matrix-exponential distribution. This theorem is analog to a well-known characterization theorem...
Introduction to multivariate discrimination
Kégl, Balázs
2013-07-01
Multivariate discrimination or classification is one of the best-studied problem in machine learning, with a plethora of well-tested and well-performing algorithms. There are also several good general textbooks [1-9] on the subject written to an average engineering, computer science, or statistics graduate student; most of them are also accessible for an average physics student with some background on computer science and statistics. Hence, instead of writing a generic introduction, we concentrate here on relating the subject to a practitioner experimental physicist. After a short introduction on the basic setup (Section 1) we delve into the practical issues of complexity regularization, model selection, and hyperparameter optimization (Section 2), since it is this step that makes high-complexity non-parametric fitting so different from low-dimensional parametric fitting. To emphasize that this issue is not restricted to classification, we illustrate the concept on a low-dimensional but non-parametric regression example (Section 2.1). Section 3 describes the common algorithmic-statistical formal framework that unifies the main families of multivariate classification algorithms. We explain here the large-margin principle that partly explains why these algorithms work. Section 4 is devoted to the description of the three main (families of) classification algorithms, neural networks, the support vector machine, and AdaBoost. We do not go into the algorithmic details; the goal is to give an overview on the form of the functions these methods learn and on the objective functions they optimize. Besides their technical description, we also make an attempt to put these algorithm into a socio-historical context. We then briefly describe some rather heterogeneous applications to illustrate the pattern recognition pipeline and to show how widespread the use of these methods is (Section 5). We conclude the chapter with three essentially open research problems that are either
Introduction to multivariate discrimination
Kegl, B.
2013-01-01
Multivariate discrimination or classification is one of the best-studied problem in machine learning, with a plethora of well-tested and well-performing algorithms. There are also several good general textbooks [1-9] on the subject written to an average engineering, computer science, or statistics graduate student; most of them are also accessible for an average physics student with some background on computer science and statistics. Hence, instead of writing a generic introduction, we concentrate here on relating the subject to a practitioner experimental physicist. After a short introduction on the basic setup (Section 1) we delve into the practical issues of complexity regularization, model selection, and hyper-parameter optimization (Section 2), since it is this step that makes high-complexity non-parametric fitting so different from low-dimensional parametric fitting. To emphasize that this issue is not restricted to classification, we illustrate the concept on a low-dimensional but non-parametric regression example (Section 2.1). Section 3 describes the common algorithmic-statistical formal framework that unifies the main families of multivariate classification algorithms. We explain here the large-margin principle that partly explains why these algorithms work. Section 4 is devoted to the description of the three main (families of) classification algorithms, neural networks, the support vector machine, and AdaBoost. We do not go into the algorithmic details; the goal is to give an overview on the form of the functions these methods learn and on the objective functions they optimize. Besides their technical description, we also make an attempt to put these algorithm into a socio-historical context. We then briefly describe some rather heterogeneous applications to illustrate the pattern recognition pipeline and to show how widespread the use of these methods is (Section 5). We conclude the chapter with three essentially open research problems that are either
Multivariate analysis of eigenvalues and eigenvectors in tensor based morphometry
Rajagopalan, Vidya; Schwartzman, Armin; Hua, Xue; Leow, Alex; Thompson, Paul; Lepore, Natasha
2015-01-01
We develop a new algorithm to compute voxel-wise shape differences in tensor-based morphometry (TBM). As in standard TBM, we non-linearly register brain T1-weighed MRI data from a patient and control group to a template, and compute the Jacobian of the deformation fields. In standard TBM, the determinants of the Jacobian matrix at each voxel are statistically compared between the two groups. More recently, a multivariate extension of the statistical analysis involving the deformation tensors derived from the Jacobian matrices has been shown to improve statistical detection power.7 However, multivariate methods comprising large numbers of variables are computationally intensive and may be subject to noise. In addition, the anatomical interpretation of results is sometimes difficult. Here instead, we analyze the eigenvalues and the eigenvectors of the Jacobian matrices. Our method is validated on brain MRI data from Alzheimer's patients and healthy elderly controls from the Alzheimer's Disease Neuro Imaging Database.
Glick, D.C.; Davis, A.
1984-07-01
The multivariate statistical techniques of correlation coefficients, factor analysis, and cluster analysis, implemented by computer programs, can be used to process a large data set and produce a summary of relationships between variables and between samples. These techniques were used to find relationships for data on the inorganic constituents of US coals. Three hundred thirty-five whole-seam channel samples from six US coal provinces were analyzed for inorganic variables. After consideration of the attributes of data expressed on ash basis and whole-coal basis, it was decided to perform complete statistical analyses on both data sets. Thirty variables expressed on whole-coal basis and twenty-six variables expressed on ash basis were used. For each inorganic variable, a frequency distribution histogram and a set of summary statistics was produced. These were subdivided to reveal the manner in which concentrations of inorganic constituents vary between coal provinces and between coal regions. Data collected on 124 samples from three stratigraphic groups (Pottsville, Monongahela, Allegheny) in the Appalachian region were studied using analysis of variance to determine degree of variability between stratigraphic levels. Most variables showed differences in mean values between the three groups. 193 references, 71 figures, 54 tables.
溫福星 Fur-Hsing Wen
2012-03-01
Full Text Available 本研究利用「台灣教育長期追蹤資料庫」的一般分析能力與數學分析能力的四波調查結果，配合男、女學生樣本進行多群體多條追蹤資料的線性成長模式估計。在考慮重複觀測資料誤差項在不同時點的變異數非同質與不同時點間的共變數非獨立情況下，以及男、女學生的不同成長軌跡，將誤差項結構設為無限制結構，利用虛擬變項交互項法與虛擬變項多樣本法同時估計不同性別、不同能力的線性成長軌跡變化。由於全部追蹤資料樣本存在遺失值的情形，本研究以階層線性模式（hierarchical linear modeling, HLM）軟體對完整資料2,806位學生進行分析，其估計結果發現，在完整資料的兩條成長軌跡模式中，男、女學生誤差項共變異數矩陣結構相同，但線性成長軌跡不恆等。除此之外，本文並對競爭模式比較的結果在文章最後進行討論並提出相關的建議。 This paper demonstrates the data analysis of the repeated measures from the Taiwan Education Panel Survey (TEPS. Based on the four data waves on the TEPS, we consider two abilities (general and mathematic and two population groups (male and female students to construct a multi-group multivariate linear growth model. Because the two-group multivariate repeated measures belong to the different populations and the different research variables, the residual terms of linear growth models may imply heterogeneity of the error covariance structure. We treat the error covariance structure as an unrestricted structure to compare the various types of models. The results from the HLM on the complete data (2,806 students reveal that the male and female students in this study have the same error covariance structure but have distinct linear growth trajectories. In addition, comparisons of the competitive models and related suggestions are discussed in the results and conclusion
Yuldashev, Petr V; Ollivier, Sébastien; Karzova, Maria M; Khokhlova, Vera A; Blanc-Benon, Philippe
2017-12-01
Linear and nonlinear propagation of high amplitude acoustic pulses through a turbulent layer in air is investigated using a two-dimensional KZK-type (Khokhlov-Zabolotskaya-Kuznetsov) equation. Initial waves are symmetrical N-waves with shock fronts of finite width. A modified von Kármán spectrum model is used to generate random wind velocity fluctuations associated with the turbulence. Physical parameters in simulations correspond to previous laboratory scale experiments where N-waves with 1.4 cm wavelength propagated through a turbulence layer with the outer scale of about 16 cm. Mean value and standard deviation of peak overpressure and shock steepness, as well as cumulative probabilities to observe amplified peak overpressure and shock steepness, are analyzed. Nonlinear propagation effects are shown to enhance pressure level in random foci for moderate initial amplitudes of N-waves thus increasing the probability to observe highly peaked waveforms. Saturation of the pressure level is observed for stronger nonlinear effects. It is shown that in the linear propagation regime, the turbulence mainly leads to the smearing of shock fronts, thus decreasing the probability to observe high values of steepness, whereas nonlinear effects dramatically increase the probability to observe steep shocks.
Steingass, Christof Björn; Jutzi, Manfred; Müller, Jenny; Carle, Reinhold; Schmarr, Hans-Georg
2015-03-01
Ripening-dependent changes of pineapple volatiles were studied in a nontargeted profiling analysis. Volatiles were isolated via headspace solid phase microextraction and analyzed by comprehensive 2D gas chromatography and mass spectrometry (HS-SPME-GC×GC-qMS). Profile patterns presented in the contour plots were evaluated applying image processing techniques and subsequent multivariate statistical data analysis. Statistical methods comprised unsupervised hierarchical cluster analysis (HCA) and principal component analysis (PCA) to classify the samples. Supervised partial least squares discriminant analysis (PLS-DA) and partial least squares (PLS) regression were applied to discriminate different ripening stages and describe the development of volatiles during postharvest storage, respectively. Hereby, substantial chemical markers allowing for class separation were revealed. The workflow permitted the rapid distinction between premature green-ripe pineapples and postharvest-ripened sea-freighted fruits. Volatile profiles of fully ripe air-freighted pineapples were similar to those of green-ripe fruits postharvest ripened for 6 days after simulated sea freight export, after PCA with only two principal components. However, PCA considering also the third principal component allowed differentiation between air-freighted fruits and the four progressing postharvest maturity stages of sea-freighted pineapples.
The statistical analysis of failure of a MEVATRON77 DX67 linear accelerator over a ten year period
Aoyama, H; Tahara, S; Uno, H; Kadohisa, S; Azuma, Y; Nakagiri, Y; Hiraki, Y
2003-01-01
A linear accelerator (linac) takes a leading role in radiation therapy. A linac consists of complicated main parts and systems and it is required that highly accurate operational procedures should be maintained. Operational failure occurs for various reasons. In this report, the failure occurrences of one linac over a ten year period were recorded and analyzed. The subject model was a MEVATRON77 DX67 (Siemens, Inc). The failure rate for each system, the form classification of the contents of failure, the operation situation at the time of failure, and the average performance life of the main parts were totaled. Moreover, the relation between the number of therapies that patients received (operating efficiency) and the failure rate within that number and the relation between environment (temperature and humidity) and the failure rate attributed to other systems were analyzed. In this report, irradiation interruption was also included with situations where treatment was unable to begin in total for the number o...
E. Baümler
2014-06-01
Full Text Available In this study the kinetics of oil extraction from partially dehulled safflower seeds under two moisture conditions (7 and 9% dry basis was investigated. The extraction assays were performed using a stirred batch system, thermostated at 50 ºC, using n-hexane as solvent. The data obtained were fitted to a modified diffusion model in order to represent the extraction kinetics. The model took into account a washing and a diffusive step. Fitting parameters were compared statistically for both moisture conditions. The oil yield increased with the extraction time in both cases, although the oil was released at different rates. A comparison of the parameters showed that both the portion extracted in the washing phase and the effective diffusion coefficient were moisture-dependent. The effective diffusivities were 2.81 10-12 and 8.06 10-13 m²s-1 for moisture contents of 7% and 9%, respectively.
Olive, David J
2017-01-01
This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
Ansar, A.B.; Osaki, Yasuhiro; Kazui, Hiroaki
2006-01-01
Statistical parametric mapping (SPM) was employed to investigate the regional decline in cerebral blood flow (rCBF) as measured by 99m Tc-hexamethyl propylene amine oxime (HMPAO) single photon emission computed tomography (SPECT) in mild Alzheimer's disease (AD). However, the role of the post reconstruction image processing on the interpretation of SPM, which detects rCBF pattern, has not been precisely studied. We performed 99m Tc-HMPAO SPECT in mild AD patients and analyzed the effect of linearization correction for washout of the tracer on the detectability of abnormal perfusion. Eleven mild AD (National Institute of Neurological and Communicative Disorders and National Institute of Radiological Sciences (NINCDS-ADRDA), male/female, 5/6; mean±SD age, 70.6±6.2 years; mean±SD mini-mental state examination score, 23.9±3.41; clinical dementia rating score, 1) and eleven normal control subjects (male/female, 4/7; mean±SD age, 66.8±8.4 years) were enrolled in this study. 99m Tc-HMPAO SPECT was performed with a four-head rotating gamma camera. We employed linearization uncorrected (LU) and linearization corrected (LC) images for the patients and controls. The pattern of hypoperfusion in mild AD on LU and LC images was detected by SPM99 applying the same image standardization and analytical parameters. A statistical inter image-group analysis (LU vs. LC) was also performed. Clear differences were observed between the interpretation of SPM with LU and LC images. Significant hypoperfusion in mild AD was found on the LU images in the left posterior cingulate gyrus, right precuneus, left hippocampus, left uncus, and left superior temporal gyrus (cluster level, corrected p 99m Tc-HMPAO SPECT with or without linearization correction, which should be carefully evaluated when interpreting the pattern of rCBF changes in mild Alzheimer's disease. (author)
Farooq Ahmed Arain
2012-01-01
Full Text Available The aim of this study was to develop a statistical model for the effect of RS (Rotor Speed, YT (Yarn Twist and YLD (Yarn Linear Density on production and quality characteristics of rotor spun yarn. Cotton yarns of 30, 35 and 40 tex were produced on rotor spinning machine at different rotor speeds (i.e. 70000, 80000, 90000 and 100000 rpm and with different twist levels (i.e. 450, 500, 550, 600 and 700 tpm. Yarn production (g/hr and quality characteristics were determined for all the experiments. Based on the results, models were developed using response surface regression on MINITAB�16 statistical tool. The developed models not only characterize the intricate relationships among the factors but may also be used to predict the yarn production and quality characteristics at any level of factors within the range of experimental values.
Tanaka, H.; Ohno, N.; Tsuji, Y.; Kajita, S.
2010-01-01
We have analyzed the 2D convective motion of coherent structures, which is associated with plasma blobs, under attached and detached plasma conditions of a linear divertor simulator, NAGDIS-II. Data analysis of probes and a fast-imaging camera by spatio-temporal correlation with three decomposition and proper orthogonal decomposition (POD) was carried out to determine the basic properties of coherent structures detached from a bulk plasma column. Under the attached plasma condition, the spatio-temporal correlation with three decomposition based on the probe measurement showed that two types of coherent structures with different sizes detached from the bulk plasma and the azimuthally localized structure radially propagated faster than the larger structure. Under the detached plasma condition, movies taken by the fast-imaging camera clearly showed the dynamics of a 2D spiral structure at peripheral regions of the bulk plasma; this dynamics caused the broadening of the plasma profile. The POD method was used for the data processing of the movies to obtain low-dimensional mode shapes. It was found that the m=1 and m=2 ring-shaped coherent structures were dominant. Comparison between the POD analysis of both the movie and the probe data suggested that the coherent structure could be detached from the bulk plasma mainly associated with the m=2 fluctuation. This phenomena could play an important role in the reduction of the particle and heat flux as well as the plasma recombination processes in plasma detachment (copyright 2010 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim) (orig.)
Modelling the Covariance Structure in Marginal Multivariate Count Models
Bonat, W. H.; Olivero, J.; Grande-Vega, M.
2017-01-01
The main goal of this article is to present a flexible statistical modelling framework to deal with multivariate count data along with longitudinal and repeated measures structures. The covariance structure for each response variable is defined in terms of a covariance link function combined...... be used to indicate whether there was statistical evidence of a decline in blue duikers and other species hunted during the study period. Determining whether observed drops in the number of animals hunted are indeed true is crucial to assess whether species depletion effects are taking place in exploited...... with a matrix linear predictor involving known matrices. In order to specify the joint covariance matrix for the multivariate response vector, the generalized Kronecker product is employed. We take into account the count nature of the data by means of the power dispersion function associated with the Poisson...
Amano, Ken-Ichi; Yoshidome, Takashi; Iwaki, Mitsuhiro; Suzuki, Makoto; Kinoshita, Masahiro
2010-07-28
We report a new progress in elucidating the mechanism of the unidirectional movement of a linear-motor protein (e.g., myosin) along a filament (e.g., F-actin). The basic concept emphasized here is that a potential field is entropically formed for the protein on the filament immersed in solvent due to the effect of the translational displacement of solvent molecules. The entropic potential field is strongly dependent on geometric features of the protein and the filament, their overall shapes as well as details of the polyatomic structures. The features and the corresponding field are judiciously adjusted by the binding of adenosine triphosphate (ATP) to the protein, hydrolysis of ATP into adenosine diphosphate (ADP)+Pi, and release of Pi and ADP. As the first step, we propose the following physical picture: The potential field formed along the filament for the protein without the binding of ATP or ADP+Pi to it is largely different from that for the protein with the binding, and the directed movement is realized by repeated switches from one of the fields to the other. To illustrate the picture, we analyze the spatial distribution of the entropic potential between a large solute and a large body using the three-dimensional integral equation theory. The solute is modeled as a large hard sphere. Two model filaments are considered as the body: model 1 is a set of one-dimensionally connected large hard spheres and model 2 is a double helical structure formed by two sets of connected large hard spheres. The solute and the filament are immersed in small hard spheres forming the solvent. The major findings are as follows. The solute is strongly confined within a narrow space in contact with the filament. Within the space there are locations with sharply deep local potential minima along the filament, and the distance between two adjacent locations is equal to the diameter of the large spheres constituting the filament. The potential minima form a ringlike domain in model 1
The statistical analysis of failure of a MEVATRON77 DX67 linear accelerator over a ten year period
Aoyama, Hideki; Inamura, Keiji; Tahara, Seiji; Uno, Hirofumi; Kadohisa, Shigefumi; Azuma, Yoshiharu; Nakagiri, Yoshitada; Hiraki, Yoshio
2003-01-01
A linear accelerator (linac) takes a leading role in radiation therapy. A linac consists of complicated main parts and systems and it is required that highly accurate operational procedures should be maintained. Operational failure occurs for various reasons. In this report, the failure occurrences of one linac over a ten year period were recorded and analyzed. The subject model was a MEVATRON77 DX67 (Siemens, Inc). The failure rate for each system, the form classification of the contents of failure, the operation situation at the time of failure, and the average performance life of the main parts were totaled. Moreover, the relation between the number of therapies that patients received (operating efficiency) and the failure rate within that number and the relation between environment (temperature and humidity) and the failure rate attributed to other systems were analyzed. In this report, irradiation interruption was also included with situations where treatment was unable to begin in total for the number of failure cases. The cases of failure were classified into three kinds, irradiation possible: irradiation capacity decreased, and: irradiation impossible. Consequently, the total failure number of cases for ten years and eight months was 1,036, and the number of cases/rate of each kind were irradiation possible: 49/4.7%, irradiation capacity: 919/88.7%, and irradiation impossible: 68/6.6%. In the classification according to the system, the acceleration section accounted for 59.0% and the pulse section 23.2% of the total number of failure cases. Every year, an operating efficiency of 95% or higher was maintained. The average lives of a thyratron, a klystron, and radio frequency (RF) driver were 4,886 hours, 17,383 hours, and 5,924 hours respectively. Moreover, although analysis of the relation between the number of therapies performed (or operating time) and the number of failures for each main machine part was observed, the tendency was not to associate them
Searle, Shayle R
2012-01-01
This 1971 classic on linear models is once again available--as a Wiley Classics Library Edition. It features material that can be understood by any statistician who understands matrix algebra and basic statistical methods.
Xu, Yangyang; Cai, Hao; Cao, Gang; Duan, Yu; Pei, Ke; Tu, Sicong; Zhou, Jia; Xie, Li; Sun, Dongdong; Zhao, Jiayu; Liu, Jing; Wang, Xiaoqi; Shen, Lin
2018-04-15
Baizhu Shaoyao San (BSS) is a famous traditional Chinese medicinal formula widely used for the treatment of painful diarrhea, intestinal inflammation, and diarrhea-predominant irritable bowel syndrome. According to clinical medication, three medicinal herbs (Atractylodis Macrocephalae Rhizoma, Paeoniae Radix Alba, and Citri Reticulatae Pericarpium) included in BSS must be processed using some specific methods of stir-frying. On the basis of the classical theories of traditional Chinese medicine, the therapeutic effects of BSS would be significantly enhanced after processing. Generally, the changes of curative effects mainly result from the variations of inside chemical basis caused by the processing procedure. To find out the corresponding changes of chemical compositions in BSS after processing and to elucidate the material basis of the changed curative effects, an optimized ultra-high-performance liquid chromatography-quadrupole/time-of-flight mass spectrometry in positive and negative ion modes coupled with multivariate statistical analyses were developed. As a result, a total of 186 compounds were ultimately identified in crude and processed BSS, in which 62 marker compounds with significant differences between crude and processed BSS were found by principal component analysis and t-test. Compared with crude BSS, the contents of 23 compounds were remarkably decreased and the contents of 39 compounds showed notable increase in processed BSS. The transformation mechanisms of some changed compounds were appropriately inferred from the results. Furthermore, compounds with extremely significant differences might strengthen the effects of the whole herbal formula. Copyright © 2018 Elsevier B.V. All rights reserved.
Marques, R.; Amaral, P.; Zêzere, J. L.; Queiroz, G.; Goulart, C.
2009-04-01
Slope instability research and susceptibility mapping is a fundamental component of hazard assessment and is of extreme importance for risk mitigation, land-use management and emergency planning. Landslide susceptibility zonation has been actively pursued during the last two decades and several methodologies are still being improved. Among all the methods presented in the literature, indirect quantitative probabilistic methods have been extensively used. In this work different linear probabilistic methods, both bi-variate and multi-variate (Informative Value, Fuzzy Logic, Weights of Evidence and Logistic Regression), were used for the computation of the spatial probability of landslide occurrence, using the pixel as mapping unit. The methods used are based on linear relationships between landslides and 9 considered conditioning factors (altimetry, slope angle, exposition, curvature, distance to streams, wetness index, contribution area, lithology and land-use). It was assumed that future landslides will be conditioned by the same factors as past landslides in the study area. The presented work was developed for Ribeira Quente Valley (S. Miguel Island, Azores), a study area of 9,5 km2, mainly composed of volcanic deposits (ash and pumice lapilli) produced by explosive eruptions in Furnas Volcano. This materials associated to the steepness of the slopes (38,9% of the area has slope angles higher than 35°, reaching a maximum of 87,5°), make the area very prone to landslide activity. A total of 1.495 shallow landslides were mapped (at 1:5.000 scale) and included in a GIS database. The total affected area is 401.744 m2 (4,5% of the study area). Most slope movements are translational slides frequently evolving into debris-flows. The landslides are elongated, with maximum length generally equivalent to the slope extent, and their width normally does not exceed 25 m. The failure depth rarely exceeds 1,5 m and the volume is usually smaller than 700 m3. For modelling
Doherty, W.
2013-01-01
A nebulizer-centric response function model of the analytical inductively coupled argon plasma ion source was used to investigate the statistical frequency distributions and noise reduction factors of simultaneously measured flicker noise limited isotope ion signals and their ratios. The response function model was extended by assuming i) a single gaussian distributed random noise source (nebulizer gas pressure fluctuations) and ii) the isotope ion signal response is a parabolic function of the nebulizer gas pressure. Model calculations of ion signal and signal ratio histograms were obtained by applying the statistical method of translation to the non-linear response function model of the plasma. Histograms of Ni, Cu, Pr, Tl and Pb isotope ion signals measured using a multi-collector plasma mass spectrometer were, without exception, negative skew. Histograms of the corresponding isotope ratios of Ni, Cu, Tl and Pb were either positive or negative skew. There was a complete agreement between the measured and model calculated histogram skew properties. The nebulizer-centric response function model was also used to investigate the effect of non-linear response functions on the effectiveness of noise cancellation by signal division. An alternative noise correction procedure suitable for parabolic signal response functions was derived and applied to measurements of isotope ratios of Cu, Ni, Pb and Tl. The largest noise reduction factors were always obtained when the non-linearity of the response functions was taken into account by the isotope ratio calculation. Possible applications of the nebulizer-centric response function model to other types of analytical instrumentation, large amplitude signal noise sources (e.g., lasers, pumped nebulizers) and analytical error in isotope ratio measurements by multi-collector plasma mass spectrometry are discussed. - Highlights: ► Isotope ion signal noise is modelled as a parabolic transform of a gaussian variable. ► Flicker
Regularized multivariate regression models with skew-t error distributions
Chen, Lianfu; Pourahmadi, Mohsen; Maadooliat, Mehdi
2014-01-01
We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both
Hassen, Imen; Hamzaoui-Azaza, Fadoua; Bouhlila, Rachida
2016-03-01
Groundwater plays a dominant role in arid regions; it is among the most available water resources in Tunisia. Located in northwestern Tunisia, Oum Ali-Thelepte is a deep Miocene sedimentary aquifer, where groundwater is the most important source of water supply. The aim of the study is to investigate the hydrochemical processes leading to mineralization and to assess water quality with respect to agriculture and drinking for a better management of groundwater resources. To achieve such objectives, water analysis was carried out on 16 groundwater samples collected during January-February 2014. Stable isotopes and 26 hydrochemical parameters were examined. The interpretation of these analytical data showed that the concentrations of major and trace elements were within the permissible level for human use. The distribution of mineral processes in this aquifer was identified using conventional classification techniques, suggesting that the water facies gradually changes from Ca-HCO3 to Mg-SO4 type and are controlled by water-rock interaction. These results were endorsed using multivariate statistical methods such as principal component analysis and cluster analysis. The sustainability of groundwater for drinking and irrigation was assessed based on the water quality index (WQI) and on Wilcox and Richards's diagrams. This aquifer has been classified as "excellent water" serving good irrigation in the area. As for the stable isotope, the measurements showed that groundwater samples lay between global meteoric water line (GMWL) and LMWL; hence, this arrangement signifies that the recharge of the Oum Ali-Thelepte aquifer is ensured by rainwater infiltration through mountains in the border of the aquifer without evaporation effects.
Chan, Kar-Man; Yue, Grace Gar-Lee; Li, Ping; Wong, Eric Chun-Wai; Lee, Julia Kin-Ming; Kennelly, Edward J; Lau, Clara Bik-San
2017-03-03
According to Chinese Pharmacopoeia 2015 edition, Ganoderma (Lingzhi) is a species complex that comprise of Ganoderma lucidum and Ganoderma sinense. The bioactivity and chemical composition of G. lucidium had been studied extensively, and it was shown to possess antitumor activities in pharmacological studies. In contrast, G. sinense has not been studied in great detail. Our previous studies found that the stipe of G. sinense exhibited more potent antitumor activity than the pileus. To identify the antitumor compounds in the stipe of G. sinense, we studied its chemical components by merging the bioactivity results with liquid chromatography-mass spectrometry-based chemometrics. The stipe of G. sinense was extracted with water, followed by ethanol precipitation and liquid-liquid partition. The resulting residue was fractionated using column chromatography. The antitumor activity of these fractions were analysed using MTT assay in murine breast tumor 4T1 cells, and their chemical components were studied using the LC-QTOF-MS with multivariate statistical tools. The chemometric and MS/MS analysis correlated bioactivity with five known cytotoxic compounds, 4-hyroxyphenylacetate, 9-oxo-(10E,12E)-octadecadienoic acid, 3-phenyl-2-propenoic acid, 13-oxo-(9E,11E)-octadecadienoic acid and lingzhine C, from the stipe of G. sinense. To the best of our knowledge, 4-hyroxyphenylacetate, 3-phenyl-2-propenoic acid and lingzhine C are firstly reported to be found in G. sinense. These five compounds will be investigated for their antitumor activities in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
Alexandrowicz, Rainer W; Jahn, Rebecca; Friedrich, Fabian; Unger, Anne
2016-06-01
Various studies have shown that caregiving relatives of schizophrenic patients are at risk of suffering from depression. These studies differ with respect to the applied statistical methods, which could influence the findings. Therefore, the present study analyzes to which extent different methods may cause differing results. The present study contrasts by means of one data set the results of three different modelling approaches, Rasch Modelling (RM), Structural Equation Modelling (SEM), and Linear Regression Modelling (LRM). The results of the three models varied considerably, reflecting the different assumptions of the respective models. Latent trait models (i. e., RM and SEM) generally provide more convincing results by correcting for measurement error and the RM specifically proves superior for it treats ordered categorical data most adequately.
Ranaivo Nomenjanahary, F.; Rakoto, H.; Ratsimbazafy, J.B.
1994-08-01
This paper is concerned with resistivity sounding measurements performed from single site (vertical sounding) or from several sites (profiles) within a bounded area. The objective is to present an accurate information about the study area and to estimate the likelihood of the produced quantitative models. The achievement of this objective obviously requires quite relevant data and processing methods. It also requires interpretation methods which should take into account the probable effect of an heterogeneous structure. In front of such difficulties, the interpretation of resistivity sounding data inevitably involves the use of inversion methods. We suggest starting the interpretation in simple situation (1-D approximation), and using the rough but correct model obtained as an a-priori model for any more refined interpretation. Related to this point of view, special attention should be paid for the inverse problem applied to the resistivity sounding data. This inverse problem is nonlinear, while linearity inherent in the functional response used to describe the physical experiment. Two different approaches are used to build an approximate but higher dimensional inversion of geoelectrical data: the linear approach and the bayesian statistical approach. Some illustrations of their application in resistivity sounding data acquired at Tritrivakely volcanic lake (single site) and at Mahitsy area (several sites) will be given. (author). 28 refs, 7 figs
Simplicial band depth for multivariate functional data
Ló pez-Pintado, Sara; Sun, Ying; Lin, Juan K.; Genton, Marc G.
2014-01-01
sample of curves. Based on these depths, a sample of multivariate curves can be ordered from the center outward and order statistics can be defined. Properties of the proposed depths, such as invariance and consistency, can be established. A simulation
An Introduction to Applied Multivariate Analysis
Raykov, Tenko
2008-01-01
Focuses on the core multivariate statistics topics which are of fundamental relevance for its understanding. This book emphasis on the topics that are critical to those in the behavioral, social, and educational sciences.
Joint density of eigenvalues in spiked multivariate models.
Dharmawansa, Prathapasinghe; Johnstone, Iain M
2014-01-01
The classical methods of multivariate analysis are based on the eigenvalues of one or two sample covariance matrices. In many applications of these methods, for example to high dimensional data, it is natural to consider alternative hypotheses which are a low rank departure from the null hypothesis. For rank one alternatives, this note provides a representation for the joint eigenvalue density in terms of a single contour integral. This will be of use for deriving approximate distributions for likelihood ratios and 'linear' statistics used in testing.
Pestman, Wiebe R
2009-01-01
This textbook provides a broad and solid introduction to mathematical statistics, including the classical subjects hypothesis testing, normal regression analysis, and normal analysis of variance. In addition, non-parametric statistics and vectorial statistics are considered, as well as applications of stochastic analysis in modern statistics, e.g., Kolmogorov-Smirnov testing, smoothing techniques, robustness and density estimation. For students with some elementary mathematical background. With many exercises. Prerequisites from measure theory and linear algebra are presented.
Multivariate return periods of sea storms for coastal erosion risk assessment
S. Corbella
2012-08-01
Full Text Available The erosion of a beach depends on various storm characteristics. Ideally, the risk associated with a storm would be described by a single multivariate return period that is also representative of the erosion risk, i.e. a 100 yr multivariate storm return period would cause a 100 yr erosion return period. Unfortunately, a specific probability level may be associated with numerous combinations of storm characteristics. These combinations, despite having the same multivariate probability, may cause very different erosion outcomes. This paper explores this ambiguity problem in the context of copula based multivariate return periods and using a case study at Durban on the east coast of South Africa. Simulations were used to correlate multivariate return periods of historical events to return periods of estimated storm induced erosion volumes. In addition, the relationship of the most-likely design event (Salvadori et al., 2011 to coastal erosion was investigated. It was found that the multivariate return periods for wave height and duration had the highest correlation to erosion return periods. The most-likely design event was found to be an inadequate design method in its current form. We explore the inclusion of conditions based on the physical realizability of wave events and the use of multivariate linear regression to relate storm parameters to erosion computed from a process based model. Establishing a link between storm statistics and erosion consequences can resolve the ambiguity between multivariate storm return periods and associated erosion return periods.
Improved multivariate polynomial factoring algorithm
Wang, P.S.
1978-01-01
A new algorithm for factoring multivariate polynomials over the integers based on an algorithm by Wang and Rothschild is described. The new algorithm has improved strategies for dealing with the known problems of the original algorithm, namely, the leading coefficient problem, the bad-zero problem and the occurrence of extraneous factors. It has an algorithm for correctly predetermining leading coefficients of the factors. A new and efficient p-adic algorithm named EEZ is described. Bascially it is a linearly convergent variable-by-variable parallel construction. The improved algorithm is generally faster and requires less store then the original algorithm. Machine examples with comparative timing are included
Statistical power analysis for the behavioral sciences
Cohen, Jacob
1988-01-01
.... A chapter has been added for power analysis in set correlation and multivariate methods (Chapter 10). Set correlation is a realization of the multivariate general linear model, and incorporates the standard multivariate methods...
Multivariable Feedback Control of Nuclear Reactors
Rune Moen
1982-07-01
Full Text Available Multivariable feedback control has been adapted for optimal control of the spatial power distribution in nuclear reactor cores. Two design techniques, based on the theory of automatic control, were developed: the State Variable Feedback (SVF is an application of the linear optimal control theory, and the Multivariable Frequency Response (MFR is based on a generalization of the traditional frequency response approach to control system design.
Application of multivariate splines to discrete mathematics
Xu, Zhiqiang
2005-01-01
Using methods developed in multivariate splines, we present an explicit formula for discrete truncated powers, which are defined as the number of non-negative integer solutions of linear Diophantine equations. We further use the formula to study some classical problems in discrete mathematics as follows. First, we extend the partition function of integers in number theory. Second, we exploit the relation between the relative volume of convex polytopes and multivariate truncated powers and giv...
Ana Paula Almeida Bertossi
2013-10-01
Full Text Available Multivariate statistics techniques (Principal Component Analysis and Cluster Analysis were employed to select the most important parameters that explain water quality variability at a rural watershed in the state of Espírito Santo (Brazil. In addition to group the waters studied for the similarity of features selected to verify the effect of type of soil cover (agriculture, livestock, forest and urban, water resource (surface and underground and sampling period (rainy and dry seasons. Nineteen physico-chemical parameters of water quality were analyzed: pH, electrical conductivity, total solids, total dissolved solids, total suspended solids, turbidity, biochemical oxygen demand (BOD, ammoniacal nitrogen, nitrate, nitrite, total phosphorous, Ca, Mg, Fe, Na, K, Zn, Cu and total coliform. Application of Principal Component Analysis reduced the 19 parameters to three components that explained 87.53% of the total variance of data set. Water quality parameters that best explained variability of data were: electrical conductivity, total solids, total dissolved solids, turbidity, BOD, nitrate, Ca, Mg, and Na. Application of Cluster Analysis showed four different groups of water quality that differed in concentration of physicochemical characteristics and the type of water resource study, since the collection periods and the type of soil cover did not influence the segregation of groups formed. No presente trabalho empregaram-se técnicas de Estatística Multivariada (Análise de Componentes Principais e Análise de Agrupamento Hierárquico com o objetivo de selecionar as características físico-químicas mais importantes para explicar a variabilidade da qualidade das águas de uma sub-bacia hidrográfica rural no Sul do Estado do Espírito Santo, além de agrupar as águas estudadas quanto à similaridade das características selecionadas para verificar o efeito do tipo de cobertura do solo (agrícola, pecuário, florestal e urbano, de recurso h
Multivariable control in nuclear power stations
Parent, M.; McMorran, P.D.
1982-11-01
Multivariable methods have the potential to improve the control of large systems such as nuclear power stations. Linear-quadratic optimal control is a multivariable method based on the minimization of a cost function. A related technique leads to the Kalman filter for estimation of plant state from noisy measurements. A design program for optimal control and Kalman filtering has been developed as part of a computer-aided design package for multivariable control systems. The method is demonstrated on a model of a nuclear steam generator, and simulated results are presented
Continuous multivariate exponential extension
Block, H.W.
1975-01-01
The Freund-Weinman multivariate exponential extension is generalized to the case of nonidentically distributed marginal distributions. A fatal shock model is given for the resulting distribution. Results in the bivariate case and the concept of constant multivariate hazard rate lead to a continuous distribution related to the multivariate exponential distribution (MVE) of Marshall and Olkin. This distribution is shown to be a special case of the extended Freund-Weinman distribution. A generalization of the bivariate model of Proschan and Sullo leads to a distribution which contains both the extended Freund-Weinman distribution and the MVE
Methods of Multivariate Analysis
Rencher, Alvin C
2012-01-01
Praise for the Second Edition "This book is a systematic, well-written, well-organized text on multivariate analysis packed with intuition and insight . . . There is much practical wisdom in this book that is hard to find elsewhere."-IIE Transactions Filled with new and timely content, Methods of Multivariate Analysis, Third Edition provides examples and exercises based on more than sixty real data sets from a wide variety of scientific fields. It takes a "methods" approach to the subject, placing an emphasis on how students and practitioners can employ multivariate analysis in real-life sit
Silvennoinen, Annastiina; Teräsvirta, Timo
This article contains a review of multivariate GARCH models. Most common GARCH models are presented and their properties considered. This also includes nonparametric and semiparametric models. Existing specification and misspecification tests are discussed. Finally, there is an empirical example...
Multivariate Time Series Search
National Aeronautics and Space Administration — Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical...
Essentials of multivariate data analysis
Spencer, Neil H
2013-01-01
""… this text provides an overview at an introductory level of several methods in multivariate data analysis. It contains in-depth examples from one data set woven throughout the text, and a free [Excel] Add-In to perform the analyses in Excel, with step-by-step instructions provided for each technique. … could be used as a text (possibly supplemental) for courses in other fields where researchers wish to apply these methods without delving too deeply into the underlying statistics.""-The American Statistician, February 2015
Introduction to applied Bayesian statistics and estimation for social scientists
Lynch, Scott M
2007-01-01
""Introduction to Applied Bayesian Statistics and Estimation for Social Scientists"" covers the complete process of Bayesian statistical analysis in great detail from the development of a model through the process of making statistical inference. The key feature of this book is that it covers models that are most commonly used in social science research - including the linear regression model, generalized linear models, hierarchical models, and multivariate regression models - and it thoroughly develops each real-data example in painstaking detail.The first part of the book provides a detailed
Multivariate meta-analysis: Potential and promise
Jackson, Dan; Riley, Richard; White, Ian R
2011-01-01
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day ‘Multivariate meta-analysis’ event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd. PMID:21268052
Foundations of linear and generalized linear models
Agresti, Alan
2015-01-01
A valuable overview of the most important ideas and results in statistical analysis Written by a highly-experienced author, Foundations of Linear and Generalized Linear Models is a clear and comprehensive guide to the key concepts and results of linear statistical models. The book presents a broad, in-depth overview of the most commonly used statistical models by discussing the theory underlying the models, R software applications, and examples with crafted models to elucidate key ideas and promote practical model building. The book begins by illustrating the fundamentals of linear models,
Multivariate Birkhoff interpolation
Lorentz, Rudolph A
1992-01-01
The subject of this book is Lagrange, Hermite and Birkhoff (lacunary Hermite) interpolation by multivariate algebraic polynomials. It unifies and extends a new algorithmic approach to this subject which was introduced and developed by G.G. Lorentz and the author. One particularly interesting feature of this algorithmic approach is that it obviates the necessity of finding a formula for the Vandermonde determinant of a multivariate interpolation in order to determine its regularity (which formulas are practically unknown anyways) by determining the regularity through simple geometric manipulations in the Euclidean space. Although interpolation is a classical problem, it is surprising how little is known about its basic properties in the multivariate case. The book therefore starts by exploring its fundamental properties and its limitations. The main part of the book is devoted to a complete and detailed elaboration of the new technique. A chapter with an extensive selection of finite elements follows as well a...
Statistical analysis of management data
Gatignon, Hubert
2013-01-01
This book offers a comprehensive approach to multivariate statistical analyses. It provides theoretical knowledge of the concepts underlying the most important multivariate techniques and an overview of actual applications.
Cross-covariance functions for multivariate geostatistics
Genton, Marc G.
2015-05-01
Continuously indexed datasets with multiple variables have become ubiquitous in the geophysical, ecological, environmental and climate sciences, and pose substantial analysis challenges to scientists and statisticians. For many years, scientists developed models that aimed at capturing the spatial behavior for an individual process; only within the last few decades has it become commonplace to model multiple processes jointly. The key difficulty is in specifying the cross-covariance function, that is, the function responsible for the relationship between distinct variables. Indeed, these cross-covariance functions must be chosen to be consistent with marginal covariance functions in such a way that the second-order structure always yields a nonnegative definite covariance matrix. We review the main approaches to building cross-covariance models, including the linear model of coregionalization, convolution methods, the multivariate Matérn and nonstationary and space-time extensions of these among others. We additionally cover specialized constructions, including those designed for asymmetry, compact support and spherical domains, with a review of physics-constrained models. We illustrate select models on a bivariate regional climate model output example for temperature and pressure, along with a bivariate minimum and maximum temperature observational dataset; we compare models by likelihood value as well as via cross-validation co-kriging studies. The article closes with a discussion of unsolved problems. © Institute of Mathematical Statistics, 2015.
Cross-covariance functions for multivariate geostatistics
Genton, Marc G.; Kleiber, William
2015-01-01
Continuously indexed datasets with multiple variables have become ubiquitous in the geophysical, ecological, environmental and climate sciences, and pose substantial analysis challenges to scientists and statisticians. For many years, scientists developed models that aimed at capturing the spatial behavior for an individual process; only within the last few decades has it become commonplace to model multiple processes jointly. The key difficulty is in specifying the cross-covariance function, that is, the function responsible for the relationship between distinct variables. Indeed, these cross-covariance functions must be chosen to be consistent with marginal covariance functions in such a way that the second-order structure always yields a nonnegative definite covariance matrix. We review the main approaches to building cross-covariance models, including the linear model of coregionalization, convolution methods, the multivariate Matérn and nonstationary and space-time extensions of these among others. We additionally cover specialized constructions, including those designed for asymmetry, compact support and spherical domains, with a review of physics-constrained models. We illustrate select models on a bivariate regional climate model output example for temperature and pressure, along with a bivariate minimum and maximum temperature observational dataset; we compare models by likelihood value as well as via cross-validation co-kriging studies. The article closes with a discussion of unsolved problems. © Institute of Mathematical Statistics, 2015.
Serdobolskii, Vadim Ivanovich
2007-01-01
This monograph presents mathematical theory of statistical models described by the essentially large number of unknown parameters, comparable with sample size but can also be much larger. In this meaning, the proposed theory can be called "essentially multiparametric". It is developed on the basis of the Kolmogorov asymptotic approach in which sample size increases along with the number of unknown parameters.This theory opens a way for solution of central problems of multivariate statistics, which up until now have not been solved. Traditional statistical methods based on the idea of an infinite sampling often break down in the solution of real problems, and, dependent on data, can be inefficient, unstable and even not applicable. In this situation, practical statisticians are forced to use various heuristic methods in the hope the will find a satisfactory solution.Mathematical theory developed in this book presents a regular technique for implementing new, more efficient versions of statistical procedures. ...
Chandra, Preeti; Kannujia, Rekha; Saxena, Ankita; Srivastava, Mukesh; Bahadur, Lal; Pal, Mahesh; Singh, Bhim Pratap; Kumar Ojha, Sanjeev; Kumar, Brijesh
2016-09-10
An ultra-high performance liquid chromatography electrospray ionization tandem mass spectrometry method has been developed and validated for simultaneous quantification of six major bioactive compounds in five varieties of Withania somnifera in various plant parts (leaf, stem and root). The analysis was accomplished on Waters ACQUITY UPLC BEH C18 column with linear gradient elution of water/formic acid (0.1%) and acetonitrile at a flow rate of 0.3mLmin(-1). The proposed method was validated with acceptable linearity (r(2), 0.9989-0.9998), precision (RSD, 0.16-2.01%), stability (RSD, 1.04-1.62%) and recovery (RSD ≤2.45%), under optimum conditions. The method was also successfully applied for the simultaneous determination of six marker compounds in twenty-six marketed formulations. Hierarchical cluster analysis and principal component analysis were applied to discriminate these twenty-six batches based on characteristics of the bioactive compounds. The results indicated that this method is advance, rapid, sensitive and suitable to reveal the quality of Withania somnifera and also capable of performing quality evaluation of polyherbal formulations having similar markers/raw herbs. Copyright © 2016 Elsevier B.V. All rights reserved.
A MULTIVARIATE WEIBULL DISTRIBUTION
Cheng Lee
2010-07-01
Full Text Available A multivariate survival function of Weibull Distribution is developed by expanding the theorem by Lu and Bhattacharyya. From the survival function, the probability density function, the cumulative probability function, the determinant of the Jacobian Matrix, and the general moment are derived.
Barndorff-Nielsen, Ole; Hansen, Peter Reinhard; Lunde, Asger
We propose a multivariate realised kernel to estimate the ex-post covariation of log-prices. We show this new consistent estimator is guaranteed to be positive semi-definite and is robust to measurement noise of certain types and can also handle non-synchronous trading. It is the first estimator...
Multivariate Bonferroni-type inequalities theory and applications
Chen, John
2014-01-01
Multivariate Bonferroni-Type Inequalities: Theory and Applications presents a systematic account of research discoveries on multivariate Bonferroni-type inequalities published in the past decade. The emergence of new bounding approaches pushes the conventional definitions of optimal inequalities and demands new insights into linear and Fréchet optimality. The book explores these advances in bounding techniques with corresponding innovative applications. It presents the method of linear programming for multivariate bounds, multivariate hybrid bounds, sub-Markovian bounds, and bounds using Hamil
Nielsen, Allan Aasbjerg; Conradsen, Knut
This paper introduces a new orthogonal transformation, the multivariate alteration detection (MAD) transformation, based on an established multivariate statistical technique canonical correlation analysis. The theory for canonical correlation analysis is sketched and a result necessary...... for the definition of the MAD transformation is proven. As opposed to traditional univariate change detection schemes our scheme transforms two sets of multivariate observations (e.g. two multispectral satellite images covering the same geographical area acquired at different points in time) into a difference...... between two linear combinations of the original variables explaining maximal change (i.e. the difference explaining maximal variance) in all variables simultaneously. The MAD transformation is invariant to linear scaling. The MAD transformation can be used iteratively. First, it can be used to detect...
H. Apel
2018-04-01
Full Text Available The semi-arid regions of Central Asia crucially depend on the water resources supplied by the mountainous areas of the Tien Shan and Pamir and Altai mountains. During the summer months the snow-melt- and glacier-melt-dominated river discharge originating in the mountains provides the main water resource available for agricultural production, but also for storage in reservoirs for energy generation during the winter months. Thus a reliable seasonal forecast of the water resources is crucial for sustainable management and planning of water resources. In fact, seasonal forecasts are mandatory tasks of all national hydro-meteorological services in the region. In order to support the operational seasonal forecast procedures of hydro-meteorological services, this study aims to develop a generic tool for deriving statistical forecast models of seasonal river discharge based solely on observational records. The generic model structure is kept as simple as possible in order to be driven by meteorological and hydrological data readily available at the hydro-meteorological services, and to be applicable for all catchments in the region. As snow melt dominates summer runoff, the main meteorological predictors for the forecast models are monthly values of winter precipitation and temperature, satellite-based snow cover data, and antecedent discharge. This basic predictor set was further extended by multi-monthly means of the individual predictors, as well as composites of the predictors. Forecast models are derived based on these predictors as linear combinations of up to four predictors. A user-selectable number of the best models is extracted automatically by the developed model fitting algorithm, which includes a test for robustness by a leave-one-out cross-validation. Based on the cross-validation the predictive uncertainty was quantified for every prediction model. Forecasts of the mean seasonal discharge of the period April to September are derived
Apel, Heiko; Abdykerimova, Zharkinay; Agalhanova, Marina; Baimaganbetov, Azamat; Gavrilenko, Nadejda; Gerlitz, Lars; Kalashnikova, Olga; Unger-Shayesteh, Katy; Vorogushyn, Sergiy; Gafurov, Abror
2018-04-01
The semi-arid regions of Central Asia crucially depend on the water resources supplied by the mountainous areas of the Tien Shan and Pamir and Altai mountains. During the summer months the snow-melt- and glacier-melt-dominated river discharge originating in the mountains provides the main water resource available for agricultural production, but also for storage in reservoirs for energy generation during the winter months. Thus a reliable seasonal forecast of the water resources is crucial for sustainable management and planning of water resources. In fact, seasonal forecasts are mandatory tasks of all national hydro-meteorological services in the region. In order to support the operational seasonal forecast procedures of hydro-meteorological services, this study aims to develop a generic tool for deriving statistical forecast models of seasonal river discharge based solely on observational records. The generic model structure is kept as simple as possible in order to be driven by meteorological and hydrological data readily available at the hydro-meteorological services, and to be applicable for all catchments in the region. As snow melt dominates summer runoff, the main meteorological predictors for the forecast models are monthly values of winter precipitation and temperature, satellite-based snow cover data, and antecedent discharge. This basic predictor set was further extended by multi-monthly means of the individual predictors, as well as composites of the predictors. Forecast models are derived based on these predictors as linear combinations of up to four predictors. A user-selectable number of the best models is extracted automatically by the developed model fitting algorithm, which includes a test for robustness by a leave-one-out cross-validation. Based on the cross-validation the predictive uncertainty was quantified for every prediction model. Forecasts of the mean seasonal discharge of the period April to September are derived every month from
The value of multivariate model sophistication
Rombouts, Jeroen; Stentoft, Lars; Violante, Francesco
2014-01-01
We assess the predictive accuracies of a large number of multivariate volatility models in terms of pricing options on the Dow Jones Industrial Average. We measure the value of model sophistication in terms of dollar losses by considering a set of 444 multivariate models that differ in their spec....... In addition to investigating the value of model sophistication in terms of dollar losses directly, we also use the model confidence set approach to statistically infer the set of models that delivers the best pricing performances.......We assess the predictive accuracies of a large number of multivariate volatility models in terms of pricing options on the Dow Jones Industrial Average. We measure the value of model sophistication in terms of dollar losses by considering a set of 444 multivariate models that differ...