Applied multivariate statistical analysis
Härdle, Wolfgang Karl
2015-01-01
Focusing on high-dimensional applications, this 4th edition presents the tools and concepts used in multivariate data analysis in a style that is also accessible for non-mathematicians and practitioners. It surveys the basic principles and emphasizes both exploratory and inferential statistics; a new chapter on Variable Selection (Lasso, SCAD and Elastic Net) has also been added. All chapters include practical exercises that highlight applications in different multivariate data analysis fields: in quantitative financial studies, where the joint dynamics of assets are observed; in medicine, where recorded observations of subjects in different locations form the basis for reliable diagnoses and medication; and in quantitative marketing, where consumers’ preferences are collected in order to construct models of consumer behavior. All of these examples involve high to ultra-high dimensions and represent a number of major fields in big data analysis. The fourth edition of this book on Applied Multivariate ...
Multivariate Statistical Process Control
DEFF Research Database (Denmark)
Kulahci, Murat
2013-01-01
As sensor and computer technology continues to improve, it becomes a normal occurrence that we are confronted with high dimensional data sets. As in many areas of industrial statistics, this brings forth various challenges in statistical process control (SPC) and monitoring for which the aim is to identify “out-of-control” state of a process using control charts in order to reduce the excessive variation caused by so-called assignable causes. In practice, the most common method of monitoring multivariate data is through a statistic akin to the Hotelling’s T2. For high dimensional data with excessive amount of cross correlation, practitioners are often recommended to use latent structures methods such as Principal Component Analysis to summarize the data in only a few linear combinations of the original variables that capture most of the variation in the data. Applications of these control charts...
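The combination described in this abstract, a Hotelling-type T2 statistic computed on a few principal components, can be sketched roughly as follows. This is a minimal illustration and not the author's procedure: the function name `pca_t2`, the synthetic data, and the choice of two components are all assumptions made for the example, and no control limit is computed.

```python
import numpy as np

def pca_t2(X_ref, X_new, n_components):
    """Hotelling-type T^2 of X_new on the leading principal components of X_ref.

    Hypothetical helper for illustration; X_ref plays the role of
    in-control reference data.
    """
    mean = X_ref.mean(axis=0)
    std = X_ref.std(axis=0, ddof=1)
    Z_ref = (X_ref - mean) / std
    # Eigen-decomposition of the sample correlation matrix.
    corr = np.cov(Z_ref, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = eigvals[order]          # variances of the retained scores
    P = eigvecs[:, order]         # loadings of the retained components
    scores = ((X_new - mean) / std) @ P
    # Sum of squared scores, each scaled by its component variance.
    return np.sum(scores**2 / lam, axis=1)

# Synthetic example: 10 correlated variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 10))
X_ref = latent @ loadings + 0.1 * rng.normal(size=(500, 10))

t2_ref = pca_t2(X_ref, X_ref, n_components=2)
# Shift five observations far along the first latent direction.
shifted = X_ref[:5] + 10.0 * loadings[0]
t2_shift = pca_t2(X_ref, shifted, n_components=2)
```

In practice the T2 values would be compared against a control limit estimated from in-control data; how that limit is chosen is outside this sketch.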
Statistical methods for ranking data
Alvo, Mayer
2014-01-01
This book introduces advanced undergraduate, graduate students and practitioners to statistical methods for ranking data. An important aspect of nonparametric statistics is oriented towards the use of ranking data. Rank correlation is defined through the notion of distance functions and the notion of compatibility is introduced to deal with incomplete data. Ranking data are also modeled using a variety of modern tools such as CART, MCMC, EM algorithm and factor analysis. This book deals with statistical methods used for analyzing such data and provides a novel and unifying approach for hypotheses testing. The techniques described in the book are illustrated with examples and the statistical software is provided on the authors’ website.
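As a small illustration of rank correlation defined through a distance function, the sketch below uses the Kendall distance (the number of item pairs that two rankings order differently) and rescales it to a correlation in [-1, 1]. The function names are hypothetical, complete rankings without ties are assumed, and this is not code from the book or its website.

```python
from itertools import combinations

def kendall_distance(rank_a, rank_b):
    """Count item pairs that the two rankings order in opposite ways."""
    return sum(
        1
        for i, j in combinations(range(len(rank_a)), 2)
        if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0
    )

def kendall_tau(rank_a, rank_b):
    """Rank correlation derived from the distance: 1 - 2 d / d_max."""
    n = len(rank_a)
    max_dist = n * (n - 1) / 2  # distance between a ranking and its reverse
    return 1 - 2 * kendall_distance(rank_a, rank_b) / max_dist

# Two judges ranking four items; only the first two items are swapped.
tau = kendall_tau([1, 2, 3, 4], [2, 1, 3, 4])
```

Identical rankings give a distance of 0 and a correlation of 1; fully reversed rankings give the maximum distance and a correlation of -1.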
A primer of multivariate statistics
Harris, Richard J
2014-01-01
Drawing upon more than 30 years of experience in working with statistics, Dr. Richard J. Harris has updated A Primer of Multivariate Statistics to provide a model of balance between how-to and why. This classic text covers multivariate techniques with a taste of latent variable approaches. Throughout the book there is a focus on the importance of describing and testing one's interpretations of the emergent variables that are produced by multivariate analysis. This edition retains its conversational writing style while focusing on classical techniques. The book gives the reader a feel for why
Applied multivariate statistics with R
Zelterman, Daniel
2015-01-01
This book brings the power of multivariate statistics to graduate-level practitioners, making these analytical methods accessible without lengthy mathematical derivations. Using the open source, shareware program R, Professor Zelterman demonstrates the process and outcomes for a wide array of multivariate statistical applications. Chapters cover graphical displays, linear algebra, univariate, bivariate and multivariate normal distributions, factor methods, linear regression, discrimination and classification, clustering, time series models, and additional methods. Zelterman uses practical examples from diverse disciplines to welcome readers from a variety of academic specialties. Those with backgrounds in statistics will learn new methods while they review more familiar topics. Chapters include exercises, real data sets, and R implementations. The data are interesting, real-world topics, particularly from health and biology-related contexts. As an example of the approach, the text examines a sample from the B...
Aspects of multivariate statistical theory
Muirhead, Robb J
2009-01-01
The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover volumes, Wiley hopes to extend the lives of these works by making them available to future generations of statisticians, mathematicians, and scientists. ". . . the wealth of material on statistics concerning the multivariate normal distribution is quite exceptional. As such it is a very useful source of information for the general statistician and a must for anyone wanting to pen
Multivariate Statistical Process Control Charts: An Overview
Bersimis, Sotiris; Psarakis, Stelios; Panaretos, John
2006-01-01
In this paper we discuss the basic procedures for the implementation of multivariate statistical process control via control charting. Furthermore, we review multivariate extensions for all kinds of univariate control charts, such as multivariate Shewhart-type control charts, multivariate CUSUM control charts and multivariate EWMA control charts. In addition, we review unique procedures for the construction of multivariate control charts, based on multivariate statistical techniques such as p...
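One of the chart types named in this abstract, the multivariate EWMA chart, can be sketched as below. The sketch assumes the usual MEWMA recursion with a scalar smoothing weight and a known in-control mean and covariance; the function name `mewma_statistics` and the example numbers are illustrative assumptions, and no control limit is derived.

```python
import numpy as np

def mewma_statistics(X, mean, cov, lam=0.2):
    """Charting statistic of a multivariate EWMA chart (illustrative sketch).

    z_t = lam * (x_t - mean) + (1 - lam) * z_{t-1},  z_0 = 0
    T2_t = z_t' Sigma_z(t)^{-1} z_t, with
    Sigma_z(t) = lam / (2 - lam) * (1 - (1 - lam)^(2t)) * cov
    """
    X = np.asarray(X, dtype=float)
    cov_inv = np.linalg.inv(cov)
    z = np.zeros(X.shape[1])
    stats = []
    for t, x in enumerate(X, start=1):
        z = lam * (x - mean) + (1 - lam) * z
        scale = lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        stats.append(z @ cov_inv @ z / scale)
    return np.array(stats)

# Hypothetical two-variable process with a sustained shift after 3 samples.
observations = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1],
                [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]]
stats = mewma_statistics(observations, mean=np.zeros(2), cov=np.eye(2), lam=0.2)
```

With `lam=1` the recursion keeps no memory and the statistic reduces to the Shewhart-type chi-square statistic of each observation; smaller weights accumulate evidence of small sustained shifts.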
Multivariate statistical methods a primer
Manly, Bryan FJ
2004-01-01
THE MATERIAL OF MULTIVARIATE ANALYSIS: Examples of Multivariate Data; Preview of Multivariate Methods; The Multivariate Normal Distribution; Computer Programs; Graphical Methods; Chapter Summary; References. MATRIX ALGEBRA: The Need for Matrix Algebra; Matrices and Vectors; Operations on Matrices; Matrix Inversion; Quadratic Forms; Eigenvalues and Eigenvectors; Vectors of Means and Covariance Matrices; Further Reading; Chapter Summary; References. DISPLAYING MULTIVARIATE DATA: The Problem of Displaying Many Variables in Two Dimensions; Plotting Index Variables; The Draftsman's Plot; The Representation of Individual Data Points; Profiles o
A multivariate rank test for comparing mass size distributions
Lombard, F.
2012-04-01
Particle size analyses of a raw material are commonplace in the mineral processing industry. Knowledge of particle size distributions is crucial in planning milling operations to enable an optimum degree of liberation of valuable mineral phases, to minimize plant losses due to an excess of oversize or undersize material or to attain a size distribution that fits a contractual specification. The problem addressed in the present paper is how to test the equality of two or more underlying size distributions. A distinguishing feature of these size distributions is that they are not based on counts of individual particles. Rather, they are mass size distributions giving the fractions of the total mass of a sampled material lying in each of a number of size intervals. As such, the data are compositional in nature, using the terminology of Aitchison [1], that is, multivariate vectors the components of which add to 100%. In the literature, various versions of Hotelling's T2 have been used to compare matched pairs of such compositional data. In this paper, we propose a robust test procedure based on ranks as a competitor to Hotelling's T2. In contrast to the latter statistic, the power of the rank test is not unduly affected by the presence of outliers or of zeros among the data. © 2012 Copyright Taylor and Francis Group, LLC.
Multivariate statistics exercises and solutions
Härdle, Wolfgang Karl
2015-01-01
The authors present tools and concepts of multivariate data analysis by means of exercises and their solutions. The first part is devoted to graphical techniques. The second part deals with multivariate random variables and presents the derivation of estimators and tests for various practical situations. The last part introduces a wide variety of exercises in applied multivariate data analysis. The book demonstrates the application of simple calculus and basic multivariate methods in real life situations. It contains altogether more than 250 solved exercises which can assist a university teacher in setting up a modern multivariate analysis course. All computer-based exercises are available in the R language. All R codes and data sets may be downloaded via the quantlet download center www.quantlet.org or via the Springer webpage. For interactive display of low-dimensional projections of a multivariate data set, we recommend GGobi.
Multivariate statistical methods a first course
Marcoulides, George A
2014-01-01
Multivariate statistics refer to an assortment of statistical methods that have been developed to handle situations in which multiple variables or measures are involved. Any analysis of more than two variables or measures can loosely be considered a multivariate statistical analysis. An introductory text for students learning multivariate statistical methods for the first time, this book keeps mathematical details to a minimum while conveying the basic principles. One of the principal strategies used throughout the book--in addition to the presentation of actual data analyses--is poin
Multivariate analysis: A statistical approach for computations
Michu, Sachin; Kaushik, Vandana
2014-10-01
Multivariate analysis is a type of multivariate statistical approach commonly used in automotive diagnosis, education, evaluating clusters in finance, etc., and more recently in the health-related professions. The objective of the paper is to provide a detailed exploratory discussion about factor analysis (FA) in image retrieval methods and correlation analysis (CA) of network traffic. Image retrieval methods aim to retrieve relevant images from a collected database, based on their content. The problem is made more difficult due to the high dimension of the variable space in which the images are represented. Multivariate correlation analysis proposes an anomaly detection and analysis method based on the correlation coefficient matrix. Anomalous behaviors in the network include the various attacks on the network like DDoS attacks and network scanning.
Random matrix theory and multivariate statistics
Diaz-Garcia, Jose A.; Jáimez, Ramon Gutiérrez
2009-01-01
Some tools and ideas are interchanged between random matrix theory and multivariate statistics. In the context of the random matrix theory, classes of spherical and generalised Wishart random matrix ensemble, containing as particular cases the classical random matrix ensembles, are proposed. Some properties of these classes of ensemble are analysed. In addition, the random matrix ensemble approach is extended and a unified theory proposed for the study of distributions for real normed divisio...
Quality Evaluation Based on Multivariate Statistical Methods
Directory of Open Access Journals (Sweden)
Shen Yin
2013-01-01
Quality prediction models are constructed based on multivariate statistical methods, including ordinary least squares regression (OLSR), principal component regression (PCR), partial least squares regression (PLSR), and modified partial least squares regression (MPLSR). The prediction model constructed by MPLSR achieves superior results compared with the other three methods in terms of both fitting efficiency and prediction ability. Based on it, further research is dedicated to selecting key variables to directly predict the product quality with satisfactory performance. The prediction models presented are more efficient than traditional ones and can be useful to support human experts in the evaluation and classification of the product quality. The effectiveness of the quality prediction models is finally illustrated and verified on a practical red wine data set.
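To make two of the methods named in this abstract concrete, the sketch below fits ordinary least squares regression and principal component regression with NumPy only. It is an assumption-laden illustration: the function names, the synthetic predictors, and the number of retained components are invented for the example, it does not use the red wine data, and it does not cover PLSR or the authors' modified PLSR.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares on centered data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    coef, *_ = np.linalg.lstsq(X - x_mean, y - y_mean, rcond=None)
    return coef, y_mean - x_mean @ coef

def fit_pcr(X, y, n_components):
    """Principal component regression: regress y on the leading PC scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered predictors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    scores = Xc @ V
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    coef = V @ gamma  # map back to coefficients on the original variables
    return coef, y_mean - x_mean @ coef

# Synthetic data with two nearly collinear predictors (illustrative only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X_demo = np.column_stack([x1, x1 + 0.01 * rng.normal(size=100),
                          rng.normal(size=100)])
y_demo = X_demo[:, 0] + X_demo[:, 2] + 0.1 * rng.normal(size=100)

coef_ols, intercept_ols = fit_ols(X_demo, y_demo)
coef_pcr, intercept_pcr = fit_pcr(X_demo, y_demo, n_components=2)
```

Dropping the low-variance component is what distinguishes PCR from OLSR here; when all components are retained and the predictors have full column rank, the two fits coincide.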
Statistical inference of Minimum Rank Factor Analysis
Shapiro, A; Ten Berge, JMF
For any given number of factors, Minimum Rank Factor Analysis yields optimal communalities for an observed covariance matrix in the sense that the unexplained common variance with that number of factors is minimized, subject to the constraint that both the diagonal matrix of unique variances and the
Multivariate statistical analysis of wildfires in Portugal
Costa, Ricardo; Caramelo, Liliana; Pereira, Mário
2013-04-01
Several studies demonstrate that wildfires in Portugal present high temporal and spatial variability as well as cluster behavior (Pereira et al., 2005, 2011). This study aims to contribute to the characterization of the fire regime in Portugal with the multivariate statistical analysis of the time series of number of fires and area burned in Portugal during the 1980-2009 period. The data used in the analysis is an extended version of the Rural Fire Portuguese Database (PRFD) (Pereira et al., 2011), provided by the National Forest Authority (Autoridade Florestal Nacional, AFN), the Portuguese Forest Service, which includes information for more than 500,000 fire records. There are many advanced techniques for examining the relationships among multiple time series at the same time (e.g., canonical correlation analysis, principal components analysis, factor analysis, path analysis, multiple analyses of variance, clustering systems). This study compares and discusses the results obtained with these different techniques. Pereira, M.G., Trigo, R.M., DaCamara, C.C., Pereira, J.M.C., Leite, S.M., 2005: "Synoptic patterns associated with large summer forest fires in Portugal". Agricultural and Forest Meteorology. 129, 11-25. Pereira, M. G., Malamud, B. D., Trigo, R. M., and Alves, P. I.: The history and characteristics of the 1980-2005 Portuguese rural fire database, Nat. Hazards Earth Syst. Sci., 11, 3343-3358, doi:10.5194/nhess-11-3343-2011, 2011. This work is supported by European Union Funds (FEDER/COMPETE - Operational Competitiveness Programme) and by national funds (FCT - Portuguese Foundation for Science and Technology) under the project FCOMP-01-0124-FEDER-022692, the project FLAIR (PTDC/AAC-AMB/104702/2008) and the EU 7th Framework Program through FUME (contract number 243888).
Efficient nonrigid registration using ranked order statistics
DEFF Research Database (Denmark)
Tennakoon, Ruwan B.; Bab-Hadiashar, Alireza; de Bruijne, Marleen
2013-01-01
of research. In this paper we propose a fast and accurate non-rigid registration method for intra-modality volumetric images. Our approach exploits the information provided by an order statistics based segmentation method, to find the important regions for registration and use an appropriate sampling scheme...
Method for statistical data analysis of multivariate observations
Gnanadesikan, R
1997-01-01
A practical guide for multivariate statistical techniques, now updated and revised. In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus of its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of inte
Multivariate statistics high-dimensional and large-sample approximations
Fujikoshi, Yasunori; Shimizu, Ryoichi
2010-01-01
A comprehensive examination of high-dimensional analysis of multivariate methods and their real-world applications. Multivariate Statistics: High-Dimensional and Large-Sample Approximations is the first book of its kind to explore how classical multivariate methods can be revised and used in place of conventional statistical tools. Written by prominent researchers in the field, the book focuses on high-dimensional and large-scale approximations and details the many basic multivariate methods used to achieve high levels of accuracy. The authors begin with a fundamental presentation of the basic
Gini's ideas: new perspectives for modern multivariate statistical analysis
Directory of Open Access Journals (Sweden)
Angela Montanari
2013-05-01
Corrado Gini (1884-1964) may be considered the greatest Italian statistician. We believe that his important contributions to statistics, although mainly limited to the univariate context, may be profitably employed in modern multivariate statistical methods, aimed at overcoming the curse of dimensionality by decomposing multivariate problems into a series of suitably posed univariate ones. In this paper we critically summarize Gini’s proposals and consider their impact on multivariate statistical methods, both reviewing already well established applications and suggesting new perspectives. Particular attention will be devoted to classification and regression trees, multiple linear regression, linear dimension reduction methods and transvariation based discrimination.
Statistical Inference for a Class of Multivariate Negative Binomial Distributions
DEFF Research Database (Denmark)
Rubak, Ege H.; Møller, Jesper; McCullagh, Peter
This paper considers statistical inference procedures for a class of models for positively correlated count variables called α-permanental random fields, and which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been...
Multivariate Relationships between Statistics Anxiety and Motivational Beliefs
Baloglu, Mustafa; Abbassi, Amir; Kesici, Sahin
2017-01-01
In general, anxiety has been found to be associated with motivational beliefs and the current study investigated multivariate relationships between statistics anxiety and motivational beliefs among 305 college students (60.0% women). The Statistical Anxiety Rating Scale, the Motivated Strategies for Learning Questionnaire, and a set of demographic…
Inverted rank distributions: Macroscopic statistics, universality classes, and critical exponents
Eliazar, Iddo; Cohen, Morrel H.
2014-01-01
An inverted rank distribution is an infinite sequence of positive sizes ordered in a monotone increasing fashion. Interlacing together Lorenzian and oligarchic asymptotic analyses, we establish a macroscopic classification of inverted rank distributions into five “socioeconomic” universality classes: communism, socialism, criticality, feudalism, and absolute monarchy. We further establish that: (i) communism and socialism are analogous to a “disordered phase”, feudalism and absolute monarchy are analogous to an “ordered phase”, and criticality is the “phase transition” between order and disorder; (ii) the universality classes are characterized by two critical exponents, one governing the ordered phase, and the other governing the disordered phase; (iii) communism, criticality, and absolute monarchy are characterized by sharp exponent values, and are inherently deterministic; (iv) socialism is characterized by a continuous exponent range, is inherently stochastic, and is universally governed by continuous power-law statistics; (v) feudalism is characterized by a continuous exponent range, is inherently stochastic, and is universally governed by discrete exponential statistics. The results presented in this paper yield a universal macroscopic socioeconophysical perspective of inverted rank distributions.
Using Multivariate Statistical Analysis for Grouping of State Forest Enterprises
Directory of Open Access Journals (Sweden)
Atakan Öztürk
2010-11-01
The purpose of this study was to investigate the potential use of multivariate statistical analysis methods for grouping of Forest Enterprises. This study involved 24 Forest Enterprises in the Eastern Black Sea Region. A total of 69 variables, classified as physical, economic, social, rural settlements, technical-managerial, and functional variables, were developed. Multivariate statistics such as factor, cluster and discriminant analyses were used to classify the 24 Forest Enterprises. These enterprises were classified into 2 groups: 22 enterprises were in the first group, while the remaining 2 enterprises were in the second group.
Multivariate methods and forecasting with IBM SPSS statistics
Aljandali, Abdulkader
2017-01-01
This is the second of a two-part guide to quantitative analysis using the IBM SPSS Statistics software package; this volume focuses on multivariate statistical methods and advanced forecasting techniques. More often than not, regression models involve more than one independent variable. For example, forecasting methods are commonly applied to aggregates such as inflation rates, unemployment, exchange rates, etc., that have complex relationships with determining variables. This book introduces multivariate regression models and provides examples to help understand theory underpinning the model. The book presents the fundamentals of multivariate regression and then moves on to examine several related techniques that have application in business-orientated fields such as logistic and multinomial regression. Forecasting tools such as the Box-Jenkins approach to time series modeling are introduced, as well as exponential smoothing and naïve techniques. This part also covers hot topics such as Factor Analysis, Dis...
Multivariate statistical characterization of groundwater quality in Ain ...
African Journals Online (AJOL)
Multivariate statistical techniques, cluster and principal component analysis were applied to the data on groundwater quality of Ain Azel plain (Algeria), to extract principal factors corresponding to the different sources of variation in the hydrochemistry, with the objective of defining the main controls on the hydrochemistry at ...
Using multivariate statistical analysis to assess changes in water ...
African Journals Online (AJOL)
Multivariate statistical analysis was used to investigate changes in water chemistry at 5 river sites in the Vaal Dam catch- ... analysis (CCA) showed that the environmental variables used in the analysis, discharge and month of sampling, explained ...
Multivariate statistical characterization of groundwater quality in Ain ...
African Journals Online (AJOL)
Laboratory of Organic Materials, University of Bejaia, Targa-Ouzemour 06000, Algeria. Accepted 25 May, 2010. Multivariate statistical techniques, cluster and principal component analysis were applied to the data on groundwater quality of Ain Azel plain (Algeria), to extract principal factors corresponding to the different ...
Using multivariate statistical analysis to assess changes in water ...
African Journals Online (AJOL)
Multivariate statistical analysis was used to investigate changes in water chemistry at 5 river sites in the Vaal Dam catchment, draining the Highveld grasslands. These grasslands receive more than 8 kg sulphur (S) ha⁻¹·year⁻¹ and 6 kg nitrogen (N) ha⁻¹·year⁻¹ via atmospheric deposition. It was hypothesised that between ...
Practical applications of multivariate statistics in exploration geochemistry
Vriend, S.P.
1990-01-01
The search for new economic ore-deposits becomes increasingly difficult. A sophisticated approach is required to locate new ones. In exploration geochemistry the use of uni- and multivariate statistics is often advocated. In this series of studies it is shown how techniques such as factor
Enhanced bio-manufacturing through advanced multivariate statistical technologies.
Martin, E B; Morris, A J
2002-11-13
The paper describes the interrogation of data, from a reaction vessel producing an active pharmaceutical ingredient (API), using advanced multivariate statistical techniques. Due to the limited number of batches available, data augmentation was used to increase the number of batches, thereby enabling the extraction of more subtle process behaviour from the data. A second methodology investigated was that of multi-group modelling. This allowed between-cluster variability to be removed, thus allowing attention to focus on within-process variability. The paper describes how the different approaches enabled a better understanding of the factors causing the onset of impurity formation, as well as demonstrating the power of multivariate statistical data analysis techniques to provide an enhanced understanding of the process.
Multivariate Statistical Process Control Process Monitoring Methods and Applications
Ge, Zhiqiang
2013-01-01
Given their key position in the process control industry, process monitoring techniques have been extensively investigated by industrial practitioners and academic control researchers. Multivariate statistical process control (MSPC) is one of the most popular data-based methods for process monitoring and is widely used in various industrial areas. Effective routines for process monitoring can help operators run industrial processes efficiently at the same time as maintaining high product quality. Multivariate Statistical Process Control reviews the developments and improvements that have been made to MSPC over the last decade, and goes on to propose a series of new MSPC-based approaches for complex process monitoring. These new methods are demonstrated in several case studies from the chemical, biological, and semiconductor industrial areas. Control and process engineers, and academic researchers in the process monitoring, process control and fault detection and isolation (FDI) disciplines will be inter...
Statistical regularities in the rank-citation profile of scientists.
Petersen, Alexander M; Stanley, H Eugene; Succi, Sauro
2011-01-01
Recent science of science research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate production and impact using the rank-citation profile c(i)(r) of 200 distinguished professors and 100 assistant professors. For the entire range of paper rank r, we fit each c(i)(r) to a common distribution function. Since two scientists with equivalent Hirsch h-index can have significantly different c(i)(r) profiles, our results demonstrate the utility of the β(i) scaling parameter in conjunction with h(i) for quantifying individual publication impact. We show that the total number of citations C(i) tallied from a scientist's N(i) papers scales as [Formula: see text]. Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress.
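The point that two scientists with the same h-index can have quite different rank-citation profiles is easy to see numerically. The sketch below computes the h-index from a list of per-paper citation counts; the function name and the two citation lists are made up for illustration and are not data from the study.

```python
def h_index(citations):
    """Largest h such that the h most-cited papers each have >= h citations."""
    profile = sorted(citations, reverse=True)  # rank-citation profile c(r)
    h = 0
    for rank, count in enumerate(profile, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Two hypothetical scientists with five papers each.
scientist_a = [10, 8, 5, 4, 3]
scientist_b = [25, 8, 5, 4, 3]

h_a, h_b = h_index(scientist_a), h_index(scientist_b)
total_a, total_b = sum(scientist_a), sum(scientist_b)
```

Both lists give the same h-index while the totals differ (30 versus 45 citations), which is the kind of information a single index discards and a fitted profile can retain.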
Multivariate statistical analysis a high-dimensional approach
Serdobolskii, V
2000-01-01
In the last few decades the accumulation of large amounts of information in numerous applications has stimulated an increased interest in multivariate analysis. Computer technologies allow one to use multi-dimensional and multi-parametric models successfully. At the same time, an interest arose in statistical analysis with a deficiency of sample data. Nevertheless, it is difficult to describe the recent state of affairs in applied multivariate methods as satisfactory. Unimprovable (dominating) statistical procedures are still unknown except for a few specific cases. The simplest problem of estimating the mean vector with minimum quadratic risk is unsolved, even for normal distributions. Commonly used standard linear multivariate procedures based on the inversion of sample covariance matrices can lead to unstable results or provide no solution in dependence of data. Programs included in standard statistical packages cannot process 'multi-collinear data' and there are no theoretical recommen ...
Assessing vascular endothelial function using frequency and rank order statistics
Wu, Hsien-Tsai; Hsu, Po-Chun; Sun, Cheuk-Kwan; Liu, An-Bang; Lin, Zong-Lin; Tang, Chieh-Ju; Lo, Men-Tzung
2013-08-01
Using frequency and rank order statistics (FROS), this study analyzed the fluctuations in arterial waveform amplitudes recorded from an air pressure sensing system before and after reactive hyperemia (RH) induction by temporary blood flow occlusion to evaluate the vascular endothelial function of aged and diabetic subjects. The modified probability-weighted distance (PWD) calculated from the FROS was compared with the dilatation index (DI) to evaluate its validity and sensitivity in the assessment of vascular endothelial function. The results showed that the PWD can provide a quantitative determination of the structural changes in the arterial pressure signals associated with regulation of vascular tone and blood pressure by intact vascular endothelium after the application of occlusion stress. Our study suggests that the use of FROS is a reliable noninvasive approach to the assessment of vascular endothelial degeneration in aging and diabetes.
Application of multivariate statistical techniques in microbial ecology.
Paliy, O; Shankar, V
2016-03-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces a large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.
Statistical analysis of compressive low rank tomography with random measurements
Acharya, Anirudh; Guţă, Mădălin
2017-05-01
We consider the statistical problem of ‘compressive’ estimation of low rank states (r ≪ d) with random basis measurements, where r, d are the rank and dimension of the state respectively. We investigate whether for a fixed sample size N, the estimation error associated with a ‘compressive’ measurement setup is ‘close’ to that of the setting where a large number of bases are measured. We generalise and extend previous results, and show that the mean square error (MSE) associated with the Frobenius norm attains the optimal rate rd/N with only O(r log d) random basis measurements for all states. An important tool in the analysis is the concentration of the Fisher information matrix (FIM). We demonstrate that although a concentration of the MSE follows from a concentration of the FIM for most states, the FIM fails to concentrate for states with eigenvalues close to zero. We analyse this phenomenon in the case of a single qubit and demonstrate a concentration of the MSE about its optimal despite a lack of concentration of the FIM for states close to the boundary of the Bloch sphere. We also consider the estimation error in terms of a different metric, the quantum infidelity. We show that a concentration in the mean infidelity (MINF) does not exist uniformly over all states, highlighting the importance of loss function choice. Specifically, we show that for states that are nearly pure, the MINF scales as 1/√N but the constant converges to zero as the number of settings is increased. This demonstrates a lack of ‘compressive’ recovery for nearly pure states in this metric.
Multivariate Statistical Modelling of Drought and Heat Wave Events
Manning, Colin; Widmann, Martin; Vrac, Mathieu; Maraun, Douglas; Bevaqua, Emanuele
2016-04-01
C. Manning (1,2), M. Widmann (1), M. Vrac (2), D. Maraun (3), E. Bevaqua (2,3). 1. School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham, UK. 2. Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL), Centre d'Etudes de Saclay, Gif-sur-Yvette, France. 3. Wegener Center for Climate and Global Change, University of Graz, Brandhofgasse 5, 8010 Graz, Austria. Compound extreme events are a combination of two or more contributing events which in themselves may not be extreme but through their joint occurrence produce an extreme impact. Compound events are noted in the latest IPCC report as an important type of extreme event that has been given little attention so far. As part of the CE:LLO project (Compound Events: muLtivariate statisticaL mOdelling) we are developing a multivariate statistical model to gain an understanding of the dependence structure of certain compound events. One focus of this project is on the interaction between drought and heat wave events. Soil moisture has both a local and non-local effect on the occurrence of heat waves, where it strongly controls the latent heat flux affecting the transfer of sensible heat to the atmosphere. These processes can create a feedback whereby a heat wave may be amplified or suppressed by the soil moisture preconditioning and, vice versa, the heat wave may in turn have an effect on soil conditions. An aim of this project is to capture this dependence in order to correctly describe the joint probabilities of these conditions and the resulting probability of their compound impact. We will show an application of Pair Copula Constructions (PCCs) to study the aforementioned compound event. PCCs allow in theory for the formulation of multivariate dependence structures in any dimension, where the PCC is a decomposition of a multivariate distribution into a product of bivariate components modelled using copulas. A
Classification of Specialized Farms Applying Multivariate Statistical Methods
Directory of Open Access Journals (Sweden)
Zuzana Hloušková
2017-01-01
The paper applies advanced multivariate statistical methods to classify cattle-breeding farming enterprises by their economic size. The advantage of the model is its ability to use a few selected indicators, in contrast to the complex methodology of the current classification model, which requires knowledge of the detailed structure of herd turnover and of cultivated crops. The output of the paper is intended to be applied within farm structure research focused on the future development of Czech agriculture. The data source is the 2014 farming enterprises database from the FADN CZ system. The proposed predictive model exploits knowledge of the actual size classes of the farms tested. Outcomes of the linear discriminant analysis multifactor classification method supported the filing of farming enterprises in the group of Small farms (98 % filed correctly) and the Large and Very Large enterprises (100 % filed correctly). The Medium Size farms were correctly filed at only 58.11 %. Partial shortcomings of the process were found when discriminating between Medium and Small farms.
Multivariate statistical analysis of atom probe tomography data.
Parish, Chad M; Miller, Michael K
2010-10-01
The application of spectrum imaging multivariate statistical analysis methods, specifically principal component analysis (PCA), to atom probe tomography (APT) data has been investigated. The mathematical method of analysis is described and the results for two example datasets are analyzed and presented. The first dataset is from the analysis of a PM 2000 Fe-Cr-Al-Ti steel containing two different ultrafine precipitate populations. PCA properly describes the matrix and precipitate phases in a simple and intuitive manner. A second APT example is from the analysis of an irradiated reactor pressure vessel steel. Fine, nm-scale Cu-enriched precipitates having a core-shell structure were identified and qualitatively described by PCA. Advantages, disadvantages, and future prospects for implementing these data analysis methodologies for APT datasets, particularly with regard to quantitative analysis, are also discussed. Copyright 2010 Elsevier B.V. All rights reserved.
Statistical regularities in the rank-citation profile of scientists
Petersen, Alexander M; Succi, Sauro
2011-01-01
Recent "science of science" research shows common regularities in the publication patterns of scientific papers across time and discipline. Here we analyze the complete publication careers of 300 scientists and find remarkable regularity in the functional form of the rank-citation profile c_{i}(r) for each scientist i =1...300. We find that the rank-ordered citation distribution c_{i}(r) can be approximated by a discrete generalized beta distribution (DGBD) over the entire range of ranks r, which allows for the characterization and comparison of c_{i}(r) using a common framework. The functional form of the DGBD has two scaling exponents, beta_i and gamma_i, which determine the scaling behavior of c_{i}(r) for both small and large rank r. The crossover between two scaling regimes suggests a complex reinforcement or positive-feedback relation between the impact of a scientist's most famous papers and the impact of his/her other papers. Moreover, since two scientists with equivalent Hirsch h-index values may hav...
Multivariate statistical modelling based on generalized linear models
Fahrmeir, Ludwig
1994-01-01
This book is concerned with the use of generalized linear models for univariate and multivariate regression analysis. Its emphasis is to provide a detailed introductory survey of the subject based on the analysis of real data drawn from a variety of subjects including the biological sciences, economics, and the social sciences. Where possible, technical details and proofs are deferred to an appendix in order to provide an accessible account for non-experts. Topics covered include: models for multi-categorical responses, model checking, time series and longitudinal data, random effects models, and state-space models. Throughout, the authors have taken great pains to discuss the underlying theoretical ideas in ways that relate well to the data at hand. As a result, numerous researchers whose work relies on the use of these models will find this an invaluable account to have on their desks. "The basic aim of the authors is to bring together and review a large part of recent advances in statistical modelling of m...
Statistical regularities in the rank-citation profile of scientists
Petersen, Alexander M.; Stanley, H. Eugene; Succi, Sauro
2011-01-01
Recent "science of science" research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate scientific production and impact of individual careers using the rank-citation profile c_{i}(r) of 200 distinguished professors and 100 assistant professors. For the entire range of paper r...
Classification of Malaysia aromatic rice using multivariate statistical analysis
Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A.; Omar, O.
2015-05-01
Aromatic rice (Oryza sativa L.) is considered the best-quality premium rice. The varieties are preferred by consumers because of criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than that of ordinary rice because of its special growth requirements, for instance specific climate and soil. Presently, aromatic rice quality is identified using its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or human sensory panels. However, human sensory panels have significant drawbacks, such as lengthy training time, proneness to fatigue as the number of samples increases, and inconsistency. GC-MS analysis, on the other hand, requires detailed procedures and lengthy analysis and is quite costly. This paper presents the application of an in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties. The e-nose classifies the variety of aromatic rice based on the odour of the samples, which were taken from different rice varieties. The instrument uses multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN), to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to recognize and classify the unspecified samples. Visual inspection of the PCA and LDA plots shows that the instrument was able to separate the samples into distinct clusters. The low misclassification errors of LDA and KNN support these findings, and we may conclude that the e-nose was successfully applied to the classification of the aromatic rice varieties.
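The leave-one-out evaluation of a KNN classifier described in this abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function name `loo_knn_error`, the Euclidean distance, the choice k = 3 and the simulated "sensor responses" are all assumptions made for the example.

```python
import numpy as np

def loo_knn_error(X, y, k=3):
    """Leave-one-out misclassification rate of a k-nearest-neighbour classifier."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # the held-out sample may not vote for itself
    errors = 0
    for i in range(n):
        neighbours = np.argsort(dist[i])[:k]
        labels, counts = np.unique(y[neighbours], return_counts=True)
        predicted = labels[np.argmax(counts)]  # majority vote
        errors += int(predicted != y[i])
    return errors / n

# Hypothetical data: three well-separated "varieties", 10 samples each,
# measured on three made-up sensor channels.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0], [0.0, 5.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((10, 3)) for c in centres])
y = np.repeat(["A", "B", "C"], 10)
error_rate = loo_knn_error(X, y, k=3)
print(f"LOO misclassification rate: {error_rate:.2f}")
```

Each sample is classified by its k nearest neighbours among the remaining samples, so every observation serves once as a test case without a separate validation set.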
The Inappropriate Symmetries of Multivariate Statistical Analysis in Geometric Morphometrics.
Bookstein, Fred L
In today's geometric morphometrics the commonest multivariate statistical procedures, such as principal component analysis or regressions of Procrustes shape coordinates on Centroid Size, embody a tacit roster of symmetries (axioms concerning the homogeneity of the multiple spatial domains or descriptor vectors involved) that do not correspond to actual biological fact. These techniques are hence inappropriate for any application regarding which we have a-priori biological knowledge to the contrary (e.g., genetic/morphogenetic processes common to multiple landmarks, the range of normal in anatomy atlases, the consequences of growth or function for form). But nearly every morphometric investigation is motivated by prior insights of this sort. We therefore need new tools that explicitly incorporate these elements of knowledge, should they be quantitative, to break the symmetries of the classic morphometric approaches. Some of these are already available in our literature but deserve to be known more widely: deflated (spatially adaptive) reference distributions of Procrustes coordinates, Sewall Wright's century-old variant of factor analysis, the geometric algebra of importing explicit biomechanical formulas into Procrustes space. Other methods, not yet fully formulated, might involve parameterized models for strain in idealized forms under load, principled approaches to the separation of functional from Brownian aspects of shape variation over time, and, in general, a better understanding of how the formalism of landmarks interacts with the many other approaches to quantification of anatomy. To more powerfully organize inferences from the high-dimensional measurements that characterize so much of today's organismal biology, tomorrow's toolkit must rely neither on principal component analysis nor on the Procrustes distance formula, but instead on sound prior biological knowledge as expressed in formulas whose coefficients are not all the same. I describe the problems of
Classification of Malaysia aromatic rice using multivariate statistical analysis
Energy Technology Data Exchange (ETDEWEB)
Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A. [School of Mechatronic Engineering, Universiti Malaysia Perlis, Kampus Pauh Putra, 02600 Arau, Perlis (Malaysia); Omar, O. [Malaysian Agriculture Research and Development Institute (MARDI), Persiaran MARDI-UPM, 43400 Serdang, Selangor (Malaysia)
2015-05-15
Sensitivity analysis of ranked data: from order statistics to quantiles
Heidergott, B.F.; Volk-Makarewicz, W.
2015-01-01
In this paper we provide the mathematical theory for sensitivity analysis of order statistics of continuous random variables, where the sensitivity is with respect to a distributional parameter. Sensitivity analysis of order statistics over a finite number of observations is discussed before
Multivariate statistical analysis of a multi-step industrial processes
DEFF Research Database (Denmark)
Reinikainen, S.P.; Høskuldsson, Agnar
2007-01-01
multivariate multi-step processes, where results from each step are used to evaluate future results, is presented. The methods presented are based on Priority PLS Regression. The basic idea is to compute the weights in the regression analysis for given steps, but adjust all data by the resulting score vectors...
Two Phase Analysis of Ski Schools Customer Satisfaction: Multivariate Ranking and Cub Models
Directory of Open Access Journals (Sweden)
Rosa Arboretti
2014-06-01
Monitoring tourists' opinions is also an important issue for companies providing sport services. The aim of this paper was to apply CUB models and nonparametric permutation methods to a large customer satisfaction survey performed in 2011 in the ski schools of Alto Adige (Italy). The two-phase data processing was mainly aimed at: establishing a global ranking of a sample of five ski schools on the basis of satisfaction scores for several specific service aspects; estimating specific components of the respondents’ evaluation process (feeling and uncertainty); and detecting whether customers’ characteristics affected these two components. With the application of NPC-Global ranking we obtained a ranking of the evaluated ski schools that simultaneously considers satisfaction scores for several service aspects. CUB models showed which aspects were rated less satisfactory and which subgroups were less satisfied, giving tips on how to improve services and customer satisfaction.
The exact probability distribution of the rank product statistics for replicated experiments.
Eisinga, Rob; Breitling, Rainer; Heskes, Tom
2013-03-18
The rank product method is a widely accepted technique for detecting differentially regulated genes in replicated microarray experiments. To approximate the sampling distribution of the rank product statistic, the original publication proposed a permutation approach, whereas recently an alternative approximation based on the continuous gamma distribution was suggested. However, both approximations are imperfect for estimating small tail probabilities. In this paper we relate the rank product statistic to number theory and provide a derivation of its exact probability distribution and the true tail probabilities. Copyright © 2013 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
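To make the object of study concrete: under the null hypothesis, a gene's rank in each of k replicates is an independent draw from 1..n, and the rank product is the product of those ranks. The sketch below obtains the exact null distribution by brute-force enumeration, which is feasible only for small n and k. It is not the number-theoretic derivation the paper provides, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def rank_product_pmf(n, k):
    """Exact null pmf of the product of k independent ranks, each uniform on 1..n."""
    counts = Counter({1: 1})  # the empty product equals 1
    for _ in range(k):
        nxt = Counter()
        for value, ways in counts.items():
            for rank in range(1, n + 1):
                nxt[value * rank] += ways
        counts = nxt
    total = n ** k  # number of equally likely rank tuples
    return {value: Fraction(ways, total) for value, ways in sorted(counts.items())}

def lower_tail(pmf, observed):
    """P(rank product <= observed) under the null; small products signal regulation."""
    return sum(p for value, p in pmf.items() if value <= observed)

# Toy case: 4 genes, 2 replicates, observed rank product of 2.
pmf = rank_product_pmf(n=4, k=2)
p_value = lower_tail(pmf, observed=2)
print(p_value)
```

Using `Fraction` keeps the tail probabilities exact, which matters precisely in the small-tail regime where the permutation and gamma approximations are reported to be imperfect.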
Nonrigid registration of volumetric images using ranked order statistics
DEFF Research Database (Denmark)
Tennakoon, Ruwan; Bab-Hadiashar, Alireza; Cao, Zhenwei
2014-01-01
Non-rigid image registration techniques using intensity based similarity measures are widely used in medical imaging applications. Due to high computational complexities of these techniques, particularly for volumetric images, finding appropriate registration methods to both reduce the computation burden and increase the registration accuracy has become an intensive area of research. In this paper we propose a fast and accurate non-rigid registration method for intra-modality volumetric images. Our approach exploits the information provided by an order statistics based segmentation method, to find the important regions for registration and use an appropriate sampling scheme to target those areas and reduce the registration computation time. A unique advantage of the proposed method is its ability to identify the point of diminishing returns and stop the registration process. Our experiments...
Poisson statistics of PageRank probabilities of Twitter and Wikipedia networks
Frahm, Klaus M.; Shepelyansky, Dima L.
2014-04-01
We use the methods of quantum chaos and Random Matrix Theory for analysis of statistical fluctuations of PageRank probabilities in directed networks. In this approach the effective energy levels are given by a logarithm of PageRank probability at a given node. After the standard energy level unfolding procedure we establish that the nearest spacing distribution of PageRank probabilities is described by the Poisson law typical for integrable quantum systems. Our studies are done for the Twitter network and three networks of Wikipedia editions in English, French and German. We argue that due to absence of level repulsion the PageRank order of nearby nodes can be easily interchanged. The obtained Poisson law implies that the nearby PageRank probabilities fluctuate as random independent variables.
2017-09-01
from this PDF . EMOS models also use multiple linear regression to characterize the sensitivity of a univariate weather quantity—that is, the...classical least-squares approach to multivariate multiple linear regression using both measures-oriented and distributions-oriented scoring rules...14. SUBJECT TERMS ensemble model output statistics, statistical post-processing, multivariate multiple linear regression, Bayesian data analysis
Challenging Conventional Wisdom for Multivariate Statistical Models with Small Samples
McNeish, Daniel
2017-01-01
In education research, small samples are common because of financial limitations, logistical challenges, or exploratory studies. With small samples, statistical principles on which researchers rely do not hold, leading to trust issues with model estimates and possible replication issues when scaling up. Researchers are generally aware of such…
Refining developmental coordination disorder subtyping with multivariate statistical methods
Directory of Open Access Journals (Sweden)
Lalanne Christophe
2012-07-01
Background: With a large number of potentially relevant clinical indicators, penalization and ensemble learning methods are thought to provide better predictive performance than usual linear predictors. However, little is known about how they perform in clinical studies where few cases are available. We used Random Forests and Partial Least Squares Discriminant Analysis to select the most salient impairments in Developmental Coordination Disorder (DCD) and assess patient similarity. Methods: We considered a wide-range testing battery for various neuropsychological and visuo-motor impairments which aimed at characterizing subtypes of DCD in a sample of 63 children. Classifiers were optimized on a training sample, and they were used subsequently to rank the 49 items according to a permuted measure of variable importance. In addition, subtyping consistency was assessed with cluster analysis on the training sample. Clustering fitness and predictive accuracy were evaluated on the validation sample. Results: Both classifiers yielded a relevant subset of item impairments that together accounted for a sharp discrimination between three DCD subtypes: ideomotor, visual-spatial and constructional, and mixed dyspraxia. The main impairments that were found to characterize the three subtypes were: digital perception, imitations of gestures, digital praxia, lego blocks, visual spatial structuration, visual motor integration, coordination between upper and lower limbs. Classification accuracy was above 90% for all classifiers, and clustering fitness was found to be satisfactory. Conclusions: Random Forests and Partial Least Squares Discriminant Analysis are useful tools to extract salient features from a large pool of correlated binary predictors, but also provide a way to assess individuals' proximities in a reduced factor space. Fewer than 15 neuro-visual, neuro-psychomotor and neuro-psychological tests might be required to provide a sensitive and
[Place of multivariate statistics in clinical physiological research].
Volynskiĭ, Iu D; Kurochkina, A I
1980-05-01
On the basis of their own experience of many years in the use of mathematical methods in medicine the authors provide a critical analysis of the results of using computers in making the diagnosis and the main reasons for dissatisfaction with the results. Regarding clinico-physiological studies as a field for the application of modern mathematical statistics, they suggest a logical scheme which they had tested time and again for statistical analysis of the data by multidimensional methods (particularly with the use of the method of principal components, cluster analysis, latent analysis, lambda-moments, etc.) and also some methodological devices which permit keeping the data at hand throughout the entire analysis, resorting time and again to all possibilities for their complete and exhaustive description. The position of the authors in principle consists in the fact that success can only be achieved by constant joint work of the medical specialist and mathematician, beginning with the first stage of formulating both the medical and statistical tasks on condition that the clinico-physiological essence of the problem is comprehended by the mathematician.
Multivariate Statistical Analysis of the Tularosa-Hueco Basin
Agrawala, G.; Walton, J. C.
2006-12-01
The border region is growing rapidly and experiencing a sharp decline in both water quality and availability, putting a strain on a quickly diminishing resource. Since water is used primarily for agricultural, domestic, commercial, livestock, mining and power generation purposes, its rapid depletion is of major concern in the region. Tools such as Principal Component Analysis (PCA), Correspondence Analysis and Cluster Analysis have the potential to present new insight into this problem. The Tularosa-Hueco Basin is analyzed here using some of these multivariate analysis methods. PCA is applied to geochemical data from the region, and a cluster analysis is applied to the results in order to group wells with similar characteristics. The derived principal axes and well groups are presented as biplots and overlaid on a digital elevation map of the region, providing a visualization of potential interactions and flow paths between surface water and ground water. Simulation with this modeling technique gives valuable insight into the water chemistry and the potential pollution threats to the already diminishing water resources.
Atzori, A S; Tedeschi, L O; Cannas, A
2013-05-01
The economic efficiency of dairy farms is the main goal of farmers. The objective of this work was to use routinely available information at the dairy farm level to develop an index of profitability to rank dairy farms and to assist the decision-making process of farmers to increase the economic efficiency of the entire system. A stochastic modeling approach was used to study the relationships between inputs and profitability (i.e., income over feed cost; IOFC) of dairy cattle farms. The IOFC was calculated as: milk revenue + value of male calves + culling revenue - herd feed costs. Two databases were created. The first one was a development database, which was created from technical and economic variables collected in 135 dairy farms. The second one was a synthetic database (sDB) created from 5,000 synthetic dairy farms using the Monte Carlo technique and based on the characteristics of the development database data. The sDB was used to develop a ranking index as follows: (1) principal component analysis (PCA), excluding IOFC, was used to identify principal components (sPC); and (2) coefficient estimates of a multiple regression of the IOFC on the sPC were obtained. Then, the eigenvectors of the sPC were used to compute the principal component values for the original 135 dairy farms that were used with the multiple regression coefficient estimates to predict IOFC (dRI; ranking index from development database). The dRI was used to rank the original 135 dairy farms. The PCA explained 77.6% of the sDB variability and 4 sPC were selected. The sPC were associated with herd profile, milk quality and payment, poor management, and reproduction based on the significant variables of the sPC. The mean IOFC in the sDB was 0.1377 ± 0.0162 euros per liter of milk (€/L). The dRI explained 81% of the variability of the IOFC calculated for the 135 original farms. When the number of farms below and above 1 standard deviation (SD) of the dRI were calculated, we found that 21
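The two-step index construction described above (PCA on the synthetic database excluding IOFC, then a multiple regression of IOFC on the retained components, then applying the same eigenvectors to the real farms) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the simulated inputs, the effect sizes, and the names `fit_ranking_index` and `predict_index` are all made up for the example.

```python
import numpy as np

def fit_ranking_index(X_syn, iofc_syn, n_components):
    """PCA on standardized inputs, then least-squares regression of IOFC on the scores."""
    mean, std = X_syn.mean(axis=0), X_syn.std(axis=0, ddof=1)
    Z = (X_syn - mean) / std
    # Eigen-decomposition of the correlation matrix; keep the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]
    scores = Z @ loadings
    design = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(design, iofc_syn, rcond=None)
    return mean, std, loadings, coef

def predict_index(X, model):
    """Project new farms onto the stored eigenvectors and predict IOFC."""
    mean, std, loadings, coef = model
    scores = ((X - mean) / std) @ loadings
    return coef[0] + scores @ coef[1:]

rng = np.random.default_rng(1)
mixing = np.kron(np.eye(4), np.ones((1, 2)))        # 4 latent factors -> 8 inputs
weights = np.array([0.010, -0.005, 0.008, 0.004])   # made-up effects on IOFC (EUR/L)

def simulate(n):
    """Hypothetical farms: correlated inputs driven by four latent factors."""
    factors = rng.standard_normal((n, 4))
    X = factors @ mixing + 0.1 * rng.standard_normal((n, 8))
    iofc = 0.14 + factors @ weights + 0.001 * rng.standard_normal(n)
    return X, iofc

X_syn, iofc_syn = simulate(5000)     # stands in for the synthetic database
X_farms, iofc_farms = simulate(135)  # stands in for the development farms
model = fit_ranking_index(X_syn, iofc_syn, n_components=4)
index = predict_index(X_farms, model)
ranking = np.argsort(-index)         # farm indices, highest predicted IOFC first
```

The farms are standardized with the synthetic database's means and standard deviations so that their component scores are on the scale the regression coefficients were estimated on.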
National Research Council Canada - National Science Library
Bakraji, Elias Hanna; Abboud, Rana; Issa, Haissm
2014-01-01
Thermoluminescence (TL) dating and multivariate statistical methods based on radioisotope X-ray fluorescence analysis have been utilized to date and classify Syrian archaeological ceramics fragment from Tel Jamous site...
National Research Council Canada - National Science Library
Chunmei Guan; Rui Dang; Yu Cui; Liyan Liu; Xiaobei Chen; Xiaoyu Wang; Jingli Zhu; Donggang Li; Junwei Li; Decai Wang
2017-01-01
.... We have used an analytical approach, based on inductively coupled plasma mass spectrometry coupled with multivariate statistical analysis, to study the profiles of a wide range of metals in AD...
National Research Council Canada - National Science Library
Carlos Mario Zuluaga Dominguez
2011-01-01
The use of multivariate statistical techniques for quality and process control in the food industry has been growing significantly since the mid-seventies, as a result of the informatics revolution...
Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A.; van t Veld, Aart A.
2012-01-01
PURPOSE: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. METHODS AND MATERIALS: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator
Kruger, Uwe
2012-01-01
The development and application of multivariate statistical techniques in process monitoring has gained substantial interest over the past two decades in academia and industry alike. Initially developed for monitoring and fault diagnosis in complex systems, such techniques have been refined and applied in various engineering areas, for example mechanical and manufacturing, chemical, electrical and electronic, and power engineering. The recipe for the tremendous interest in multivariate statistical techniques lies in its simplicity and adaptability for developing monitoring applica
Liu, Na; Li, Jun; Li, Bao-Guo
2014-11-01
The study of quality control of Chinese medicine has always been a focus and a difficulty in the development of traditional Chinese medicine (TCM), and it is one of the key problems restricting the modernization and internationalization of Chinese medicine. Multivariate statistical analysis is an analytical method well suited to the characteristics of TCM and has been widely used in the study of its quality control. It is applied to the multiple, mutually correlated indicators and variables that arise in quality control studies in order to uncover hidden regularities or relationships in the data, which can support decision-making and enable effective quality evaluation of TCM. This paper summarizes the application of multivariate statistical analysis in the quality control of Chinese medicine, providing a basis for further study.
Energy Technology Data Exchange (ETDEWEB)
Kalivas, John H., E-mail: kalijohn@isu.edu [Department of Chemistry, Idaho State University, Pocatello, ID 83209 (United States); Héberger, Károly [Research Centre for Natural Sciences, Hungarian Academy of Sciences, Pusztaszeri út 59-67, 1025 Budapest (Hungary); Andries, Erik [Center for Advanced Research Computing, University of New Mexico, Albuquerque, NM 87106 (United States); Department of Mathematics, Central New Mexico Community College, Albuquerque, NM 87106 (United States)
2015-04-15
Highlights: • Sum of ranking differences (SRD) used for tuning parameter selection based on fusion of multicriteria. • No weighting scheme is needed for the multicriteria. • SRD allows automatic selection of one model or a collection of models if so desired. • SRD allows simultaneous comparison of different calibration methods with tuning parameter selection. • New MATLAB programs are described and made available. Abstract: Most multivariate calibration methods, such as partial least squares (PLS) or the Tikhonov regularization variant ridge regression (RR), require selection of tuning parameters. Tuning parameter values determine the direction and magnitude of the respective model vectors, thereby setting the resultant prediction abilities of the model vectors. Simultaneously, tuning parameter values establish the corresponding bias/variance and the underlying selectivity/sensitivity tradeoffs. Selection of the final tuning parameter is often accomplished through some form of cross-validation, and the resultant root mean square error of cross-validation (RMSECV) values are evaluated. However, selection of a “good” tuning parameter with this one model evaluation merit is almost impossible. Including additional model merits assists tuning parameter selection to provide better balanced models as well as allowing for a reasonable comparison between calibration methods. Using multiple merits requires decisions to be made on how to combine and weight the merits into an information criterion. An abundance of options is possible. Presented in this paper is the sum of ranking differences (SRD) to ensemble a collection of model evaluation merits varying across tuning parameters. It is shown that the SRD consensus ranking of model tuning parameters allows automatic selection of the final model, or a collection of models if so desired. Essentially, the user’s preference for the degree of balance between bias and variance ultimately decides the merits used in SRD
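The core SRD computation can be sketched as below. This is a generic illustration of the sum of ranking differences and not the MATLAB programs the abstract mentions. Several things here are assumptions: the row-wise mean is used as the reference ranking, the merit values are made up and taken to be already scaled to a common range, and ties are not handled.

```python
import numpy as np

def ranks(values):
    """Ranks 1..n of a 1-D array (assumes no ties; ties are broken by position)."""
    order = np.argsort(values, kind="stable")
    out = np.empty(len(values), dtype=int)
    out[order] = np.arange(1, len(values) + 1)
    return out

def sum_of_ranking_differences(M, reference=None):
    """SRD of each column of M against a reference column (default: row means)."""
    M = np.asarray(M, dtype=float)
    ref = M.mean(axis=1) if reference is None else np.asarray(reference, dtype=float)
    ref_ranks = ranks(ref)
    return np.array([np.abs(ranks(M[:, j]) - ref_ranks).sum()
                     for j in range(M.shape[1])])

# Hypothetical scaled merit values: rows = 4 merits,
# columns = 3 candidate tuning-parameter values.
merits = np.array([
    [0.10, 0.40, 0.90],
    [0.30, 0.20, 0.70],
    [0.60, 0.50, 0.20],
    [0.80, 0.90, 0.40],
])
srd = sum_of_ranking_differences(merits)
best = int(np.argmin(srd))  # column whose merit ranking is closest to the reference
print(srd, best)
```

Because only ranks enter the sum, no weighting scheme for the individual merits is needed, which is the property the highlights emphasize.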
Weighted log-rank statistic to compare shared-path adaptive treatment strategies.
Kidwell, Kelley M; Wahed, Abdus S
2013-04-01
Adaptive treatment strategies (ATSs) more closely mimic the reality of a physician's prescription process where the physician prescribes a medication to his/her patient, and based on that patient's response to the medication, modifies the treatment. Two-stage randomization designs, more generally, sequential multiple assignment randomization trial designs, are useful to assess ATSs where the interest is in comparing the entire sequence of treatments, including the patient's intermediate response. In this paper, we introduce the notion of shared-path and separate-path ATSs and propose a weighted log-rank statistic to compare overall survival distributions of multiple two-stage ATSs, some of which may be shared-path. Large sample properties of the statistic are derived and the type I error rate and power of the test are compared with the standard log-rank test through simulation.
Directory of Open Access Journals (Sweden)
Carlos Mario Zuluaga Dominguez
2011-04-01
The use of multivariate statistical techniques for quality and process control in the food industry has been growing significantly since the mid-seventies, as a result of the informatics revolution, which facilitated the analysis of large data sets. Unlike univariate methods of data exploration, multivariate statistics rests on the analysis of information described by three or more variables that can be studied and understood simultaneously in a fast, efficient and easy way. Thanks to the extraordinary advance in computing machines, it is now possible to apply these methodologies to solve extremely complex problems. This article presents the most recognized multivariate statistical techniques, as well as a compilation of papers that demonstrate their applicability in the field of foods.
Energy Technology Data Exchange (ETDEWEB)
Schroeder, R.; Hagemann, H.W.; Wolf, M. [RWTH Aachen (Germany). Lehrstuhl fuer Geologie, Geochemie und Lagerstaetten; Wolff-Fischer, E.
1996-12-01
Multivariate statistical analyses were used to examine the bonding of the coal-relevant trace elements As, Be, Cd, Co, Cr, Cu, Hg, Mn, Mo, Ni, Pb, U, V and Zn with the minerals in 31 types of coal in the Ruhr coalfield. The samples chosen came from Westfal A to Westfal C. The trace element analyses and radioscopic phase analyses to establish the mineral content were carried out on both low and high-ash outputs from a laboratory flotation. Using factor analysis, the geochemical character of the trace elements was described and 'clay-mineral', 'sulphide' and 'heavy crop mineral' factors were deduced. Cluster analysis in the Q-mode yields a separation according to the stratigraphy, while the R-mode provides an agglomeration according to the geochemical character of the trace elements. (orig.)
Yalcin, M Gurhan; Tumuklu, Ali; Sonmez, Mustafa; Erdag, Dilek Satir
2010-05-01
In this study, freshly deposited soils were sampled from the Seyhan River (Turkey) from the exit of the Seyhan Dam to the Adana exit. Heavy metal contents were measured with an X-ray fluorescence spectrometer. A multivariate statistical approach was used to identify the sources of heavy metals and other elements in the soil samples. Considering the size of anomalies, the metals are ranked as Co>Pb>Cr>Zn>Al. Based on the hierarchical cluster analysis results, three clusters were observed. P, Mg, Ti, Fe, Ca, Na, K, Al, Si, and Nb form the first cluster; Zn, Sr, Pb, and Cr form the second cluster; and Ba and Co form the third cluster. Three factors computed from principal component analysis explain a cumulative variance of 95%. The first factor is defined as the "high background lithogenic factor" (Co), the second as the "local industrial factor" (Pb, Cr, Ba, and Mg), and the third as the "natural factor" (Cr and Pb).
Multivariate statistical methods and data mining in particle physics (2/4)
CERN. Geneva
2008-01-01
The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
Multivariate statistical methods and data mining in particle physics (1/4)
CERN. Geneva
2008-01-01
The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
Multivariate statistical methods and data mining in particle physics (4/4)
CERN. Geneva
2008-01-01
The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
Taylor, Sandra L; Ruhaak, L Renee; Weiss, Robert H; Kelly, Karen; Kim, Kyoungmi
2017-01-01
High-throughput mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen. We provide R functions to implement and illustrate our method as supplementary information. Contact: sltaylor@ucdavis.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Monitoring a PVC batch process with multivariate statistical process control charts
Tates, A. A.; Louwerse, D. J.; Smilde, A. K.; Koot, G. L. M.; Berndt, H.
1999-01-01
Multivariate statistical process control charts (MSPC charts) are developed for the industrial batch production process of poly(vinyl chloride) (PVC). With these MSPC charts different types of abnormal batch behavior were detected on-line. With batch contribution plots, the probable causes of these
Weighted Multivariate Cramér-von Mises-type Statistics | Deheuvels ...
African Journals Online (AJOL)
In this paper, we consider weighted quadratic functionals of the multivariate uniform empirical process. By deriving the Karhunen-Loève expansion of the corresponding limiting Gaussian process, we obtain the asymptotic distribution of these statistics. Our results have direct applications to tests of goodness of fit and tests ...
Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas
2017-04-15
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.
Energy Technology Data Exchange (ETDEWEB)
Pinto, Licarion [Laboratório de Automação e Instrumentação em Química Analítica e Quimiometria (LAQA), Universidade Federal da Paraíba, CCEN, Departamento de Química, Caixa Postal 5093, CEP 58051-970, João Pessoa, PB (Brazil); Díaz Nieto, César Horacio; Zón, María Alicia; Fernández, Héctor [Departamento de Química, Facultad de Ciencias Exactas, Físico-Químicas y Naturales, Universidad Nacional de Río Cuarto, 5800, Río Cuarto (Argentina); Ugulino de Araujo, Mario Cesar, E-mail: laqa@quimica.ufpb.br [Laboratório de Automação e Instrumentação em Química Analítica e Quimiometria (LAQA), Universidade Federal da Paraíba, CCEN, Departamento de Química, Caixa Postal 5093, CEP 58051-970, João Pessoa, PB (Brazil)
2016-01-01
Biogenic amines (BAs) are used for identifying spoilage in food. The most common are tryptamine (TRY), 2-phenylethylamine (PHE), putrescine (PUT), cadaverine (CAD) and histamine (HIS). Due to the lack of chromophores, chemical derivatization with dansyl was employed to analyze these BAs using high performance liquid chromatography with a diode array detector (HPLC-DAD). However, the derivatization reaction occurs with any primary or secondary amine, leading to co-elution of analytes and interferents with identical spectral profiles, and thus causing rank deficiency. When the spectral profile is the same and peak misalignment is present in the chromatographic runs, it is not possible to handle the data only with Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS), by augmenting the time or the spectral mode. A way to circumvent this drawback is to receive information from another detector that leads to a selective profile for the analyte. To overcome both problems (tri-linearity break in time and in spectral mode), this paper proposes a new analytical methodology for fast quantitation of these BAs in fish with HPLC-DAD by using the icoshift algorithm for temporal misalignment correction before MCR-ALS spectral mode augmented treatment. Limits of detection, relative errors of prediction (REP) and average recoveries ranged from 0.14 to 0.50 µg mL⁻¹, 3.5–8.8% and 88.08%–99.68%, respectively. These are outstanding results, reaching quantification limits for the five BAs much lower than those established by the Food and Agriculture Organization of the United Nations and World Health Organization (FAO/WHO), and the European Food Safety Authority (EFSA), all without any pre-concentration steps. The concentrations of BAs in fish samples ranged from 7.82 to 29.41 µg g⁻¹, 8.68–25.95 µg g⁻¹, 4.76–28.54 µg g⁻¹, 5.18–39.95 µg g⁻¹ and 1.45–52.62 µg g⁻¹ for TRY, PHE, PUT, CAD, and
Alvarez, Odalys Quevedo; Tagle, Margarita Edelia Villanueva; Pascual, Jorge L Gómez; Marín, Ma Teresa Larrea; Clemente, Ana Catalina Nuñez; Medina, Miriam Odette Cora; Palau, Raiza Rey; Alfonso, Mario Simeón Pomares
2014-10-01
Spatial and temporal variations of sediment quality in Matanzas Bay (Cuba) were studied by determining a total of 12 variables (Zn, Cu, Pb, As, Ni, Co, Al, Fe, Mn, V, CO₃²⁻, and total hydrocarbons (THC)). Surface sediments were collected annually at eight stations during 2005-2008. Multivariate statistical techniques, such as principal component (PCA), cluster (CA), and linear discriminant (LDA) analyses, were applied to identify the most significant variables influencing the environmental quality of sediments. Heavy metals (Zn, Cu, Pb, V, and As) and THC were the most significant species contributing to sediment quality variations during the sampling period. Concentrations of V and As were determined in sediments of this ecosystem for the first time. The variation of sediment environmental quality with the sampling period and the differentiation of samples into three groups along the bay were obtained. The usefulness of the multivariate statistical techniques employed for the environmental interpretation of a limited dataset was confirmed.
Application of Multivariable Statistical Techniques in Plant-wide WWTP Control Strategies Analysis
DEFF Research Database (Denmark)
Flores Alsina, Xavier; Comas, J.; Rodríguez-Roda, I.
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques allow i) to determine natural groups or clusters of control strategies with a similar behaviour, ii) to find and interpret hidden, complex and casual relation features in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation...
Feyissa, Daniel D.; Aher, Yogesh D.; Engidawork, Ephrem; Höger, Harald; Lubec, Gert; Korz, Volker
2017-01-01
Animal models for anxiety, depressive-like and cognitive diseases or aging often involve testing of subjects in behavioral test batteries. The large number of test variables with different mean variations and within and between test correlations often constitute a significant problem in determining essential variables to assess behavioral patterns and their variation in individual animals as well as appropriate statistical treatment. Therefore, we applied a multivariate approach (principal co...
Sivasamy, Aneetha Avalappampatty; Sundan, Bose
2015-01-01
The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T² method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T² statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better.
Directory of Open Access Journals (Sweden)
A. Rahmani
2016-01-01
Granular computing is an emerging computing theory and paradigm that deals with the processing of information granules, which are defined as a number of information entities grouped together due to their similarity, physical adjacency, or indistinguishability. In most aspects of human reasoning, these granules have an uncertain formation, so the concept of granularity of fuzzy information could be of special interest for the applications where fuzzy sets must be converted to crisp sets to avoid uncertainty. This paper proposes a novel method of defuzzification based on the mean value of statistical Beta distribution and an algorithm for ranking fuzzy numbers based on the crisp number ranking system on R. The proposed method is quite easy to use, but the main reason for following this approach is the equality of left spread, right spread, and mode of Beta distribution with their corresponding values in fuzzy numbers within (0,1) interval, in addition to the fact that the resulting method can satisfy all reasonable properties of fuzzy quantity ordering defined by Wang et al. The algorithm is illustrated through several numerical examples and it is then compared with some of the other methods provided by literature.
Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo
2017-03-15
The outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. Well-known examples of such designs are the case-control design for a binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies for taking advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric, where all the underlying distributions of covariates are modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and develop its asymptotic normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of polychlorinated biphenyl exposure with hearing loss in children born to the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd.
Demanuele, Charmaine; Bähner, Florian; Plichta, Michael M; Kirsch, Peter; Tost, Heike; Meyer-Lindenberg, Andreas; Durstewitz, Daniel
2015-01-01
Multivariate pattern analysis can reveal new information from neuroimaging data to illuminate human cognition and its disturbances. Here, we develop a methodological approach, based on multivariate statistical/machine learning and time series analysis, to discern cognitive processing stages from functional magnetic resonance imaging (fMRI) blood oxygenation level dependent (BOLD) time series. We apply this method to data recorded from a group of healthy adults whilst performing a virtual reality version of the delayed win-shift radial arm maze (RAM) task. This task has been frequently used to study working memory and decision making in rodents. Using linear classifiers and multivariate test statistics in conjunction with time series bootstraps, we show that different cognitive stages of the task, as defined by the experimenter, namely, the encoding/retrieval, choice, reward and delay stages, can be statistically discriminated from the BOLD time series in brain areas relevant for decision making and working memory. Discrimination of these task stages was significantly reduced during poor behavioral performance in dorsolateral prefrontal cortex (DLPFC), but not in the primary visual cortex (V1). Experimenter-defined dissection of time series into class labels based on task structure was confirmed by an unsupervised, bottom-up approach based on Hidden Markov Models. Furthermore, we show that different groupings of recorded time points into cognitive event classes can be used to test hypotheses about the specific cognitive role of a given brain region during task execution. We found that whilst the DLPFC strongly differentiated between task stages associated with different memory loads, but not between different visual-spatial aspects, the reverse was true for V1. Our methodology illustrates how different aspects of cognitive information processing during one and the same task can be separated and attributed to specific brain regions based on information contained in
A statistical approach for segregating cognitive task stages from multivariate fMRI BOLD time series
Directory of Open Access Journals (Sweden)
Charmaine Demanuele
2015-10-01
Multivariate pattern analysis can reveal new information from neuroimaging data to illuminate human cognition and its disturbances. Here, we develop a methodological approach, based on multivariate statistical/machine learning and time series analysis, to discern cognitive processing stages from fMRI blood oxygenation level dependent (BOLD) time series. We apply this method to data recorded from a group of healthy adults whilst performing a virtual reality version of the delayed win-shift radial arm maze task. This task has been frequently used to study working memory and decision making in rodents. Using linear classifiers and multivariate test statistics in conjunction with time series bootstraps, we show that different cognitive stages of the task, as defined by the experimenter, namely, the encoding/retrieval, choice, reward and delay stages, can be statistically discriminated from the BOLD time series in brain areas relevant for decision making and working memory. Discrimination of these task stages was significantly reduced during poor behavioral performance in dorsolateral prefrontal cortex (DLPFC), but not in the primary visual cortex (V1). Experimenter-defined dissection of time series into class labels based on task structure was confirmed by an unsupervised, bottom-up approach based on Hidden Markov Models. Furthermore, we show that different groupings of recorded time points into cognitive event classes can be used to test hypotheses about the specific cognitive role of a given brain region during task execution. We found that whilst the DLPFC strongly differentiated between task stages associated with different memory loads, but not between different visual-spatial aspects, the reverse was true for V1. Our methodology illustrates how different aspects of cognitive information processing during one and the same task can be separated and attributed to specific brain regions based on information contained in multivariate patterns of voxel
[Statistical tests in medical research: traditional methods vs. multivariate NPC permutation tests].
Arboretti, Rosa; Bordignon, Paolo; Corain, Livio; Palermo, Giuseppe; Pesarin, Fortunato; Salmaso, Luigi
2015-01-01
Within medical research, a useful statistical tool is hypothesis testing in terms of the so-called null hypothesis, that the treatment has no effect, and the alternative hypothesis, that the treatment has some effect. By controlling the risks of wrong decisions, empirical data are used to possibly reject the null hypothesis in favour of the alternative, thereby demonstrating the efficacy of a treatment of interest. The multivariate permutation tests, based on the nonparametric combination (NPC) method, provide an innovative, robust and effective hypothesis testing solution to many real problems that are commonly encountered in medical research when multiple end-points are observed. This paper discusses the various approaches to hypothesis testing and the main advantage of NPC tests, which is that they require much less stringent assumptions than traditional statistical tests. Moreover, the related results may be extended to the reference population even in the case of selection bias, that is, non-random sampling. In this work, we review and discuss some basic testing procedures along with the theoretical and practical relevance of NPC tests, showing their effectiveness in medical research. Within the non-parametric methods, NPC tests represent the current "frontier" of statistical research, yet they are already widely available in the practice of analysis of clinical data.
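As a hedged illustration of the permutation principle that NPC tests build on, the sketch below runs a two-sided permutation test for a single end-point in plain Python. It is not the NPC method itself, which additionally combines the partial tests of several end-points. The function name, the choice of the difference in means as the statistic, and the data are assumptions for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so any reshuffling of the pooled values is equally likely.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    # Count the observed labelling itself so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical end-point values for a treated and a control group.
treated = [11, 12, 13, 14, 15]
control = [1, 2, 3, 4, 5]
p_value = permutation_p_value(treated, control)
```

No distributional assumption about the end-point is needed. The reference distribution comes entirely from relabelling the observed data, which is the sense in which such tests require less stringent assumptions.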
Friedman, David B
2012-01-01
All quantitative proteomics experiments measure variation between samples. When performing large-scale experiments that involve multiple conditions or treatments, the experimental design should include the appropriate number of individual biological replicates from each condition to enable the distinction between a relevant biological signal from technical noise. Multivariate statistical analyses, such as principal component analysis (PCA), provide a global perspective on experimental variation, thereby enabling the assessment of whether the variation describes the expected biological signal or the unanticipated technical/biological noise inherent in the system. Examples will be shown from high-resolution multivariable DIGE experiments where PCA was instrumental in demonstrating biologically significant variation as well as sample outliers, fouled samples, and overriding technical variation that would not be readily observed using standard univariate tests.
An Improvement of the Hotelling T² Statistic in Monitoring Multivariate Quality Characteristics
Directory of Open Access Journals (Sweden)
Ashkan Shabbak
2012-01-01
The Hotelling T² statistic is the most popular statistic used in multivariate control charts to monitor multiple qualities. However, this statistic is easily affected by the existence of more than one outlier in the data set. To rectify this problem, robust control charts, which are based on the minimum volume ellipsoid and the minimum covariance determinant, have been proposed. Most researchers assess the performance of multivariate control charts based on the number of signals without paying much attention to whether those signals are really outliers. With due respect, we propose to evaluate control charts not only based on the number of detected outliers but also with respect to their correct positions. In this paper, an Upper Control Limit based on the median and the median absolute deviation is also proposed. The results of this study signify that the proposed Upper Control Limit improves the detection of correct outliers but that it suffers from a swamping effect when the positions of outliers are not taken into consideration. Finally, a robust control chart based on the diagnostic robust generalised potential procedure is introduced to remedy this drawback.
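The statistic discussed above can be sketched for the bivariate case in plain Python. The T² computation uses the classical sample mean and covariance. The control limit, taken as the median plus k times the median absolute deviation of the T² values with k = 3, is an assumed form for illustration and not necessarily the limit proposed in the paper. The data points are hypothetical.

```python
from statistics import mean, median


def hotelling_t2_bivariate(points):
    """T² distance of each (x, y) point from the sample mean (classical estimates)."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Sample variances and covariance with the usual n - 1 divisor.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    t2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d' S^{-1} d written out with the closed-form inverse of a 2x2 matrix.
        t2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return t2


def median_mad_limit(values, k=3.0):
    """Assumed upper control limit: median + k * median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med + k * mad


# Hypothetical bivariate measurements; the last point departs from the trend.
data = [(1.0, 2.0), (2.0, 2.9), (3.0, 4.1), (4.0, 5.0),
        (5.0, 6.1), (6.0, 6.9), (7.0, 8.0), (8.0, 3.0)]
t2_values = hotelling_t2_bivariate(data)
limit = median_mad_limit(t2_values)
flagged = [i for i, t in enumerate(t2_values) if t > limit]
```

Only the limit is robust in this sketch. The mean and covariance are still the classical estimates, so several outliers could mask one another, which is the weakness the abstract attributes to the plain T² statistic.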
DEFF Research Database (Denmark)
Schneider, Jesper Wiborg
2012-01-01
In this paper we discuss and question the use of statistical significance tests in relation to university rankings as recently suggested. We outline the assumptions behind and interpretations of statistical significance tests and relate this to examples from the recent SCImago Institutions Rankin...
Ghanate, A. D.; Kothiwale, S.; Singh, S. P.; Bertrand, Dominique; Krishna, C. Murali
2011-02-01
Cancer is now recognized as one of the major causes of morbidity and mortality. Histopathological diagnosis, the gold standard, is shown to be subjective, time consuming, prone to interobserver disagreement, and often fails to predict prognosis. Optical spectroscopic methods are being contemplated as adjuncts or alternatives to conventional cancer diagnostics. The most important aspect of these approaches is their objectivity, and multivariate statistical tools play a major role in realizing it. However, rigorous evaluation of the robustness of spectral models is a prerequisite. The utility of Raman spectroscopy in the diagnosis of cancers has been well established. Until now, the specificity and applicability of spectral models have been evaluated for specific cancer types. In this study, we have evaluated the utility of spectroscopic models representing normal and malignant tissues of the breast, cervix, colon, larynx, and oral cavity in a broader perspective, using different multivariate tests. The limit test, which was used in our earlier study, gave high sensitivity but suffered from poor specificity. The performance of other methods such as factorial discriminant analysis and partial least squares discriminant analysis is on par with that of more complex nonlinear methods such as decision trees, but they provide very little information about the classification model. This comparative study thus demonstrates not just the efficacy of Raman spectroscopic models but also the applicability and limitations of different multivariate tools for discrimination under complex conditions such as the multicancer scenario.
Santos, L N S; Cabral, P D S; Neves, G A R; Alves, F R; Teixeira, M B; Cunha, F N; Silva, N F
2017-03-16
The availability of common bean cultivars tolerant to Meloidogyne javanica is limited in Brazil. Thus, the present study aimed to evaluate the reactions of 33 common bean genotypes (23 landrace, 8 commercial, 1 susceptible standard and 1 resistant standard) to M. javanica, employing multivariate statistics to discriminate the reaction of the genotypes. The experiment was conducted in a greenhouse using a completely randomized design with seven replicates. The seeds were sown in 1-L pots containing autoclaved soil and sand in a 1:1 ratio (v:v). On day 19, after emergence of the seedlings, the plants were treated with inoculum containing 4000 eggs + second-stage juveniles (J2). At 60 days after inoculation, the seedlings were evaluated based on biometric and parasitism-related traits, such as number of galls, final nematode population per root system, reproduction factor, and percent reduction in the reproduction factor of the nematode (%RRF). The data were subjected to analysis of variance using the F-test. The Mahalanobis generalized distance was used to obtain the dissimilarity matrix, and the average linkage between groups was used for clustering. The use of multivariate statistics allowed groups to be separated according to the resistance levels of genotypes, as observed in the %RRF. The landrace genotypes FORT-09, FORT-17, FORT-31, FORT-32, FORT-34 and FORT-36 presented resistance to M. javanica; thus, these genotypes can be considered potential sources of resistance.
Willard, Melissa A Bodnar; McGuffin, Victoria L; Smith, Ruth Waddell
2012-01-01
Salvia divinorum is a hallucinogenic herb that is internationally regulated. In this study, salvinorin A, the active compound in S. divinorum, was extracted from S. divinorum plant leaves using a 5-min extraction with dichloromethane. Four additional Salvia species (Salvia officinalis, Salvia guaranitica, Salvia splendens, and Salvia nemorosa) were extracted using this procedure, and all extracts were analyzed by gas chromatography-mass spectrometry. Differentiation of S. divinorum from other Salvia species was successful based on visual assessment of the resulting chromatograms. To provide a more objective comparison, the total ion chromatograms (TICs) were subjected to principal components analysis (PCA). Prior to PCA, the TICs were subjected to a series of data pretreatment procedures to minimize non-chemical sources of variance in the data set. Successful discrimination of S. divinorum from the other four Salvia species was possible based on visual assessment of the PCA scores plot. To provide a numerical assessment of the discrimination, a series of statistical procedures such as Euclidean distance measurement, hierarchical cluster analysis, Student's t tests, Wilcoxon rank-sum tests, and Pearson product moment correlation were also applied to the PCA scores. The statistical procedures were then compared to determine the advantages and disadvantages for forensic applications.
Sun, Gang; Hoff, Steven J; Zelle, Brian C; Nelson, Minda A
2008-12-01
It is vital to forecast gas and particulate matter concentrations and emission rates (GPCER) from livestock production facilities to assess the impact of airborne pollutants on human health, the ecological environment, and global warming. Modeling source air quality is a complex process because of abundant nonlinear interactions between GPCER and other factors. The objective of this study was to introduce statistical methods and a radial basis function (RBF) neural network to predict daily source air quality in Iowa swine deep-pit finishing buildings. The results show that four variables (outdoor and indoor temperature, animal units, and ventilation rates) were identified as relatively important model inputs using statistical methods. It can be further demonstrated that only two factors, the environment factor and the animal factor, were capable of explaining more than 94% of the total variability after performing principal component analysis. Introducing fewer, uncorrelated variables to the neural network would reduce the complexity of the model structure, minimize computation cost, and eliminate model overfitting problems. The obtained results of RBF network prediction were in good agreement with the actual measurements, with values of the correlation coefficient between 0.741 and 0.995 and very low values of systemic performance indexes for all the models. The good results indicated the RBF network could be trained to model these highly nonlinear relationships. Thus, the RBF neural network technology combined with multivariate statistical methods is a promising tool for air pollutant emissions modeling.
Directory of Open Access Journals (Sweden)
Nsikak U Benson
Trace metal (Cd, Cr, Cu, Ni and Pb) concentrations in benthic sediments were analyzed through a multi-step fractionation scheme to assess the levels and sources of contamination in estuarine, riverine and freshwater ecosystems in the Niger Delta (Nigeria). The degree of contamination was assessed using the individual contamination factors (ICF) and global contamination factor (GCF). Multivariate statistical approaches including principal component analysis (PCA), cluster analysis and correlation test were employed to evaluate the interrelationships and associated sources of contamination. The spatial distribution of metal concentrations followed the pattern Pb>Cu>Cr>Cd>Ni. Ecological risk index by ICF showed significant potential mobility and bioavailability for Cu, Cu and Ni. The ICF contamination trend in the benthic sediments at all studied sites was Cu>Cr>Ni>Cd>Pb. The principal component and agglomerative clustering analyses indicate that trace metals contamination in the ecosystems was influenced by multiple pollution sources.
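As a rough illustration of the two indices named above, the sketch below uses the definitions commonly given in the sediment literature: the ICF of a metal is the sum of its non-residual fractions divided by its residual fraction, and the GCF of a site is the sum of the ICFs of all metals. The abstract does not state the formulas, so these definitions are an assumption, and the fraction names and concentrations are invented for the example.

```python
def individual_contamination_factor(fractions, residual_key="residual"):
    """ICF: sum of non-residual fractions divided by the residual fraction
    (assumed definition; not stated in the abstract)."""
    non_residual = sum(v for k, v in fractions.items() if k != residual_key)
    return non_residual / fractions[residual_key]

def global_contamination_factor(site):
    """GCF: sum of the ICFs of every metal measured at one site."""
    return sum(individual_contamination_factor(f) for f in site.values())

# Hypothetical sequential-extraction results for one site (arbitrary units).
site = {
    "Cu": {"exchangeable": 2.0, "carbonate": 1.0, "reducible": 3.0,
           "oxidisable": 2.0, "residual": 4.0},
    "Pb": {"exchangeable": 0.5, "carbonate": 0.5, "reducible": 1.0,
           "oxidisable": 0.0, "residual": 8.0},
}

icf = {metal: individual_contamination_factor(f) for metal, f in site.items()}
gcf = global_contamination_factor(site)
print(icf, gcf)
```

Under these definitions a high ICF means that most of the metal sits in the more mobile fractions, which is how the index relates to potential mobility and bioavailability.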
Energy Technology Data Exchange (ETDEWEB)
Xu Chengjian, E-mail: c.j.xu@umcg.nl [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van' t [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands)
2012-03-15
Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.
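To show why LASSO yields a sparse, interpretable model, here is a minimal coordinate-descent sketch for the linear-regression form of the LASSO objective. It is not the authors' NTCP model, which predicts a complication probability; the data, penalty values, and function names are hypothetical, and the repeated cross-validation used in the study to compare methods is omitted.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become 0."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back in.
            partial_resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_resid / n
            beta[j] = soft_threshold(rho, lam) / col_scale[j]
    return beta

# Hypothetical data: 5 standardised predictors, only the first two matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
y = y - y.mean()

beta_small = lasso_coordinate_descent(X, y, lam=0.1)   # mild shrinkage
beta_large = lasso_coordinate_descent(X, y, lam=10.0)  # penalty dominates
print(beta_small, beta_large)
```

With a large enough penalty every coefficient is driven exactly to zero, while a mild penalty keeps the informative predictors and shrinks the rest; in practice the penalty would be chosen by cross-validation.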
Directory of Open Access Journals (Sweden)
Voza Danijela
2015-12-01
The aim of this article is to evaluate the quality of the Danube River in its course through Serbia as well as to demonstrate the possibilities for using three statistical methods, Principal Component Analysis (PCA), Factor Analysis (FA) and Cluster Analysis (CA), in surface water quality management. Given that the Danube is an important trans-boundary river, thorough water quality monitoring by sampling at different distances during shorter and longer periods of time is not only an ecological but also a political issue. Monitoring was carried out at monthly intervals from January to December 2011, at 17 sampling sites. The obtained data set was treated by multivariate techniques in order, firstly, to identify the similarities and differences between sampling periods and locations, secondly, to recognize variables that affect the temporal and spatial water quality changes and thirdly, to present the anthropogenic impact on water quality parameters.
Urtubia, A; Hernández, G; Roger, J M
2012-06-30
Three multivariate statistical techniques (Multiway Principal Component Analysis, Multiway Partial Least Squares, and Stepwise Linear Discriminant Analysis) and one artificial intelligence method (Artificial Neural Networks) were evaluated to detect and predict early abnormal behaviors of wine fermentations. The techniques were tested with data of thirty-two variables at different stages of fermentation from industrial wine fermentations of Cabernet Sauvignon. All the techniques studied considered a pre-treatment to obtain a homogeneous space and reduce the overfitting. The results were encouraging; it was possible to classify at 72h 100% of the fermentation correctly with three variables using Multiway Partial Least Squares and Artificial Neural Networks. Additional and complementary results were obtained with Stepwise Linear Discriminant Analysis, which found that ethanol, sugars and density measurements are able to discriminate abnormal behavior. Copyright © 2011 Elsevier B.V. All rights reserved.
Bersimis, Sotiris; Panaretos, John; Psarakis, Stelios
2009-01-01
Woodall and Montgomery [35], in a discussion paper, state that multivariate process control is one of the most rapidly developing sections of statistical process control. Nowadays, in industry, there are many situations in which the simultaneous monitoring or control of two or more related quality-process characteristics is necessary. Process monitoring problems in which several related variables are of interest are collectively known as Multivariate Statistical Process Control (MSPC). This ...
Directory of Open Access Journals (Sweden)
Senila Marin
2012-10-01
Background: The bioavailability of metals in soils is commonly assessed by chemical extractions; however, a generally accepted method is not yet established. In this study, the effectiveness of the Diffusive Gradients in Thin-films (DGT) technique and single extractions in the assessment of metals bioaccumulation in vegetables, and the influence of soil parameters on phytoavailability, were evaluated using multivariate statistics. Soil and plants grown in vegetable gardens from mining-affected rural areas, NW Romania, were collected and analysed. Results: Pseudo-total metal content of Cu, Zn and Cd in soil ranged between 17.3–146 mg kg⁻¹, 141–833 mg kg⁻¹ and 0.15–2.05 mg kg⁻¹, respectively, showing enriched contents of these elements. High degrees of metals extractability in 1M HCl and even in 1M NH4Cl were observed. Despite the relatively high total metal concentrations in soil, those found in vegetables were comparable to values typically reported for agricultural crops, probably due to the low concentrations of metals in soil solution (Csoln) and low effective concentrations (CE), assessed by the DGT technique. Among the analysed vegetables, the highest metal concentrations were found in carrot roots. By applying multivariate statistics, it was found that CE, Csoln and extraction in 1M NH4Cl were better predictors for metals bioavailability than the acid extractions applied in this study. Copper transfer to vegetables was strongly influenced by soil organic carbon (OC) and cation exchange capacity (CEC), while pH had a higher influence on Cd transfer from soil to plants. Conclusions: The results showed that DGT can be used for general evaluation of the risks associated with soil contamination with Cu, Zn and Cd in field conditions, although quantitative information on metals transfer from soil to vegetables was not observed.
Directory of Open Access Journals (Sweden)
Patricia S. González
2011-12-01
The aim of this paper was to apply multivariate statistical techniques to evaluate spatial and temporal variations in the water quality of Potrero de los Funes River using physical, chemical and bacteriological parameters, and to select the most significant parameters of organic pollution in the river for use in future water quality monitoring. The river was monitored regularly at three sites, RP1, RP2 and RP3, over the period 2008–2009, for 16 parameters. The complex data matrix was treated with three multivariate statistical techniques: cluster analysis (CA), principal component analysis (PCA) and discriminant analysis (DA). CA generated three groups of sites, cluster 1 (RP1), cluster 2 (RP2) and cluster 3 (RP3), according to relatively low, very high and moderate pollution regions, respectively. PCA identified two components, which were responsible for the data structure, explaining 73% of the total variance of the data matrix. Temporal DA (wet season and dry season) showed that turbidity, NO3- and COD were the discriminant variables. Spatial DA showed that there were significant differences between the three categorical classes: 1 (RP1, low pollution region), 2 (RP2, strongly polluted zone) and 3 (RP3, moderately polluted site). The discriminating functions contained only eight parameters (EC, NO3-, turbidity, DO, BOD, COD, total coliform and fecal coliform) to discriminate between sites. The application of these techniques has achieved meaningful classification of physical, chemical and bacteriological variables and of river water samples, based on seasonal and spatial criteria. This study is essential for the future design of fast and effective monitoring programs of river water quality that would include only parameters indicative of organic pollution.
Energy Technology Data Exchange (ETDEWEB)
Mayer, B. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Mew, D. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); DeHope, A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Spackman, P. E. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Williams, A. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2015-09-24
Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. The results of these studies can yield detailed information on method of manufacture, starting material source, and final product - all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. 160 distinct compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (GC-MS and LC-MS/MS-TOF) in conjunction with inductively coupled plasma mass spectrometry (ICPMS). The complexity of the resultant data matrix urged the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 87 route-specific CAS were classified and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. This work provides the most detailed fentanyl CAS investigation to date by using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.
Directory of Open Access Journals (Sweden)
Elias Hanna Bakraji
2014-01-01
Thermoluminescence (TL) dating and multivariate statistical methods based on radioisotope X-ray fluorescence analysis have been utilized to date and classify Syrian archaeological ceramic fragments from the Tel Jamous site. 54 samples were analyzed by radioisotope X-ray fluorescence; 51 of them come from the Tel Jamous archaeological site in the Sahel Akkar region, Syria, and fairly represent ceramics belonging to the Middle Bronze Age (2150 to 1600 B.C.), and the remaining three samples come from the Mar-Takla archaeological site and are fairly representative of Byzantine ceramics. We selected four fragments from the Tel Jamous site to determine their age using the thermoluminescence (TL) method; the results revealed that the date assigned by archaeologists was good. An annular 109Cd radioactive source was used to irradiate the samples in order to determine their chemical composition, and the results were treated statistically using two methods, cluster and factor analysis. This treatment revealed two main groups; the first one contains only the three samples M52, M53, and M54 from the Mar-Takla site, and the second one contains samples that belong to the Tel Jamous site (local).
Directory of Open Access Journals (Sweden)
Md. Bodrud-Doza
2016-04-01
This study investigates the groundwater quality in the Faridpur district of central Bangladesh based on 60 preselected sample points. Water evaluation indices and a number of statistical approaches such as multivariate statistics and geostatistics are applied to characterize water quality, which is a major factor for controlling the groundwater quality in terms of drinking purposes. The study reveals that EC, TDS, Ca2+, total As and Fe values of groundwater samples exceeded Bangladesh and international standards. The ground water quality index (GWQI) showed that about 47% of the samples belonged to good quality water for drinking purposes. The heavy metal pollution index (HPI), degree of contamination (Cd) and heavy metal evaluation index (HEI) reveal that most of the samples belong to a low level of pollution. However, Cd provides a better alternative than the other indices. Principal component analysis (PCA) suggests that groundwater quality is mainly related to geogenic (rock–water interaction) and anthropogenic sources (agrogenic and domestic sewage) in the study area. The findings of cluster analysis (CA) and the correlation matrix (CM) are also consistent with the PCA results. The spatial distributions of groundwater quality parameters are determined by geostatistical modeling. The exponential semivariogram model is validated as the best-fitted model for most of the index values. It is expected that the outcomes of the study will provide insights for decision makers taking proper measures for groundwater quality management in central Bangladesh.
One approach in using multivariate statistical process control in analyzing cheese quality
Directory of Open Access Journals (Sweden)
Ilija Djekic
2015-05-01
The objective of this paper was to investigate the possibility of using multivariate statistical process control in analysing cheese quality parameters. Two cheese types (white brined cheeses and soft cheese from ultra-filtered milk) were selected and analysed for several quality parameters such as dry matter, milk fat, protein contents, pH, NaCl, fat in dry matter and moisture in non-fat solids. The obtained results showed significant variations between the two types of cheese for most of the quality characteristics examined. The only stable parameter in both types of cheese was moisture in non-fat solids. All of the other cheese quality characteristics were above or below control limits for most of the samples. Such results indicated a high instability and variations within cheese production. Although the use of statistical process control is not mandatory in the dairy industry, it might provide benefits to organizations in improving quality control of dairy products.
Mayer, Brian P; DeHope, Alan J; Mew, Daniel A; Spackman, Paul E; Williams, Audrey M
2016-04-19
Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. The results of these studies can yield detailed information on method of manufacture, starting material source, and final product, all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. A total of 160 distinct compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (gas chromatography/mass spectrometry (GC/MS) and liquid chromatography-tandem mass spectrometry-time of-flight (LC-MS/MS-TOF)) in conjunction with inductively coupled plasma mass spectrometry (ICPMS). The complexity of the resultant data matrix urged the use of multivariate statistical analysis. Using partial least-squares-discriminant analysis (PLS-DA), 87 route-specific CAS were classified and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. This work provides the most detailed fentanyl CAS investigation to date by using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.
QC metrics from CPTAC raw LC-MS/MS data interpreted through multivariate statistics.
Wang, Xia; Chambers, Matthew C; Vega-Montoto, Lorenzo J; Bunk, David M; Stein, Stephen E; Tabb, David L
2014-03-04
Shotgun proteomics experiments integrate a complex sequence of processes, any of which can introduce variability. Quality metrics computed from LC-MS/MS data have relied upon identifying MS/MS scans, but a new mode for the QuaMeter software produces metrics that are independent of identifications. Rather than evaluating each metric independently, we have created a robust multivariate statistical toolkit that accommodates the correlation structure of these metrics and allows for hierarchical relationships among data sets. The framework enables visualization and structural assessment of variability. Study 1 for the Clinical Proteomics Technology Assessment for Cancer (CPTAC), which analyzed three replicates of two common samples at each of two time points among 23 mass spectrometers in nine laboratories, provided the data to demonstrate this framework, and CPTAC Study 5 provided data from complex lysates under Standard Operating Procedures (SOPs) to complement these findings. Identification-independent quality metrics enabled the differentiation of sites and run-times through robust principal components analysis and subsequent factor analysis. Dissimilarity metrics revealed outliers in performance, and a nested ANOVA model revealed the extent to which all metrics or individual metrics were impacted by mass spectrometer and run time. Study 5 data revealed that even when SOPs have been applied, instrument-dependent variability remains prominent, although it may be reduced, while within-site variability is reduced significantly. Finally, identification-independent quality metrics were shown to be predictive of identification sensitivity in these data sets. QuaMeter and the associated multivariate framework are available from http://fenchurch.mc.vanderbilt.edu and http://homepages.uc.edu/~wang2x7/ , respectively.
Energy Technology Data Exchange (ETDEWEB)
Wallace, Jack, E-mail: jack.wallace@ce.queensu.ca [Department of Civil Engineering, Queen’s University, Ellis Hall, 58 University Avenue, Kingston, Ontario K7L 3N6 (Canada); Champagne, Pascale, E-mail: champagne@civil.queensu.ca [Department of Civil Engineering, Queen’s University, Ellis Hall, 58 University Avenue, Kingston, Ontario K7L 3N6 (Canada); Monnier, Anne-Charlotte, E-mail: anne-charlotte.monnier@insa-lyon.fr [National Institute for Applied Sciences – Lyon, 20 Avenue Albert Einstein, 69621 Villeurbanne Cedex (France)
2015-01-15
Highlights: • Performance of a hybrid passive landfill leachate treatment system was evaluated. • 33 Water chemistry parameters were sampled for 21 months and statistically analyzed. • Parameters were strongly linked and explained most (>40%) of the variation in data. • Alkalinity, ammonia, COD, heavy metals, and iron were criteria for performance. • Eight other parameters were key in modeling system dynamics and criteria. - Abstract: A pilot-scale hybrid-passive treatment system operated at the Merrick Landfill in North Bay, Ontario, Canada, treats municipal landfill leachate and provides for subsequent natural attenuation. Collected leachate is directed to a hybrid-passive treatment system, followed by controlled release to a natural attenuation zone before entering the nearby Little Sturgeon River. The study presents a comprehensive evaluation of the performance of the system using multivariate statistical techniques to determine the interactions between parameters, major pollutants in the leachate, and the biological and chemical processes occurring in the system. Five parameters (ammonia, alkalinity, chemical oxygen demand (COD), “heavy” metals of interest, with atomic weights above calcium, and iron) were set as criteria for the evaluation of system performance based on their toxicity to aquatic ecosystems and importance in treatment with respect to discharge regulations. System data for a full range of water quality parameters over a 21-month period were analyzed using principal components analysis (PCA), as well as principal components (PC) and partial least squares (PLS) regressions. PCA indicated a high degree of association for most parameters with the first PC, which explained a high percentage (>40%) of the variation in the data, suggesting strong statistical relationships among most of the parameters in the system. Regression analyses identified 8 parameters (set as independent variables) that were most frequently retained for modeling
Directory of Open Access Journals (Sweden)
Zamani Abbas Ali
2012-12-01
The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study, groundwater contamination in Bonab Industrial Estate (Zanjan-Iran) for iron, cobalt, nickel, copper, zinc, cadmium and lead content was investigated using differential pulse polarography (DPP). Although cobalt, copper and zinc were found in 47.8%, 100.0%, and 100.0% of the samples, respectively, the samples did not contain these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron, with 8.7% and 13.0% of the samples higher than their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied for interpreting the experimental data and giving a description for the sources. The data analysis showed correlations and similarities between the investigated heavy metals and helped to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals. Cluster 1 consisted of Pb and Cu, and cluster 3 included Cd and Fe; each of the elements Zn, Co and Ni was located in a group with a single member. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors, notably the lead and zinc plant, and pedo-geochemical pollution sources are influencing water quality in the studied area.
2012-01-01
The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study, groundwater contamination in Bonab Industrial Estate (Zanjan-Iran) for iron, cobalt, nickel, copper, zinc, cadmium and lead content was investigated using differential pulse polarography (DPP). Although cobalt, copper and zinc were found in 47.8%, 100.0%, and 100.0% of the samples, respectively, the samples did not contain these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron, with 8.7% and 13.0% of the samples higher than their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied for interpreting the experimental data and giving a description for the sources. The data analysis showed correlations and similarities between the investigated heavy metals and helped to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals. Cluster 1 consisted of Pb and Cu, and cluster 3 included Cd and Fe; each of the elements Zn, Co and Ni was located in a group with a single member. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors, notably the lead and zinc plant, and pedo-geochemical pollution sources are influencing water quality in the studied area. PMID:23369182
Use of multivariate statistics to identify unreliable data obtained using CASA.
Martínez, Luis Becerril; Crispín, Rubén Huerta; Mendoza, Maximino Méndez; Gallegos, Oswaldo Hernández; Martínez, Andrés Aragón
2013-06-01
In order to identify unreliable data in a dataset of motility parameters obtained from a pilot study acquired by a veterinarian with experience in boar semen handling, but without experience in the operation of a computer assisted sperm analysis (CASA) system, a multivariate graphical and statistical analysis was performed. Sixteen boar semen samples were aliquoted then incubated with varying concentrations of progesterone from 0 to 3.33 µg/ml and analyzed in a CASA system. After standardization of the data, Chernoff faces were pictured for each measurement, and a principal component analysis (PCA) was used to reduce the dimensionality and pre-process the data before hierarchical clustering. The first twelve individual measurements showed abnormal features when Chernoff faces were drawn. PCA revealed that principal components 1 and 2 explained 63.08% of the variance in the dataset. Values of principal components for each individual measurement of semen samples were mapped to identify differences among treatment or among boars. Twelve individual measurements presented low values of principal component 1. Confidence ellipses on the map of principal components showed no statistically significant effects for treatment or boar. Hierarchical clustering realized on two first principal components produced three clusters. Cluster 1 contained evaluations of the two first samples in each treatment, each one of a different boar. With the exception of one individual measurement, all other measurements in cluster 1 were the same as observed in abnormal Chernoff faces. Unreliable data in cluster 1 are probably related to the operator inexperience with a CASA system. These findings could be used to objectively evaluate the skill level of an operator of a CASA system. This may be particularly useful in the quality control of semen analysis using CASA systems.
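The standardization and PCA steps mentioned in this abstract can be sketched as follows. This is a generic illustration on synthetic data, not the authors' pipeline: the variable count, the single latent factor, and the function names are assumptions, and the Chernoff faces and hierarchical clustering steps are left out.

```python
import numpy as np

def autoscale(X):
    """Centre each column and scale it to unit sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_svd(Z, n_components):
    """PCA of an already centred matrix via singular value decomposition."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)   # variance share of every component
    return scores, Vt[:n_components], explained

# Hypothetical data: 40 measurements of 4 correlated parameters.
rng = np.random.default_rng(2)
latent = rng.normal(size=(40, 1))
X = latent @ np.array([[1.0, 0.8, -0.9, 0.7]]) + rng.normal(scale=0.3, size=(40, 4))
X[:, 1] *= 100.0   # one parameter recorded in much larger units

Z = autoscale(X)
scores, loadings, explained = pca_svd(Z, n_components=2)
print(explained[:2].sum())   # share of variance held by the first two PCs
```

Standardizing first keeps a variable recorded in large units from dominating the components; the two-column `scores` array is what would then be plotted or passed to a clustering routine.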
Multivariate statistical process control in product quality review assessment - A case study.
Kharbach, M; Cherrah, Y; Vander Heyden, Y; Bouklouze, A
2017-11-01
According to the Food and Drug Administration and the European Good Manufacturing Practices (GMP) guidelines, Annual Product Review (APR) is a mandatory requirement in GMP. It consists of evaluating a large collection of qualitative or quantitative data in order to verify the consistency of an existing process. According to the Code of Federal Regulation Part 11 (21 CFR 211.180), all finished products should be reviewed annually for the quality standards to determine the need of any change in specification or manufacturing of drug products. Conventional Statistical Process Control (SPC) evaluates the pharmaceutical production process by examining only the effect of a single factor at a time using a Shewhart chart. It does not take into account the interaction between the variables. In order to overcome this issue, Multivariate Statistical Process Control (MSPC) can be used. Our case study concerns an APR assessment, where 164 historical batches containing six active ingredients, manufactured in Morocco, were collected during one year. Each batch has been checked by assaying the six active ingredients by High Performance Liquid Chromatography according to European Pharmacopoeia monographs. The data matrix was evaluated both by SPC and MSPC. The SPC indicated that all batches are under control, while the MSPC, based on Principal Component Analysis (PCA), for the data being either autoscaled or robust scaled, showed four and seven batches, respectively, out of the Hotelling T² 95% ellipse. Also, an improvement of the capability of the process is observed without the most extreme batches. The MSPC can be used for monitoring subtle changes in the manufacturing process during an APR assessment. Copyright © 2017 Académie Nationale de Pharmacie. Published by Elsevier Masson SAS. All rights reserved.
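A PCA-based Hotelling T² check of the kind described here might look like the sketch below. The batch data are simulated, the two-component model is an assumption, and the 95% limit uses a large-sample chi-square approximation with two degrees of freedom rather than the F-based limit that is often used in practice; the abstract does not say which limit the authors applied.

```python
import math
import numpy as np

def hotelling_t2(X, n_components=2):
    """T^2 of each row, computed from autoscaled data projected on the first PCs."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    score_var = s[:n_components] ** 2 / (Z.shape[0] - 1)   # variance of each PC
    return np.sum(scores ** 2 / score_var, axis=1)

def chi2_limit_2df(confidence=0.95):
    """Chi-square quantile for 2 degrees of freedom, which has a closed form.
    Only valid as a T^2 limit for a two-component model and many samples."""
    return -2.0 * math.log(1.0 - confidence)

# Hypothetical data: 164 batches x 6 assay results, one batch shifted on purpose.
rng = np.random.default_rng(3)
X = rng.normal(size=(164, 6))
X[10] += 8.0

t2 = hotelling_t2(X)
limit = chi2_limit_2df(0.95)
flagged = np.flatnonzero(t2 > limit)   # indices of batches outside the ellipse
print(limit, flagged)
```

Because T² combines all six assays through their correlation structure, a batch can fall outside the ellipse even when each assay taken alone stays within its univariate limits, which is the contrast between SPC and MSPC that the abstract reports.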
Blake, Sarah; Henry, Tiernan; Murray, John; Flood, Rory; Muller, Mark R.; Jones, Alan G.; Rath, Volker
2016-04-01
The geothermal energy of thermal groundwater is currently being exploited for district-scale heating in many locations world-wide. The chemical compositions of these thermal waters reflect the provenance and hydrothermal circulation patterns of the groundwater, which are controlled by recharge, rock type and geological structure. Exploring the provenance of these waters using multivariate statistical analysis (MSA) techniques increases our understanding of the hydrothermal circulation systems, and provides a reliable tool for assessing these resources. Hydrochemical data from thermal springs situated in the Carboniferous Dublin Basin in east-central Ireland were explored using MSA, including hierarchical cluster analysis (HCA) and principal component analysis (PCA), to investigate the source aquifers of the thermal groundwaters. To take into account the compositional nature of the hydrochemical data, compositional data analysis (CoDa) techniques were used to process the data prior to the MSA. The results of the MSA were examined alongside detailed time-lapse temperature measurements from several of the springs, and indicate the influence of three important hydrogeological processes on the hydrochemistry of the thermal waters: 1) increased salinity due to evaporite dissolution and increased water-rock-interaction; 2) dissolution of carbonates; and 3) dissolution of metal sulfides and oxides associated with mineral deposits. The use of MSA within the CoDa framework identified subtle temporal variations in the hydrochemistry of the thermal springs, which could not be identified with more traditional graphing methods (e.g., Piper diagrams), or with a standard statistical approach. The MSA was successful in distinguishing different geological settings and different annual behaviours within the group of springs. This study demonstrates the usefulness of the application of MSA within the CoDa framework in order to better understand the underlying controlling processes
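The abstract above does not say which compositional data analysis (CoDa) transform was applied before the multivariate analysis. As a minimal sketch under that caveat, the centred log-ratio (clr) transform is one widely used option: each part of a composition is replaced by its logarithm minus the mean logarithm of all parts, which removes the dependence on the measurement scale. The function name and the sample values are hypothetical.

```python
import math


def clr(composition):
    """Centred log-ratio transform of one strictly positive composition."""
    if any(part <= 0 for part in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    # Hypothetical ion concentrations in arbitrary but consistent units.
    sample = [120.0, 35.0, 8.0, 2.5]
    print([round(v, 3) for v in clr(sample)])
```

The transformed values sum to zero, and rescaling the whole sample (for example mg/l to g/l) leaves them unchanged, which is why such transforms are applied before PCA or cluster analysis of concentration data.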
Zamani, Abbas Ali; Yaftian, Mohammad Reza; Parizanganeh, Abdolhossein
2012-12-17
The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study, groundwater contamination in Bonab Industrial Estate (Zanjan-Iran) for iron, cobalt, nickel, copper, zinc, cadmium and lead content was investigated using differential pulse polarography (DPP). Although cobalt, copper and zinc were found in 47.8%, 100.0%, and 100.0% of the samples, respectively, none of the samples contained these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron, with 8.7% and 13.0% of the samples, respectively, higher than their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied for interpreting the experimental data and describing the sources. The data analysis showed correlations and similarities between the investigated heavy metals and helps to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals. Cluster 1 consisted of Pb and Cu, and cluster 3 included Cd and Fe; each of the elements Zn, Co and Ni formed a single-member group. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors, notably the lead and zinc plant, and pedo-geochemical pollution sources are influencing water quality in the studied area.
Directory of Open Access Journals (Sweden)
Md. Mahtab Ali Molla
2015-01-01
Aims: This work evaluated the surface and groundwater quality of Mohanpur area, Rajshahi district, Bangladesh. Multivariate statistical techniques were also applied to determine the possible sources of water contamination. Materials and Methods: Water samples were collected from ten randomly selected sampling sites for analyzing the chemical parameters including pH, electrical conductivity, total dissolved solids, total hardness, total alkalinity, Cl−, NO3− and some heavy metals such as Mn, Pb, Cd, and As concentrations. Concentrations of heavy metals were determined using an atomic absorption spectrometer (AAS). Results: Based on hydrochemical characteristics, surface and groundwater in the study area were, in general, fresh, hard, and alkaline in nature. All chemical parameters were within the WHO water quality guidelines. However, among the four heavy metals analyzed, Pb and Cd concentrations exceeded the WHO recommended values. The Pearson correlation matrix showed a number of statistically significant associations (P < 0.01 and P < 0.05) among the examined water quality parameters. Moreover, principal component (PC) analysis (PCA) and cluster analysis (CA) were used to analyze the water quality dataset. PCA identified two PCs as responsible for the data structure, explaining 72.53% of the total variance in water quality. PCA indicated that the water quality variations were mainly of anthropogenic origin through agricultural and municipal discharges. Results of CA revealed three significant groups of similarity among the 10 sampling sites. Conclusions: It could be deduced from the present results that water contamination has occurred to some extent throughout the area and is likely to continue in the near future. Improvement of the local sanitation system along with frequent training and awareness programs can help in improving water quality in the studied area.
Feyissa, Daniel D; Aher, Yogesh D; Engidawork, Ephrem; Höger, Harald; Lubec, Gert; Korz, Volker
2017-01-01
Animal models for anxiety, depressive-like and cognitive diseases or aging often involve testing of subjects in behavioral test batteries. The large number of test variables with different mean variations and within and between test correlations often constitute a significant problem in determining essential variables to assess behavioral patterns and their variation in individual animals as well as appropriate statistical treatment. Therefore, we applied a multivariate approach (principal component analysis) to analyse the behavioral data of 162 male adult Sprague-Dawley rats that underwent a behavioral test battery including commonly used tests for spatial learning and memory (holeboard) and different behavioral patterns (open field, elevated plus maze, forced swim test) as well as for motor abilities (Rota rod). The high dimensional behavioral results were reduced to fewer components associated with spatial cognition, general activity, anxiety-, and depression-like behavior and motor ability. The loading scores of individual rats on these different components allow an assessment and the distribution of individual features in a population of animals. The reduced number of components can be used also for statistical calculations like appropriate sample sizes for valid discriminations between experimental groups, which otherwise have to be done on each variable. Because the animals were intact, untreated and experimentally naïve the results reflect trait patterns of behavior and thus individuality. The distribution of animals with high or low levels of anxiety, depressive-like behavior, general activity and cognitive features in a local population provides information on the probability of their appearance in experimental samples and thus may help to avoid biases. However, such an analysis initially requires a large cohort of animals in order to gain a valid assessment.
Application of Multivariate Statistical Analysis to Biomarkers in Se-Turkey Crude Oils
Gürgey, K.; Canbolat, S.
2017-11-01
Twenty-four crude oil samples were collected from 24 oil fields distributed in different districts of SE-Turkey. API and Sulphur content (%), Stable Carbon Isotope, Gas Chromatography (GC), and Gas Chromatography-Mass Spectrometry (GC-MS) data were used to construct a geochemical data matrix. The aim of this study is to examine the genetic grouping or correlations in the crude oil samples, and hence the number of source rocks present in SE-Turkey. To achieve these aims, two multivariate statistical analysis techniques (Principal Component Analysis [PCA] and Cluster Analysis) were applied to a data matrix of 24 samples and 8 source-specific biomarker variables/parameters. The results showed that there are 3 genetically different oil groups: Batman-Nusaybin Oils, Adıyaman-Kozluk Oils and Diyarbakir Oils, in addition to one mixed group. These groupings imply that at least three different source rocks are present in South-Eastern (SE) Turkey. Grouping of the crude oil samples appears to be consistent with the geographic locations of the oil fields, subsurface stratigraphy as well as geology of the area.
Ielpo, Pierina; Leardi, Riccardo; Pappagallo, Giuseppe; Uricchio, Vito Felice
2017-06-01
In this paper, the results obtained from multivariate statistical techniques such as PCA (Principal component analysis) and LDA (Linear discriminant analysis) applied to a wide soil data set are presented. The results have been compared with those obtained on a groundwater data set, whose samples were collected together with soil ones, within the project "Improvement of the Regional Agro-meteorological Monitoring Network (2004-2007)". LDA, applied to soil data, made it possible to distinguish the geographical origin of a sample between the two macroareas, Bari and Foggia provinces vs Brindisi, Lecce and Taranto provinces, with a percentage of correct predictions in cross validation of 87%. In the case of the groundwater data set, the best classification was obtained when the samples were grouped into three macroareas: Foggia province, Bari province and Brindisi, Lecce and Taranto provinces, reaching a percentage of correct predictions in cross validation of 84%. The obtained information can be very useful in supporting soil and water resource management, such as the reduction of water consumption and the reduction of energy and chemical (nutrients and pesticides) inputs in agriculture.
Directory of Open Access Journals (Sweden)
Qing Gu
2016-03-01
Qiandao Lake (Xin’an Jiang reservoir) plays a significant role in drinking water supply for eastern China, and it is an attractive tourist destination. Three multivariate statistical methods were comprehensively applied to assess the spatial and temporal variations in water quality as well as potential pollution sources in Qiandao Lake. Data sets of nine parameters from 12 monitoring sites during 2010–2013 were obtained for analysis. Cluster analysis (CA) was applied to classify the 12 sampling sites into three groups (Groups A, B and C) and the 12 monitoring months into two clusters (April-July, and the remaining months). Discriminant analysis (DA) identified Secchi disc depth, dissolved oxygen, permanganate index and total phosphorus as the significant variables for distinguishing variations of different years, with 79.9% correct assignments. Dissolved oxygen, pH and chlorophyll-a were determined to discriminate between the two sampling periods classified by CA, with 87.8% correct assignments. For spatial variation, DA identified Secchi disc depth and ammonia nitrogen as the significant discriminating parameters, with 81.6% correct assignments. Principal component analysis (PCA) identified organic pollution, nutrient pollution, domestic sewage, and agricultural and surface runoff as the primary pollution sources, explaining 84.58%, 81.61% and 78.68% of the total variance in Groups A, B and C, respectively. These results demonstrate the effectiveness of integrated use of CA, DA and PCA for reservoir water quality evaluation and could assist managers in improving water resources management.
Tian, Dayong; Lü, Guodong; Zhai, Zhengang; Du, Guoli; Mo, Jiaqing; Lü, Xiaoyi
2018-01-01
In this paper, serum surface-enhanced Raman scattering and multivariate statistical analysis are used to investigate a rapid screening technique for thyroid function diseases. At present, the detection of thyroid function has become increasingly important, and it is urgently necessary to develop a rapid and portable method for the detection of thyroid function. Our experimental results show that, by using the Silmeco-based enhanced Raman signal, the signal strength greatly increases and the characteristic peak appears obviously. It is also observed that the Raman spectra of normal and anomalous thyroid function human serum are significantly different. Principal component analysis (PCA) combined with linear discriminant analysis (LDA) was used to diagnose thyroid dysfunction, and the diagnostic accuracy was 87.4%. The use of serum surface-enhanced Raman scattering technology combined with PCA–LDA shows good diagnostic performance for the rapid detection of thyroid function. By means of Raman technology, it is expected that a portable device for the rapid detection of thyroid function will be developed.
APPLICATION OF MULTIVARIATE STATISTICAL ANALYSIS TO BIOMARKERS IN SE-TURKEY CRUDE OILS
Directory of Open Access Journals (Sweden)
K. Gürgey
2017-11-01
Twenty-four crude oil samples were collected from 24 oil fields distributed in different districts of SE-Turkey. API and Sulphur content (%), Stable Carbon Isotope, Gas Chromatography (GC), and Gas Chromatography-Mass Spectrometry (GC-MS) data were used to construct a geochemical data matrix. The aim of this study is to examine the genetic grouping or correlations in the crude oil samples, and hence the number of source rocks present in SE-Turkey. To achieve these aims, two multivariate statistical analysis techniques (Principal Component Analysis [PCA] and Cluster Analysis) were applied to a data matrix of 24 samples and 8 source-specific biomarker variables/parameters. The results showed that there are 3 genetically different oil groups: Batman-Nusaybin Oils, Adıyaman-Kozluk Oils and Diyarbakir Oils, in addition to one mixed group. These groupings imply that at least three different source rocks are present in South-Eastern (SE) Turkey. Grouping of the crude oil samples appears to be consistent with the geographic locations of the oil fields, subsurface stratigraphy as well as geology of the area.
Schaefer, Kristin; Einax, Jürgen W; Simeonov, Vasil; Tsakovski, Stefan
2010-04-01
The surroundings of the former Kremikovtzi steel mill near Sofia (Bulgaria) are influenced by various emissions from the factory. In addition to steel and alloys, they produce different products based on inorganic compounds in different smelters. Soil in this region is multiply contaminated. We collected 65 soil samples and analyzed 15 elements by different methods of atomic spectroscopy for a survey of this field site. Here we present a novel hybrid approach for environmental risk assessment of polluted soil combining geostatistical methods and source apportionment modeling. We could distinguish areas with heavily and slightly polluted soils in the vicinity of the iron smelter by applying unsupervised pattern recognition methods. This result was supported by geostatistical methods such as semivariogram analysis and kriging. The modes of action of the metals examined differ significantly in such a way that iron and lead account for the main pollutants of the iron smelter, whereas, e.g., arsenic shows a haphazard distribution. The application of factor analysis and source-apportionment modeling on absolute principal component scores revealed novel information about the composition of the emissions from the different stacks. It is possible to estimate the impact of every element examined on the pollution due to their emission source. This investigation allows an objective assessment of the different spatial distributions of the elements examined in the soil of the Kremikovtzi region. The geostatistical analysis illustrates this distribution and is supported by multivariate statistical analysis revealing relations between the elements.
Directory of Open Access Journals (Sweden)
Teck-Yee Ling
2017-01-01
The present study evaluated the spatial variations of surface water quality in a tropical river using multivariate statistical techniques, including cluster analysis (CA) and principal component analysis (PCA). Twenty physicochemical parameters were measured at 30 stations along the Batang Baram and its tributaries. The water quality of the Batang Baram was categorized as “slightly polluted”, with chemical oxygen demand and total suspended solids being the most deteriorated parameters. The CA grouped the 30 stations into four clusters which shared similar characteristics within the same cluster, representing the upstream, middle, and downstream regions of the main river and the tributaries from the middle to downstream regions of the river. The PCA yielded six principal components that explained 83.6% of the data set variance. The first PC indicated that total suspended solids, turbidity, and hydrogen sulphide were the dominant polluting factors, which are attributed to logging activities, followed by five-day biochemical oxygen demand, total phosphorus, organic nitrogen, and nitrate-nitrogen in the second PC, which are related to discharges of domestic wastewater. The components also imply that logging activities are the major anthropogenic activities responsible for water quality variations in the Batang Baram when compared to the domestic wastewater discharge.
Xu, Peng; Rizzoni, Elizabeth Anne; Sul, Se-Yeong; Stephanopoulos, Gregory
2017-01-20
Metabolic engineering entails target modification of cell metabolism to maximize the production of a specific compound. For empowering combinatorial optimization in strain engineering, tools and algorithms are needed to efficiently sample the multidimensional gene expression space and locate the desirable overproduction phenotype. We addressed this challenge by employing design of experiment (DoE) models to quantitatively correlate gene expression with strain performance. By fractionally sampling the gene expression landscape, we statistically screened the dominant enzyme targets that determine metabolic pathway efficiency. An empirical quadratic regression model was subsequently used to identify the optimal gene expression patterns of the investigated pathway. As a proof of concept, our approach yielded the natural product violacein at 525.4 mg/L in shake flasks, a 3.2-fold increase from the baseline strain. Violacein production was further increased to 1.31 g/L in a controlled benchtop bioreactor. We found that formulating discretized gene expression levels into logarithmic variables (Linlog transformation) was essential for implementing this DoE-based optimization procedure. The reported methodology can aid multivariate combinatorial pathway engineering and may be generalized as a standard procedure for accelerating strain engineering and improving metabolic pathway efficiency.
Thangaradjou, T; Raja, S; Subhashini, Pon; Nobi, E P; Dilipan, E
2013-01-01
An assessment of heavy metal (Al, Cd, Co, Cr, Cu, Fe, Mg, Mn, Ni, Pb and Zn) accumulation by seven seagrass species of the Lakshadweep group of islands was carried out using multivariate statistical tools such as principal component analysis (PCA) and cluster analysis (CA). Among all the metals, Mg and Al were found at the highest concentrations in all the seagrasses, and their values varied between seagrass species. The concentration of the four toxic heavy metals (Cd, Pb, Zn and Cu) was higher in all the seagrasses than the background values of seagrasses from the Flores Sea, Indonesia. The contamination factors of these four heavy metals ranged as follows: Cd (1.97-12.5), Cu (0.73-4.40), Pb (2.3-8.89) and Zn (1.27-2.787). In general, the Pollution Load Index (PLI) was found to be maximum for Halophila decipiens (58.2). Results revealed that Halophila decipiens is a strong accumulator of heavy metals, followed by Halodule uninervis and Halodule pinifolia, among all the tested seagrasses. Interestingly, the small-leaved seagrasses were found to be more efficient in heavy metal accumulation than the large-leaved seagrass species. Thus, seagrasses are well suited to biomonitoring and can act as a heavy metal sink, as their biomass usually takes a long time to remineralize in nature.
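The abstract above reports contamination factors and a Pollution Load Index (PLI) without stating the formulas used. A minimal sketch of one widely used convention follows: the contamination factor is the measured concentration divided by a background concentration, and the PLI is the geometric mean of the contamination factors. Whether the study aggregated its factors this way is an assumption, and its reported PLI values may rest on a different definition. The function names and the numbers in the example are hypothetical.

```python
import math


def contamination_factor(concentration, background):
    """Ratio of a measured metal concentration to its background value."""
    if background <= 0:
        raise ValueError("background concentration must be positive")
    return concentration / background


def pollution_load_index(contamination_factors):
    """Geometric mean of the contamination factors (assumed PLI definition)."""
    n = len(contamination_factors)
    if n == 0:
        raise ValueError("at least one contamination factor is required")
    # Averaging logarithms is equivalent to taking the n-th root of the product.
    return math.exp(sum(math.log(cf) for cf in contamination_factors) / n)


if __name__ == "__main__":
    # Hypothetical concentrations and backgrounds (same units), not study data.
    measured = {"Cd": 2.4, "Cu": 9.0, "Pb": 12.0, "Zn": 30.0}
    background = {"Cd": 0.6, "Cu": 4.5, "Pb": 3.0, "Zn": 20.0}
    factors = [contamination_factor(measured[m], background[m]) for m in measured]
    print(round(pollution_load_index(factors), 3))
```

Under this convention a PLI above 1 indicates that, on average, the metals exceed their background levels.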
Scheuerer, Michael; Hamill, Thomas M.; Whitin, Brett; He, Minxue; Henkel, Arthur
2017-04-01
Hydrological forecasts strongly rely on predictions of precipitation amounts and temperature as meteorological inputs to hydrological models. Ensemble weather predictions provide a number of different scenarios that reflect the uncertainty about these meteorological inputs, but are often biased and underdispersive, and therefore require statistical postprocessing. In hydrological applications it is crucial that spatial and temporal (i.e. between different forecast lead times) dependencies as well as dependence between the two weather variables are adequately represented by the recalibrated forecasts. We present a study with temperature and precipitation forecasts over four river basins in California that are postprocessed with a variant of the nonhomogeneous Gaussian regression method (Gneiting et al., 2005) and the censored, shifted gamma distribution approach (Scheuerer and Hamill, 2015), respectively. For modelling spatial, temporal and inter-variable dependence we propose a variant of the Schaake Shuffle (Clark et al., 2004) that uses spatio-temporal trajectories of observed temperature and precipitation as a dependence template, and chooses the historic dates in such a way that the divergence between the marginal distributions of these trajectories and the univariate forecast distributions is minimized. For the four river basins considered in our study, this new multivariate modelling technique consistently improves upon the Schaake Shuffle and yields reliable spatio-temporal forecast trajectories of temperature and precipitation that can be used to force hydrological forecast systems. References: Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B., Wilby, R., 2004. The Schaake Shuffle: A method for reconstructing space-time variability in forecasted precipitation and temperature fields. Journal of Hydrometeorology, 5, pp.243-262. Gneiting, T., Raftery, A.E., Westveld, A.H., Goldman, T., 2005. Calibrated probabilistic forecasting using ensemble model output
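To make the reordering idea behind the Schaake Shuffle concrete, here is a minimal sketch of the standard version (not the authors' variant, which additionally selects the historic dates to match the forecast distributions). For one variable at one location and lead time, the sorted ensemble values are reassigned so that their rank order matches the rank order of the observations on the chosen historic dates. Applying the same dates to every variable, location and lead time carries the observed dependence structure over to the ensemble. The function name and the sample numbers are hypothetical.

```python
def schaake_shuffle(forecasts, template):
    """Reorder ensemble values so their ranks follow the template's ranks.

    forecasts: ensemble values for one variable, location and lead time.
    template:  observations on the chosen historic dates, one per member.
    """
    if len(forecasts) != len(template):
        raise ValueError("forecasts and template must have the same length")
    sorted_forecasts = sorted(forecasts)
    # Member indices ordered from the smallest to the largest observation.
    order = sorted(range(len(template)), key=lambda i: template[i])
    shuffled = [None] * len(forecasts)
    for rank, member in enumerate(order):
        # The member with the rank-th smallest observation receives
        # the rank-th smallest forecast value.
        shuffled[member] = sorted_forecasts[rank]
    return shuffled


if __name__ == "__main__":
    # Three members; the same three historic dates serve both variables.
    temperature = schaake_shuffle([15.2, 11.8, 13.5], [10.0, 14.0, 12.0])
    precipitation = schaake_shuffle([0.0, 4.1, 1.3], [5.0, 0.0, 2.0])
    print(temperature)
    print(precipitation)
```

In the example the first historic date was cold and wet, so after the shuffle member one pairs the lowest temperature with the highest precipitation, while each variable keeps exactly the same set of calibrated values.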
Malm, Christer B; Khoo, Nelson S; Granlund, Irene; Lindstedt, Emilia; Hult, Andreas
2016-01-01
achieved in two separate trials. In conclusion, autologous re-infusion of RBCs increased VO2max and performance as hypothesized, but hematological profiling by multivariate statistics could not reach the WADA stipulated false positive ratio of <0.001% at any time point investigated. A majority of samples remained within limits of normal individual variation at all times.
Directory of Open Access Journals (Sweden)
Vujović Svetlana R.
2013-01-01
This paper illustrates the utility of multivariate statistical techniques for analysis and interpretation of water quality data sets and identification of pollution sources/factors with a view to obtaining better information about water quality and designing a monitoring network for effective management of water resources. Multivariate statistical techniques, such as factor analysis (FA)/principal component analysis (PCA) and cluster analysis (CA), were applied for the evaluation of variations and for the interpretation of a water quality data set of natural water bodies obtained during monitoring in 2010 of 13 parameters at 33 different sites. FA/PCA attempts to explain the correlations between the observations in terms of the underlying factors, which are not directly observable. Factor analysis is applied to physico-chemical parameters of natural water bodies with the aim of classification and data summation as well as segmentation of heterogeneous data sets into smaller homogeneous subsets. Factor loadings were categorized as strong and moderate, corresponding to absolute loading values of >0.75 and 0.75-0.50, respectively. Four principal factors were obtained with eigenvalues >1, together accounting for more than 78% of the total variance in the water data sets, which is adequate to give good prior information regarding data structure. Each factor that is significantly related to specific variables represents a different dimension of water quality. The first factor F1 accounts for 28% of the total variance and represents the hydrochemical dimension of water quality. The second factor F2 accounts for 18% of the total variance and may be taken as a factor of water eutrophication. The third factor F3 accounts for 17% of the total variance and represents the influence of point sources of pollution on water quality. The fourth factor F4 accounts for 13% of the total variance and may be taken as an ecological dimension of water quality. Cluster analysis (CA) is an
Tay, C. K.; Hayford, E. K.; Hodgson, I. O. A.
2017-06-01
A multivariate statistical technique and a hydrogeochemical approach were employed for groundwater assessment within the Lower Pra Basin. The main objective was to delineate the main processes that are responsible for the water chemistry and pollution of groundwater within the basin. Fifty-four (54) boreholes were sampled in January 2012 for quality assessment. PCA using Varimax with Kaiser Normalization method of extraction for both rotated space and component matrix has been applied to the data. Results show that Spearman's correlation matrix of major ions revealed expected process-based relationships derived mainly from the geochemical processes, such as ion-exchange and silicate/aluminosilicate weathering within the aquifer. Three main principal components influence the water chemistry and pollution of groundwater within the basin. The three principal components have accounted for approximately 79% of the total variance in the hydrochemical data. Component 1 delineates the main natural processes (water-soil-rock interactions) through which groundwater within the basin acquires its chemical characteristics, Component 2 delineates the incongruent dissolution of silicate/aluminosilicates, while Component 3 delineates the prevalence of pollution principally from agricultural input as well as trace metal mobilization in groundwater within the basin. The loadings and score plots of the first two PCs show a grouping pattern which indicates the strength of the mutual relation among the hydrochemical variables. In terms of proper management and development of groundwater within the basin, communities where intense agriculture is taking place should be monitored and protected from agricultural activities, especially where inorganic fertilizers are used, by creating buffer zones. Monitoring of the water quality, especially the water pH, is recommended to ensure the acid neutralizing potential of groundwater within the basin, thereby curtailing further trace metal
Baez-Cazull, S. E.; McGuire, J.T.; Cozzarelli, I.M.; Voytek, M.A.
2008-01-01
Determining the processes governing aqueous biogeochemistry in a wetland hydrologically linked to an underlying contaminated aquifer is challenging due to the complex exchange between the systems and their distinct responses to changes in precipitation, recharge, and biological activities. To evaluate temporal and spatial processes in the wetland-aquifer system, water samples were collected using cm-scale multichambered passive diffusion samplers (peepers) to span the wetland-aquifer interface over a period of 3 yr. Samples were analyzed for major cations and anions, methane, and a suite of organic acids resulting in a large dataset of over 8000 points, which was evaluated using multivariate statistics. Principal component analysis (PCA) was chosen with the purpose of exploring the sources of variation in the dataset to expose related variables and provide insight into the biogeochemical processes that control the water chemistry of the system. Factor scores computed from PCA were mapped by date and depth. Patterns observed suggest that (i) fermentation is the process controlling the greatest variability in the dataset and it peaks in May; (ii) iron and sulfate reduction were the dominant terminal electron-accepting processes in the system and were associated with fermentation but had more complex seasonal variability than fermentation; (iii) methanogenesis was also important and associated with bacterial utilization of minerals as a source of electron acceptors (e.g., barite BaSO4); and (iv) seasonal hydrological patterns (wet and dry periods) control the availability of electron acceptors through the reoxidation of reduced iron-sulfur species enhancing iron and sulfate reduction. Copyright © 2008 by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America. All rights reserved.
Multivariate statistical analysis as a tool for the segmentation of 3D spectral data.
Lucas, G; Burdet, P; Cantoni, M; Hébert, C
2013-01-01
Acquisition of three-dimensional (3D) spectral data is nowadays common using many different microanalytical techniques. In order to proceed to the 3D reconstruction, data processing is necessary not only to deal with noisy acquisitions but also to segment the data in terms of chemical composition. In this article, we demonstrate the value of multivariate statistical analysis (MSA) methods for this purpose, allowing fast and reliable results. Using scanning electron microscopy (SEM) and energy-dispersive X-ray spectroscopy (EDX) coupled with a focused ion beam (FIB), a stack of spectrum images has been acquired on a sample produced by laser welding of a nickel-titanium wire and a stainless steel wire presenting a complex microstructure. These data have been analyzed using principal component analysis (PCA) and factor rotations. PCA significantly improves the overall quality of the data, but produces abstract components. Here it is shown that rotated components can be used without prior knowledge of the sample to help the interpretation of the data, quickly yielding qualitative mappings representative of elements or compounds found in the material. Such abundance maps can then be used to plot scatter diagrams and interactively identify the different domains present by defining clusters of voxels having similar compositions. Identified voxels are advantageously overlaid on secondary electron (SE) images with higher resolution in order to refine the segmentation. The 3D reconstruction can then be performed using available commercial software on the basis of the provided segmentation. To assess the quality of the segmentation, the results have been compared to an EDX quantification performed on the same data. Copyright © 2013 Elsevier Ltd. All rights reserved.
Djorgovski, S. G.
1994-01-01
We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complex database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of the SKICAT system, and of some of the scientific results achieved to date. We also developed a user-friendly package for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications and has
Energy Technology Data Exchange (ETDEWEB)
Antignac, Jean-Philippe [Ecole Nationale Veterinaire de Nantes (ENVN), Laboratoire d' Etude des Residus et Contaminants dans les Aliments (LABERCA), Nantes (France); LABERCA-ENVN, Nantes (France); Marchand, Philippe; Gade, Christel; Matayron, Gilles; Bizec, Bruno Le; Andre, Francois [Ecole Nationale Veterinaire de Nantes (ENVN), Laboratoire d' Etude des Residus et Contaminants dans les Aliments (LABERCA), Nantes (France); Qannari, El Mostafa [Ecole Nationale d' Ingenieurs des Techniques des Industries Agricoles et Alimentaires (ENITIAA), Unite de Sensometrie et de Chimiometrie, La Geraudiere, Nantes (France)
2006-01-01
Polychlorinated dibenzo-p-dioxins (PCDD) and polychlorinated dibenzofurans (PCDF) are widely recognized by the scientific community as persistent organic pollutants due to their toxicity and adverse effects on wildlife and human health. The current regulation for monitoring dioxins in food is based on the measurement of 17 congener concentrations. The final result is reported as a toxic equivalent value that takes into account the relative toxicity of each congener. This procedure can obscure the qualitative information available from the abundances of each PCDD/PCDF congener: the characteristic contamination profile of the sample. Multivariate statistical techniques, such as principal component analysis (PCA) or linear discriminant analysis (LDA), represent an interesting way to investigate this qualitative information. Nevertheless, they have only infrequently been applied to the analysis of contamination data from food products and biological matrices. The objective of the present study was to analyze a large data set from dioxin analyses performed on various food products of animal origin. The results demonstrate the existence of differences in congener-specific patterns between the analyzed samples. Variability was first demonstrated in terms of the food type (fish, meat, milk, fatty products). Variability related to the specific animal species was then observed for meat and milk samples (bovine, ovine, porcine, caprine and poultry). Some practical applications of these results are discussed. The origin(s) of the observed differences, as well as their significance, now remain to be investigated, both in terms of environmental factors and transfer through living organisms. A better knowledge of the relation between a contamination profile and its specific source and/or food product should be of great interest to scientists working in the fields of contaminant analysis, toxicology and metabolism, as well as to regulatory bodies and
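The contrast drawn in the abstract above, between a single toxic equivalent value and the congener profile, can be illustrated with a minimal sketch. The function names, the toxic equivalency factors and the concentrations below are hypothetical placeholders chosen for easy arithmetic; they are not regulatory values or data from the study.

```python
def toxic_equivalent(concentrations, tefs):
    """Weighted sum of congener concentrations (concentration * TEF)."""
    return sum(concentrations[c] * tefs[c] for c in concentrations)


def contamination_profile(concentrations):
    """Relative abundance of each congener (fractions that sum to 1)."""
    total = sum(concentrations.values())
    return {c: v / total for c, v in concentrations.items()}


# Placeholder factors and concentrations (assumptions, not real values).
tefs = {"congener_1": 1.0, "congener_2": 0.25}
sample_a = {"congener_1": 1.0, "congener_2": 4.0}
sample_b = {"congener_1": 2.0, "congener_2": 0.0}

teq_a = toxic_equivalent(sample_a, tefs)
teq_b = toxic_equivalent(sample_b, tefs)
profile_a = contamination_profile(sample_a)
profile_b = contamination_profile(sample_b)
```

In this toy case both samples give the same toxic equivalent while their profiles differ completely, which is the kind of qualitative information a PCA or LDA on the profiles would use.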
Multivariate Statistical Analysis of Cigarette Design Feature Influence on ISO TNCO Yields.
Agnew-Heard, Kimberly A; Lancaster, Vicki A; Bravo, Roberto; Watson, Clifford; Walters, Matthew J; Holman, Matthew R
2016-06-20
The aim of this study is to explore how differences in cigarette physical design parameters influence tar, nicotine, and carbon monoxide (TNCO) yields in mainstream smoke (MSS) using the International Organization for Standardization (ISO) smoking regimen. Standardized smoking methods were used to evaluate 50 U.S. domestic brand cigarettes and a reference cigarette representing a range of TNCO yields in MSS collected from linear smoking machines using a nonintense smoking regimen. Multivariate statistical methods were used to form clusters of cigarettes based on their ISO TNCO yields and then to explore the relationship between the ISO generated TNCO yields and the nine cigarette physical design parameters between and within each cluster simultaneously. The ISO generated TNCO yields in MSS are 1.1-17.0 mg tar/cigarette, 0.1-2.2 mg nicotine/cigarette, and 1.6-17.3 mg CO/cigarette. Cluster analysis divided the 51 cigarettes into five discrete clusters based on their ISO TNCO yields. No one physical parameter dominated across all clusters. Predicting ISO machine generated TNCO yields based on these nine physical design parameters is complex due to the correlation among and between the nine physical design parameters and TNCO yields. From these analyses, it is estimated that approximately 20% of the variability in the ISO generated TNCO yields comes from other parameters (e.g., filter material, filter type, inclusion of expanded or reconstituted tobacco, and tobacco blend composition, along with differences in tobacco leaf origin and stalk positions and added ingredients). A future article will examine the influence of these physical design parameters on TNCO yields under a Canadian Intense (CI) smoking regimen. Together, these papers will provide a more robust picture of the design features that contribute to TNCO exposure across the range of real world smoking patterns.
McKinley, C. C.; Scudder, R.; Thomas, D. J.
2016-12-01
The neodymium isotopic composition (Nd IC) of oxide coatings has been applied as a tracer of water mass composition and used to address fundamental questions about past ocean conditions. The leached authigenic oxide coating from marine sediment is widely assumed to reflect the dissolved trace metal composition of the bottom water interacting with sediment at the seafloor. However, recent studies have shown that readily reducible sediment components, in addition to trace metal fluxes from the pore water, are incorporated into the bottom water, influencing the trace metal composition of leached oxide coatings. This challenges the prevailing application of the authigenic oxide Nd IC as a proxy of seawater composition. Therefore, it is important to identify the component end-members that create sediments of different lithology and to determine whether, and how, they might contribute to the Nd IC of oxide coatings. To investigate lithologic influence on the results of sequential leaching, we selected two sites with complete bulk sediment statistical characterization. Site U1370 in the South Pacific Gyre is predominantly composed of rhyolite ( 60%) and has a distinguishable ( 10%) Fe-Mn oxyhydroxide component (Dunlea et al., 2015). Site 1149 near the Izu-Bonin Arc is predominantly composed of dispersed ash ( 20-50%) and eolian dust from Asia ( 50-80%) (Scudder et al., 2014). We perform a two-step leaching procedure: a first leach with 14 mL of 0.02 M hydroxylamine hydrochloride (HH) in 20% acetic acid buffered to pH 4 for one hour, targeting metals bound to Fe- and Mn-oxide fractions, and a second HH leach for 12 hours, designed to remove any remaining oxides from the residual component. We analyze all three resulting fractions for a large suite of major, trace and rare earth elements; a subset of the samples is also analyzed for Nd IC. We use multivariate statistical analyses of the resulting geochemical data to identify how each component of the sediment partitions across the sequential
A Decision Tree Approach to the Interpretation of Multivariate Statistical Techniques.
Fok, Lillian Y.; And Others
1995-01-01
Discusses the nature, power, and limitations of four multivariate techniques: factor analysis, multiple analysis of variance, multiple regression, and multiple discriminant analysis. Shows how decision trees assist in interpreting results. (SK)
National Research Council Canada - National Science Library
Malik, Riffat Naseem; Hashmi, Muhammad Zaffar
2017-01-01
...; thus, the water quality is closely related to public health. Multivariate techniques were applied to check spatial and seasonal trends, and metals contamination sources of the Himalayan foothills streams, Pakistan...
ABDULLAH, AHMED
2007-01-01
This project involved two locations (Breda and Tel Hadya) over two seasons (1993 and 1994). Yield was found to have been affected by many factors including environment, genotype and morphological characters. A genotype-environment interaction (GEI) was also discovered. To investigate the influence of morphological characters on yield parameters, multivariate statistical techniques (canonical analysis, factor analysis and multiple regression analysis (linear and exponentia...
Kemperman, Ramses F. J.; Horvatovich, Peter L.; Hoekman, Berend; Reijmers, Theo H.; Muskiet, Frits A. J.; Bischoff, Rainer
2007-01-01
We describe a platform for the comparative profiling of urine using reversed-phase liquid chromatography-mass spectrometry (LC-MS) and multivariate statistical data analysis. Urinary compounds were separated by gradient elution and subsequently detected by electrospray Ion-Trap MS. The lower limit
Directory of Open Access Journals (Sweden)
M.A. Delavar
2016-02-01
Introduction: The accumulation of heavy metals (HMs) in the soil is of increasing concern due to food safety issues, potential health risks, and the detrimental effects on soil ecosystems. HMs may be considered the most important soil pollutants, because they are not biodegradable and their physical movement through the soil profile is relatively limited. Therefore, root uptake may provide a major pathway for these pollutants to transfer from the surface soil to natural and cultivated plants, which may eventually carry them into human bodies. The general behavior of HMs in the environment, especially their bioavailability in the soil, is influenced by their origin. Hence, source apportionment of HMs may provide essential information for better management of polluted soils to restrict the entrance of HMs into the human food chain. This paper explores the applicability of multivariate statistical techniques in the identification of probable sources that can control the concentration and distribution of selected HMs in the soils surrounding the Zanjan Zinc Specialized Industrial Town (briefly, Zinc Town). Materials and Methods: The area under investigation covers approximately 4000 ha. It is located around the Zinc Town, Zanjan province. A regular grid sampling pattern with an interval of 500 meters was applied to identify the sample locations, and 184 topsoil samples (0-10 cm) were collected. The soil samples were air-dried, sieved through a 2 mm polyethylene sieve, and then digested using HNO3. The total concentrations of zinc (Zn), lead (Pb), cadmium (Cd), nickel (Ni) and copper (Cu) in the soil solutions were determined via Atomic Absorption Spectroscopy (AAS). Data were statistically analyzed using the SPSS software version 17.0 for Windows. Correlation Matrix (CM), Principal Component Analysis (PCA) and Factor Analysis (FA) techniques were performed in order to identify the probable sources of HMs in the studied soils. Results and
Banoeng-Yakubo, B.; Yidana, S.M.; Nti, E.
2009-01-01
Q- and R-mode multivariate statistical analyses were applied to groundwater chemical data from boreholes and wells in the northern section of the Volta region, Ghana. The objective was to determine the processes that affect the hydrochemistry and the variation of these processes in space among the three main geological terrains that underlie the area: the Buem formation, the Voltaian System and the Togo series. The analyses revealed three zones in the groundwater flow system: recharge, intermediate and discharge regions. All three zones are clearly different with respect to all the major chemical parameters, with concentrations increasing from the perceived recharge areas through the intermediate regions to the discharge areas. R-mode HCA and factor analysis (using varimax rotation and the Kaiser criterion) were then applied to determine the significant sources of variation in the hydrochemistry. This study finds that groundwater hydrochemistry in the area is controlled by the weathering of silicate and carbonate minerals, as well as the chemistry of infiltrating precipitation. The δD and δ18O data from the area fall along the Global Meteoric Water Line (GMWL). A regression equation derived for the relationship between δD and δ18O closely resembles the equation which describes the GMWL. On the basis of this, groundwater in the study area is probably meteoric and fresh. The apparently low salinities and sodicities of the groundwater seem to support this interpretation. The suitability of groundwater for domestic and irrigation purposes is related to its source, which determines its constitution. A plot of the sodium adsorption ratio (SAR) and salinity (EC) data on a semilog axis suggests that groundwater in the area is of good quality for irrigation. Sixty percent (60%), 20% and 20% of the 67 data points used in this study fall within the medium salinity - low sodicity (C2-S1), low salinity - low sodicity (C1-S1) and high salinity - low
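The sodium adsorption ratio mentioned in the abstract above is conventionally computed from Na, Ca and Mg concentrations expressed in milliequivalents per litre. A minimal sketch follows; the function name and the sample values are assumptions for illustration and do not come from the study.

```python
import math


def sodium_adsorption_ratio(na_meq, ca_meq, mg_meq):
    """SAR = Na / sqrt((Ca + Mg) / 2), with all inputs in meq/L."""
    return na_meq / math.sqrt((ca_meq + mg_meq) / 2.0)


# Hypothetical water sample (meq/L), chosen so the arithmetic is easy to check.
sar = sodium_adsorption_ratio(na_meq=4.0, ca_meq=6.0, mg_meq=2.0)
```

Here (6 + 2) / 2 = 4, its square root is 2, and the SAR is 4 / 2 = 2. Pairing such a value with the electrical conductivity is what places a sample in classes such as C2-S1.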
Ahmed, Fahad; Fakhruddin, A. N. M.; Imam, MD. Toufick; Khan, Nasima; Abdullah, Abu Tareq Mohammad; Khan, Tanzir Ahmed; Rahman, Md. Mahfuzur; Uddin, Mohammad Nashir
2017-11-01
In this study, multivariate statistical techniques combined with GIS are used to assess the roadside surface water quality of the Savar region. Nineteen water samples were collected in the dry season and 15 water quality parameters, including TSS, TDS, pH, DO, BOD, Cl-, F-, NO3 2-, NO2 -, SO4 2-, Ca, Mg, K, Zn and Pb, were measured. The univariate summary of the water quality parameters is: TSS 25.154 ± 8.674 mg/l, TDS 840.400 ± 311.081 mg/l, pH 7.574 ± 0.256 pH unit, DO 4.544 ± 0.933 mg/l, BOD 0.758 ± 0.179 mg/l, Cl- 51.494 ± 28.095 mg/l, F- 0.771 ± 0.153 mg/l, NO3 2- 2.211 ± 0.878 mg/l, NO2 - 4.692 ± 5.971 mg/l, SO4 2- 69.545 ± 53.873 mg/l, Ca 48.458 ± 22.690 mg/l, Mg 19.676 ± 7.361 mg/l, K 12.874 ± 11.382 mg/l, Zn 0.027 ± 0.029 mg/l, Pb 0.096 ± 0.154 mg/l. The water quality data were subjected to R-mode PCA, which resulted in five major components. PC1 explains 28% of the total variance and indicates the settling of roadside and brick field dust (TDS, TSS) in the nearby water body. PC2 explains 22.123% of the total variance and indicates the agricultural influence (K, Ca, and NO2 -). PC3 describes the contribution of nonpoint pollution from agricultural and soil erosion processes (SO4 2-, Cl-, and K). PC4 is heavily positively loaded by vehicle emissions and diffusion from battery stores (Zn, Pb). PC5 shows strong positive loading of BOD and strong negative loading of pH. Cluster analysis yields three major clusters for both water parameters and sampling sites. The site-based clusters showed a grouping pattern similar to that of the R-mode factor score map. The present work reveals a new scope for monitoring roadside water quality in future research in Bangladesh.
Cormanich, Rodrigo A; Goodarzi, Mohammad; Freitas, Matheus P
2009-02-01
Inhibition of the tyrosine kinase enzyme WEE1 is an important step for the treatment of cancer. The bioactivities of a series of WEE1 inhibitors have been previously modeled through comparative molecular field analyses (CoMFA and CoMSIA), but a two-dimensional image-based quantitative structure-activity relationship approach has been shown to be highly predictive for other compound classes. This method, called multivariate image analysis applied to quantitative structure-activity relationship, was applied here to derive quantitative structure-activity relationship models. Whilst the well-known bilinear and multilinear partial least squares regressions (PLS and N-PLS, respectively) correlated multivariate image analysis descriptors with the corresponding dependent variables only reasonably well, the use of wavelet and principal component ranking as variable selection methods, together with least-squares support vector machine, significantly improved the prediction statistics. These recently implemented mathematical tools, particularly novel in quantitative structure-activity relationship studies, represent an important advance for the development of more predictive quantitative structure-activity relationship models and, consequently, new drugs.
EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance
DEFF Research Database (Denmark)
Larsen, Thomas Schou; Krogh, Anders Stærmose
2003-01-01
annotated as genes. Results: In this paper, we present a new automated gene-finding method, EasyGene, which estimates the statistical significance of a predicted gene. The gene finder is based on a hidden Markov model (HMM) that is automatically estimated for a new genome. Using extensions of similarities...... is the expected number of ORFs in one megabase of random sequence at the same significance level or better, where the random sequence has the same statistics as the genome in the sense of a third order Markov chain. Conclusions: The result is a flexible gene finder whose overall performance matches or exceeds...
Digital Repository Service at National Institute of Oceanography (India)
Jayalakshmy, K.V.; Rao, K.K.
Harbour, England: a reappraisal using multivariate techniques. J. Paleontol., 43(3): 660-675. Imbrie, J. and F.B. Phleger. 1963. Analisis por vectores de los foraminiferos bentonicos del area de San Diego, California. Soc. Geol. Mex., Bol., 26...
Afkhami, Abbas; Khajavi, Farzad; Khanmohammadi, Hamid
2009-08-11
The oxidation of the recently synthesized Schiff base 3,6-bis((2-aminoethyl-5-Br-salicyliden)thio)pyridazine (PABST) with hydrogen peroxide was investigated using spectrophotometric studies. The reaction order and observed rate constant of the oxidation reaction were obtained in a mixture of N,N-dimethylformamide (DMF):water (30:70, v/v) at pH 10 using the multivariate curve resolution-alternating least squares (MCR-ALS) method and rank annihilation factor analysis (RAFA). The parameters affecting the oxidation rate constant, such as the percentage of DMF, the presence of transition metals like Cu(2+), Zn(2+), Mn(2+) and Hg(2+), and the presence of surfactants, were investigated. The keto-enol equilibrium in DMF:water (30:70, v/v) solution at pH 7.6 was also investigated in the presence of surfactants. At concentrations above the critical micelle concentration (cmc) of the cationic surfactant cetyltrimethylammonium bromide (CTAB), the keto form was the predominant species, while at concentrations above the cmc of the anionic surfactant sodium dodecyl sulfate (SDS), the enol form was the predominant species. The kinetic reaction order and the rate constant of tautomerization in micellar medium were obtained using MCR-ALS and RAFA. The results obtained by the two methods were in good agreement with each other. The effect of different volume percentages of DMF on the rate constant of tautomerization was also investigated. The neutral surfactant (Triton X-100) had no effect on the tautomerization equilibrium.
Zhang, Tingting; Pham, Minh; Sun, Jianhui; Yan, Guofen; Li, Huazhang; Sun, Yinge; Gonzalez, Marlen Z; Coan, James A
2017-12-26
The focus of this paper is on evaluating brain responses to different stimuli and identifying brain regions with different responses using multi-subject, stimulus-evoked functional magnetic resonance imaging (fMRI) data. To jointly model many brain voxels' responses to designed stimuli, we present a new low-rank multivariate general linear model (LRMGLM) for stimulus-evoked fMRI data. The new model is not only flexible enough to characterize variation in hemodynamic response functions (HRFs) across different regions and stimulus types, but also enables information "borrowing" across voxels and uses far fewer parameters than typical nonparametric models for HRFs. To estimate the proposed LRMGLM, we introduce a new penalized optimization function, which leads to temporally and spatially smooth HRF estimates. We develop an efficient optimization algorithm to minimize the optimization function and identify the voxels with different responses to stimuli. We show that the proposed method can outperform several existing voxel-wise methods by achieving both high sensitivity and specificity. We apply the proposed method to the fMRI data collected in an emotion study, and identify the anterior dACC as having different responses to designed threat and control stimuli. Copyright © 2017. Published by Elsevier Inc.
EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance
Directory of Open Access Journals (Sweden)
Larsen Thomas
2003-06-01
Background: Contrary to other areas of sequence analysis, a measure of statistical significance of a putative gene has not been devised to help in discriminating real genes from the masses of random Open Reading Frames (ORFs) in prokaryotic genomes. Therefore, many genomes have too many short ORFs annotated as genes. Results: In this paper, we present a new automated gene-finding method, EasyGene, which estimates the statistical significance of a predicted gene. The gene finder is based on a hidden Markov model (HMM) that is automatically estimated for a new genome. Using extensions of similarities in Swiss-Prot, a high quality training set of genes is automatically extracted from the genome and used to estimate the HMM. Putative genes are then scored with the HMM, and based on score and length of an ORF, the statistical significance is calculated. The measure of statistical significance for an ORF is the expected number of ORFs in one megabase of random sequence at the same significance level or better, where the random sequence has the same statistics as the genome in the sense of a third order Markov chain. Conclusions: The result is a flexible gene finder whose overall performance matches or exceeds other methods. The entire pipeline of computer processing from the raw input of a genome or set of contigs to a list of putative genes with significance is automated, making it easy to apply EasyGene to newly sequenced organisms. EasyGene with pre-trained models can be accessed at http://www.cbs.dtu.dk/services/EasyGene.
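The phrase "random sequence with the same statistics as the genome in the sense of a third order Markov chain" can be made concrete with a short sketch. This is not EasyGene's implementation; the function names `fit_markov3` and `sample_sequence` and the toy genome are assumptions made purely for illustration.

```python
import random
from collections import Counter, defaultdict


def fit_markov3(seq):
    """Count how often each base follows every 3-base context in seq."""
    model = defaultdict(Counter)
    for i in range(len(seq) - 3):
        model[seq[i:i + 3]][seq[i + 3]] += 1
    return model


def sample_sequence(model, start, length, rng):
    """Extend a 3-base start context by sampling from the fitted counts."""
    out = list(start)
    while len(out) < length:
        counts = model["".join(out[-3:])]
        if not counts:  # context never observed in the training sequence
            break
        bases = sorted(counts)
        weights = [counts[b] for b in bases]
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out)


# Toy "genome" in which every 3-base context has exactly one successor.
toy_genome = "ACGT" * 5
model = fit_markov3(toy_genome)
generated = sample_sequence(model, "ACG", 12, random.Random(0))
```

On a real genome each context would have up to four possible successors, so repeated sampling yields different random sequences that share the genome's fourth-order word frequencies; counting ORFs in such sequences gives a background rate of the kind the abstract describes.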
Improving Statistical Machine Translation Through N-best List Re-ranking and Optimization
2014-03-27
free grammar DoD Department of Defense EM Expectation Maximization EU European Union FTD Foreign Translation Division IBM International Business Machine...RBMT rule-based machine translation SCFG synchronous context-free grammar SGE Sun Grid Engine SL source language SMT statistical machine translation...Chinese, English , French, Russian and Spanish as its official languages [14]. This means that a member may speak in any of these languages with the
Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan
2017-12-01
Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis
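Inverse distance weighted interpolation, named in the abstract above as one of the most widely used spatial interpolators, can be sketched in a few lines. The function name `idw`, the default power of 2 and the sample coordinates and concentrations are assumptions for illustration, not values taken from the reviewed studies.

```python
import math


def idw(samples, x, y, power=2.0):
    """Estimate a value at (x, y) from (xi, yi, value) samples.

    Each sample is weighted by 1 / distance**power; a query that
    coincides with a sample location returns that sample's value.
    """
    numerator = 0.0
    denominator = 0.0
    for xi, yi, value in samples:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two hypothetical soil samples: (x in km, y in km, metal concentration).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
midpoint_estimate = idw(samples, 1.0, 0.0)
```

At the midpoint both samples are equally distant, so the estimate is their plain average. Ordinary kriging, the other interpolator named in the abstract, additionally models spatial autocorrelation and is not shown here.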
Directory of Open Access Journals (Sweden)
Đula Borozan
2014-03-01
The paper deals with the application of multivariate analysis of variance and logistic regression in measuring, explaining and evaluating (i) gender differences in expressing migration aspirations, and (ii) a gender effect on migration motivation of university students in Croatia. The results supported the thesis that migration is a complex gendering process that assumes subjective assessment of the whole set of interrelated motives. According to logistic regression, gender is a significant predictor of migration aspirations among the selected demographic and socio-economic variables. A multivariate analysis of variance showed that gender and migration aspirations in interaction matter when it comes to migration motives, particularly related to the perceived importance of social networks. Females, and especially those who aspire to migrate, assessed these motives as more important than males.
Kutsumi, M.; Terada, K.; Tajima, F.; Kitano, T.
2012-12-01
In order to find physical and chemical environment factors which relate to the fish fauna distribution, we investigated the temporal and spatial change of water qualities and fish distributions in Kaname river, Japan. We investigated the fish distribution, physical water parameters (temperature, salinity, dissolved oxygen, Chl-a and turbidity) and chemical water parameters (nitrate, nitrite, ammonia, orthophosphoric and suspended solids). We conducted the multivariate analyses using these observational data and discussed the relationship between water environment parameters and fish habitat distribution.
Dutta, Dibyendu; Das, Prabir Kumar; Bhunia, Uttam Kumar; Singh, Upasana; Singh, Shalini; Sharma, Jaswant Raj; Dadhwal, Vinay Kumar
2015-04-01
In the present study, field-based hyperspectral data were used to estimate tea (Camellia sinensis L.) polyphenol content at the Deha Tea garden in Assam state, India. Leaf reflectance spectra were first filtered for noise and then transformed into normalized and first derivative reflectance for further analysis. Stepwise discriminant analysis was carried out to select sensitive bands for a range of polyphenol concentrations by minimizing the effects of other factors such as the age of the bushes and management practices. The wavelengths at 358, 369, 484, 845, 916, 1387, 1420, 1435, 1621 and 2294 nm were identified as sensitive to tea polyphenol, among which 2294 nm was found to be the most recurring band. The noise-filtered selected bands, their transformed derivatives and principal components were regressed against tea polyphenol using univariate and multivariate analysis. In univariate analysis the correlation was very poor, with an RMSE of more than 3.0. A significant improvement in R2 values was observed when multivariate analyses such as stepwise multiple linear regression (SMLR) and partial least squares regression (PLSR) were carried out. The PLSR of first derivative reflectance was the most accurate (R2 = 0.81 and RMSE = 1.39 mg g-1) among all the univariate and multivariate analyses for predicting the polyphenol content of fresh tea leaves.
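The two accuracy measures quoted in the abstract above, RMSE and R2, can be computed as in the following sketch. The study may have used a different R2 convention (for example the squared correlation coefficient); the version here is the common 1 - SS_res / SS_tot form, and the observed and predicted values are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical polyphenol values (mg/g): three perfect predictions, one off by 2.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

With these values the residual sum of squares is 4 and the total sum of squares is 5, giving an RMSE of 1.0 and an R2 of 0.2.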
Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye
2016-01-13
A framework for establishing a standard reference scale (texture) is proposed, using multivariate statistical analysis of instrumental measurements and sensory evaluation. Multivariate statistical analysis is conducted to rapidly select typical reference samples with characteristics of universality, representativeness, stability, substitutability, and traceability. The reasonableness of the framework is verified by establishing a standard reference scale for a texture attribute (hardness) with well-known Chinese foods. More than 100 food products in 16 categories were tested using instrumental measurement (TPA test), and the results were analyzed with cluster analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were selected to construct the hardness standard reference scale. The results indicate that the regression between the estimated sensory value and the instrumentally measured value is significant (R² = 0.9765), which fits well with Stevens's theory. The research provides a reliable theoretical basis and practical guide for establishing quantitative standard reference scales for food texture characteristics.
Wolf, S. F.; Lipschutz, M. E.
1992-07-01
Differences have been observed between meteorite populations with vastly different terrestrial ages, i.e. Antarctic and non-Antarctic meteorite populations (Koeberl and Cassidy, 1991 and references therein). Comparisons of labile trace element contents (Wolf and Lipschutz, 1992) and induced TL parameters (Benoit and Sears, 1992) in samples from Victoria Land and Queen Maud Land, populations which also differ in mean terrestrial age (Nishiizumi et al., 1989), show significant differences consistent with different average thermal histories. These differences are consistent with the proposition that the flux of meteoritic material to Earth varied temporally. Variations in the flux of meteoritic material over time scales of 10^5-10^6 y require the existence of undispersed streams of meteoroids of asteroidal origin, which were initially disputed by Wetherill (1986) but have since been observed (Olsson-Steele, 1988; Oberst, 1989; Halliday et al., 1990). Orbital evidence for meteoroid and asteroid streams has been independently obtained by others, particularly Halliday et al. (1990) and Drummond (1991). A group of H chondrites of various petrographic types and diverse CRE ages that yielded 16 falls from 1855 until 1895 in the month of May has been proposed to be two co-orbital meteoroid streams with a common source (R. T. Dodd, personal communication). Compositional evidence of a preterrestrial association of the proposed stream members, if it exists, might be observed in the most sensitive indicators of genetic thermal history, the labile trace elements. We report RNAA data for the concentrations of 14 trace elements, mostly labile ones (Ag, Au, Bi, Cd, Cs, Co, Ga, In, Rb, Sb, Se, Te, Tl, and Zn), in H4-6 ordinary chondrites. Variance of elemental concentrations within a subpopulation, the members of a proposed co-orbital meteorite stream for example, could be expected to be smaller than the variance for the entire population. We utilize multivariate linear regression and
Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F
2017-04-01
Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.
Fault detection of a spur gear using vibration signal with multivariable statistical parameters
Directory of Open Access Journals (Sweden)
Songpon Klinchaeam
2014-10-01
Full Text Available This paper presents a condition monitoring technique for spur gear fault detection using time-domain vibration signal analysis. Vibration signals were acquired from gearboxes with various simulated faults on the spur gear teeth. In this study, vibration signals were used to monitor four conditions of a spur gear: normal, scuffing defect, crack defect and broken tooth. Statistical parameters of the vibration signal were used to compare and evaluate the fault conditions. This technique can be applied to set alarm limits on the signal condition based on statistical parameters such as variance, kurtosis, rms and crest factor, which can serve as decision boundaries for the signal condition. The results show that vibration signal analysis with a single statistical parameter cannot clearly predict spur gear faults, whereas using at least two statistical parameters clearly separates every fault case. The decision boundaries of the statistical parameters were set with 99.7% certainty (3σ) from 300 reference datasets; they detected the test conditions with 99.7% (3σ) accuracy, with an error of less than 0.3%, using 50 test datasets.
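The four statistical parameters named in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the abstract does not give formulas, so the textbook definitions are assumed (population variance, non-excess kurtosis, crest factor as peak over rms), the 99.7% limits are assumed to be mean ± 3 standard deviations, and all function names are hypothetical.

```python
import math


def time_domain_features(signal):
    """Return variance, kurtosis, rms and crest factor of one vibration record."""
    n = len(signal)
    mean = sum(signal) / n
    # Population variance (divide by n); an assumption, the paper may use n - 1.
    variance = sum((x - mean) ** 2 for x in signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    # Non-excess kurtosis: fourth central moment over squared variance.
    kurtosis = (sum((x - mean) ** 4 for x in signal) / n) / variance ** 2
    # Crest factor: peak absolute amplitude relative to rms.
    crest_factor = max(abs(x) for x in signal) / rms
    return {
        "variance": variance,
        "kurtosis": kurtosis,
        "rms": rms,
        "crest_factor": crest_factor,
    }


def three_sigma_limits(reference_values):
    """Alarm limits (mean - 3 sd, mean + 3 sd) from reference feature values."""
    n = len(reference_values)
    mu = sum(reference_values) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in reference_values) / (n - 1))
    return mu - 3 * sd, mu + 3 * sd


def within_limits(features, limits):
    """True if every monitored feature lies inside its alarm limits."""
    return all(lo <= features[name] <= hi for name, (lo, hi) in limits.items())
```

In use, `three_sigma_limits` would be applied per feature to the values computed from the reference records of the normal gear, and `within_limits` would then be checked on at least two features, in line with the abstract's finding that one parameter alone does not separate the fault cases.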
A comparison of demographic behaviour between the CR regions using multivariate statistical methods
Directory of Open Access Journals (Sweden)
Marie Prášilová
2008-01-01
Full Text Available The inhabitants of the individual CR regions show varying demographic behaviour, which is reflected in the values of the demographic measures. The paper offers a comparison of the development of selected measures of size and movement of the population in the regions of the CR in 1993 and 2006. Attention is paid to the changes in measures of economic and biological structure, life expectancy and some of the measures of human reproduction, patterns of growth and migration. Multivariate analysis methods have been employed for the solution. Variables were selected in each year using factor analysis, and the similarity of the regions was described by hierarchical agglomerative clustering. During the thirteen years, changes occurred in the demographic behaviour of the regions. Currently the Capital Prague and the Středočeský Region differ significantly. All the remaining regions have stabilized with respect to the demographic measures and show homogeneity.
Jiang, Peng; Ma, Zhenmin; Wen, Ming
2017-03-01
This study was carried out to assess the overall water quality and identify the major chlorinated hydrocarbon variables affecting groundwater quality. The source apportionment of groundwater pollution is important for the efficient management of groundwater resources. Based on 13 variables surveyed at 43 monitoring sites, different multivariate methods were applied comprehensively to determine the source apportionment of groundwater chlorinated hydrocarbon pollutants in the study area. Factor analysis and cluster analysis were applied to identify pollution sources, and four potential pollution sources that explained 92.810% of the total variance were identified. Absolute principal component score-multiple linear regression was adopted to calculate the contribution of each pollution source. Regression results revealed that most variables were primarily influenced by the chemical industry, electrical manufacturing, chemical fiber and agricultural sources. The contributions of these pollution sources to the entire study area were 43%, 32%, 14% and 11%, respectively.
An Application of Multivariate Statistical Analysis for Query-Driven Visualization
Energy Technology Data Exchange (ETDEWEB)
Gosink, Luke J. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Garth, Christoph [Univ. of California, Davis, CA (United States); Anderson, John C. [Univ. of California, Davis, CA (United States); Bethel, E. Wes [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Joy, Kenneth I. [Univ. of California, Davis, CA (United States)
2011-03-01
Driven by the ability to generate ever-larger, increasingly complex data, there is an urgent need in the scientific community for scalable analysis methods that can rapidly identify salient trends in scientific data. Query-Driven Visualization (QDV) strategies are among the small subset of techniques that can address both large and highly complex datasets. This paper extends the utility of QDV strategies with a statistics-based framework that integrates non-parametric distribution estimation techniques with a new segmentation strategy to visually identify statistically significant trends and features within the solution space of a query. In this framework, query distribution estimates help users to interactively explore their query's solution and visually identify the regions where the combined behavior of constrained variables is most important, statistically, to their inquiry. Our new segmentation strategy extends the distribution estimation analysis by visually conveying the individual importance of each variable to these regions of high statistical significance. We demonstrate the analysis benefits these two strategies provide and show how they may be used to facilitate the refinement of constraints over variables expressed in a user's query. We apply our method to datasets from two different scientific domains to demonstrate its broad applicability.
Xu, Min; Zhang, Lei; Yue, Hong-Shui; Pang, Hong-Wei; Ye, Zheng-Liang; Ding, Li
2017-10-01
This study aimed to establish an on-line monitoring method for the extraction process of Schisandrae Chinensis Fructus, a formula medicinal material of Yiqi Fumai lyophilized injection, by combining near-infrared spectroscopy with multivariate data analysis. A multivariate statistical process control (MSPC) model was established based on 5 normal production batches, and 2 test batches were monitored by PC scores, DModX and Hotelling T2 control charts. The results showed that the MSPC model had a good monitoring ability for the extraction process. Applying the MSPC model to the actual production process could effectively achieve on-line monitoring of the extraction process of Schisandrae Chinensis Fructus and reflect changes in material properties during production in real time. The established process monitoring method could provide a reference for the application of process analysis technology in the process quality control of traditional Chinese medicine injections. Copyright © by the Chinese Pharmaceutical Association.
Li, Jinling; He, Ming; Han, Wei; Gu, Yifan
2009-05-30
An investigation on heavy metal sources, i.e., Cu, Zn, Ni, Pb, Cr, and Cd in the coastal soils of Shanghai, China, was conducted using multivariate statistical methods (principal component analysis, clustering analysis, and correlation analysis). All the results of the multivariate analysis showed that: (i) Cu, Ni, Pb, and Cd had anthropogenic sources (e.g., overuse of chemical fertilizers and pesticides, industrial and municipal discharges, animal wastes, sewage irrigation, etc.); (ii) Zn and Cr were associated with parent materials and therefore had natural sources (e.g., the weathering process of parent materials and subsequent pedo-genesis due to the alluvial deposits). The effect of heavy metals in the soils was greatly affected by soil formation, atmospheric deposition, and human activities. These findings provided essential information on the possible sources of heavy metals, which would contribute to the monitoring and assessment process of agricultural soils in worldwide regions.
Jha, Dilip Kumar; Vinithkumar, N V; Sahu, Biraja Kumar; Das, Apurba Kumar; Dheenan, P S; Venkateshwaran, P; Begum, Mehmuna; Ganesh, T; Prashanthi Devi, M; Kirubagaran, R
2014-08-15
Aerial Bay is one of the harbor towns of the Andaman and Nicobar Islands, a union territory of India. Nevertheless, it is a little-studied marine environment, particularly with respect to physico-chemical assessment. Therefore, to evaluate the annual spatiotemporal variations of physico-chemical parameters, seawater samples collected from 20 sampling stations over three seasons were analyzed. Multivariate statistics were applied to the data in an attempt to understand the causes of variation in physico-chemical parameters. Cluster analysis distinguished mangrove and open sea stations from other areas on the basis of distinctive physico-chemical characteristics. Factor analysis explained 79.5% of the total variance in physico-chemical parameters. Parameters with strong loadings included transparency, TSS, DO, BOD, salinity, nitrate, nitrite, inorganic phosphate, total phosphorus and silicate. In addition, box-whisker plots and Geographical Information System based land use data further supported the multivariate results. Copyright © 2014 Elsevier Ltd. All rights reserved.
Brunner, Manuela Irene; Vannier, Olivier; Favre, Anne-Catherine; Viviroli, Daniel; Meylan, Paul; Sikorska, Anna; Seibert, Jan
2016-01-01
Accurate estimations of flood peaks, volumes and hydrographs are needed to design safe and cost-effective hydraulic structures. In this study, we propose a statistical approach for the estimation of the design variables peak and volume by constructing a synthetic design hydrograph. Our approach is based on fitting probability density functions to observed flood hydrographs and takes the dependence between the two variables peak and volume into account. The method consists of the following six...
A.-M. Guerry's Moral Statistics of France: Challenges for Multivariable Spatial Analysis
Friendly, Michael
2008-01-01
André-Michel Guerry's (1833) Essai sur la Statistique Morale de la France was one of the foundation studies of modern social science. Guerry assembled data on crimes, suicides, literacy and other "moral statistics," and used tables and maps to analyze a variety of social issues in perhaps the first comprehensive study relating such variables. Indeed, the Essai may be considered the book that launched modern empirical social science, for the questions raised and the methods Guerry develo...
Malik, Riffat Naseem; Hashmi, Muhammad Zaffar
2017-10-01
The Himalayan foothills streams of Pakistan play an important role in domestic water supply and farmland irrigation; thus, their water quality is closely related to public health. Multivariate techniques were applied to examine spatial and seasonal trends and the sources of metal contamination in the Himalayan foothills streams of Pakistan. Grab surface water samples were collected from different sites (5-15 cm water depth) in pre-washed polyethylene containers. A Fast Sequential Atomic Absorption Spectrophotometer (Varian FSAA-240) was used to measure metal concentrations. Concentrations of Ni, Cu, and Mn were higher in the pre-monsoon season than in the post-monsoon season. Cluster analysis identified impaired, moderately impaired and least impaired clusters based on water parameters. Discriminant function analysis indicated that spatial variability in water quality was due to temperature, electrical conductivity, nitrates, iron and lead, whereas seasonal variations were correlated with 16 physicochemical parameters. Factor analysis identified municipal and poultry waste, automobile activities, surface runoff, and soil weathering as major sources of contamination. Levels of Mn, Cr, Fe, Pb, Cd, Zn and alkalinity were above the WHO and USEPA standards for surface water. The results of the present study will help the relevant authorities manage the Himalayan foothills streams.
Zhou, Ran; Peng, Shi-Tao; Qin, Xue-Bo; Shi, Hong-Hua; Ding, De-Wen
2013-03-01
A detailed field survey of hydrological, chemical and biological resources was conducted in the Bohai Bay in spring and summer 2007. The distributions of phytoplankton and their relations to environmental factors were investigated with multivariate analysis techniques. In total, 17 and 23 taxa were identified in spring and summer, respectively. The abundance of phytoplankton in spring was 115 × 10^4 cells m^-3, which was significantly higher than that in summer (3.1 × 10^4 cells m^-3). Characteristics of phytoplankton assemblages in the two seasons were identified using principal component analysis (PCA), while redundancy analysis (RDA) was used to examine the environmental variables that may explain the patterns of variation of the phytoplankton community. Based on PCA results, in the spring, the phytoplankton was mainly distributed in the central and northern water zones, where the nitrate nitrogen concentration was higher. However, in summer, phytoplankton was found distributed in all zones of Bohai Bay, while the dominant species was mainly distributed in the estuary. RDA indicated that the key environmental factors that influenced phytoplankton assemblages in the spring were nitrate nitrogen (NO3-N), nitrite nitrogen (NO2-N) and soluble reactive phosphorus (SRP), while ammonium nitrogen (NH4-N) and water temperature (WT) played key roles in summer.
Multivariate-Statistical Assessment of Heavy Metals for Agricultural Soils in Northern China
Directory of Open Access Journals (Sweden)
Pingguo Yang
2014-01-01
Full Text Available The study evaluated eight heavy metals content and soil pollution from agricultural soils in northern China. Multivariate and geostatistical analysis approaches were used to determine the anthropogenic and natural contribution of soil heavy metal concentrations. Single pollution index and integrated pollution index could be used to evaluate soil heavy metal risk. The results show that the first factor explains 27.3% of the eight soil heavy metals with strong positive loadings on Cu, Zn, and Cd, which indicates that Cu, Zn, and Cd are associated with and controlled by anthropic activities. The average value of heavy metal is lower than the second grade standard values of soil environmental quality standards in China. Single pollution index is lower than 1, and the Nemerow integrated pollution index is 0.305, which means that study area has not been polluted. The semivariograms of soil heavy metal single pollution index fitted spherical and exponential models. The variable ratio of single pollution index showed moderately spatial dependence. Heavy metal contents showed relative safety in the study area.
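The two indices reported in this abstract can be illustrated with a short sketch. It assumes the commonly used definitions, which the abstract itself does not spell out: the single pollution index is the measured concentration divided by the standard value, and the Nemerow integrated index is the root mean square of the average and the maximum single index. The function names and the numbers in the usage note are hypothetical.

```python
import math


def single_pollution_index(concentration, standard):
    """Single pollution index: measured concentration over the standard value."""
    if standard <= 0:
        raise ValueError("standard value must be positive")
    return concentration / standard


def nemerow_index(single_indices):
    """Nemerow integrated index: sqrt((mean(P)^2 + max(P)^2) / 2)."""
    if not single_indices:
        raise ValueError("at least one single index is required")
    mean_p = sum(single_indices) / len(single_indices)
    max_p = max(single_indices)
    return math.sqrt((mean_p ** 2 + max_p ** 2) / 2)
```

For example, hypothetical single indices of 0.2 and 0.4 have a mean of 0.3 and a maximum of 0.4, giving an integrated index of about 0.35. Because the maximum enters the formula directly, the most elevated metal weighs more heavily than it would in a plain average.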
Khound, Nayan J.; Bhattacharyya, Krishna G.
2017-09-01
The aim of this study was to assess the quality of surface water sources in the Jia Bharali river basin and adjoining areas of the Himalayan foothills with respect to heavy elements (As, Cd, Cr, Cu, Fe, Mn, Ni, Pb and Zn) by hydrochemical and multivariate statistical techniques, such as cluster analysis (CA) and principal component analysis (PCA). This study presents the first systematic analysis of toxic elements in water samples collected from 35 different surface water sources in both the dry and wet seasons over 2 hydrological years (2009-2011). Varimax factors extracted by principal component analysis indicate anthropogenic (domestic and agricultural run-off) and geogenic influences on the trace elements. Hierarchical cluster analysis grouped the 35 surface water sources into three statistically significant clusters based on the similarity of water quality characteristics. This study illustrates the usefulness of multivariate statistical techniques for the analysis and interpretation of complex data sets, and in water quality assessment, identification of pollution sources/factors and understanding temporal/spatial variations in water quality for effective surface water quality management.
DEFF Research Database (Denmark)
Johansen, Søren
2008-01-01
The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating eigenvalues and eigenvectors. We give a number of different applications to regression and time series analysis, and show how the reduced rank regression estimator can be derived as a Gaussian maximum likelihood estimator. We briefly mention asymptotic results...
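As a hedged sketch of the eigenvalue calculation this abstract refers to, consider the simplest case with no additional regressors. The notation below (the product moment matrices S, the p × r matrix α and the q × r matrix β) follows the usual textbook presentation and is an assumption; it is not copied from the paper.

```latex
\begin{align*}
Y_t &= \alpha \beta' X_t + \varepsilon_t, \qquad
  \varepsilon_t \sim N_p(0, \Omega), \quad t = 1, \dots, T, \\
S_{yx} &= T^{-1} \sum_{t=1}^{T} Y_t X_t', \qquad
  S_{xx},\ S_{yy},\ S_{xy} = S_{yx}' \ \text{defined analogously}, \\
0 &= \left| \lambda S_{xx} - S_{xy} S_{yy}^{-1} S_{yx} \right|
  \quad \Rightarrow \quad
  1 \ge \hat\lambda_1 \ge \dots \ge \hat\lambda_q \ge 0, \\
\hat\beta &= (\hat v_1, \dots, \hat v_r), \qquad
  \hat\beta' S_{xx} \hat\beta = I_r, \\
\hat\alpha &= S_{yx} \hat\beta, \qquad
  \hat\Omega = S_{yy} - \hat\alpha \hat\alpha'.
\end{align*}
```

Here the eigenvalues are the squared sample canonical correlations between Y and X, and the eigenvectors belonging to the r largest ones span the estimated β. This is the sense in which the estimator is tied to canonical correlations.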
Mfumu Kihumba, Antoine; Vanclooster, Marnik
2013-04-01
Drinking water in Kinshasa, the capital of the Democratic Republic of Congo, is provided by extracting groundwater from the local aquifer, particularly in peripheral areas. The exploited groundwater body is mainly unconfined and located within a continuous detrital aquifer, primarily composed of sedimentary formations. However, the aquifer is subjected to an increasing threat of anthropogenic pollution pressure. Understanding the detailed origin of this pollution pressure is important for sustainable drinking water management in Kinshasa. The present study aims to explain the observed nitrate pollution problem, nitrate being considered a good tracer for other pollution threats. The analysis is made in terms of physical attributes that are readily available, using a statistical modelling approach. The nitrate data were taken from a historical groundwater quality assessment study and re-analysed. The physical attributes are related to the topography, land use, geology and hydrogeology of the region. Prior to the statistical modelling, intrinsic and specific vulnerability to nitrate pollution was assessed. This vulnerability assessment showed that the alluvium area in the northern part of the region is the most vulnerable area. This area consists of urban land use with poor sanitation. Re-analysis of the nitrate pollution data demonstrated that the spatial variability of nitrate concentrations in the groundwater body is high, and coherent with the fragmented land use of the region and the intrinsic and specific vulnerability maps. The statistical modelling used multiple regression and regression tree analysis. The results demonstrated the significant impact of land use variables on the Kinshasa groundwater nitrate pollution and the need for a detailed delineation of groundwater capture zones around the monitoring stations. Key words: Groundwater, Isotopic, Kinshasa, Modelling, Pollution, Physico-chemical.
Apsens, S. J.; Norcross, B.
2016-02-01
Eelpouts of the genus Lycodes are a group of demersal fish commonly found in the U.S. Beaufort Sea. They are relatively numerous, having composed a significant proportion of fish catches during trawl surveys, and they are consumed by marine mammals and birds. Eelpouts themselves may also potentially feed on, or compete for resources with, other fish species. Currently, however, their exact role in the Arctic food web is still poorly understood. Percent number (%N) and percent weight (%W) were used to describe diet, and multivariate techniques were used to look for patterns across environmental (depth and longitude) and biological (length) gradients. Fish were collected in August and September of 2012 and 2014 as part of the U.S.-Canada Transboundary cruises. Stomachs from four eelpout species were examined: Adolf's Eelpout Lycodes adolfi, Canadian Eelpout L. polaris, Archers Eelpout L. sagittarius, and Longear Eelpout L. seminudus. Polychaetes, benthic amphipods, brittle stars, and harpacticoid copepods composed a large part of the observed diet for all four Lycodes species, but proportions differed by species. Intraspecific similarity was low, suggesting these fish have diverse diets even among individuals of the same species. Fish total length was found to be correlated with diet composition for all fish species examined except L. seminudus. Longitude and depth were found to be correlated with diet for L. sagittarius. Identifying prey and factors influencing diet are initial steps towards characterizing the ecological role of this fish genus in the U.S. Arctic food web. Ecological information on abundant fish species is needed for the development of ecosystem-based management practices. Establishing a benchmark for this group is important for understanding their current and future role in the rapidly changing Arctic food web.
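The two diet descriptors used in this abstract, percent number (%N) and percent weight (%W), share the same arithmetic: each prey category's share of the total, times 100. The sketch below assumes that simple pooled definition; the abstract does not say whether values were pooled or averaged per stomach, and the function name and prey values are hypothetical.

```python
def percent_composition(values_by_prey):
    """Return each prey category's share of the total, in percent.

    Pass prey counts to obtain %N, or prey weights to obtain %W.
    """
    total = sum(values_by_prey.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {prey: 100.0 * value / total for prey, value in values_by_prey.items()}
```

For instance, hypothetical counts of 3 polychaetes and 1 amphipod in a stomach give %N values of 75 and 25; the same call with prey weights instead of counts gives %W.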
Dhat, Shalaka; Pund, Swati; Kokare, Chandrakant; Sharma, Pankaj; Shrivastava, Birendra
2017-01-01
Rapidly evolving technical and regulatory landscapes of the pharmaceutical product development necessitates risk management with application of multivariate analysis using Process Analytical Technology (PAT) and Quality by Design (QbD). Poorly soluble, high dose drug, Satranidazole was optimally nanoprecipitated (SAT-NP) employing principles of Formulation by Design (FbD). The potential risk factors influencing the critical quality attributes (CQA) of SAT-NP were identified using Ishikawa diagram. Plackett-Burman screening design was adopted to screen the eight critical formulation and process parameters influencing the mean particle size, zeta potential and dissolution efficiency at 30min in pH7.4 dissolution medium. Pareto charts (individual and cumulative) revealed three most critical factors influencing CQA of SAT-NP viz. aqueous stabilizer (Polyvinyl alcohol), release modifier (Eudragit® S 100) and volume of aqueous phase. The levels of these three critical formulation attributes were optimized by FbD within established design space to minimize mean particle size, poly dispersity index, and maximize encapsulation efficiency of SAT-NP. Lenth's and Bayesian analysis along with mathematical modeling of results allowed identification and quantification of critical formulation attributes significantly active on the selected CQAs. The optimized SAT-NP exhibited mean particle size; 216nm, polydispersity index; 0.250, zeta potential; -3.75mV and encapsulation efficiency; 78.3%. The product was lyophilized using mannitol to form readily redispersible powder. X-ray diffraction analysis confirmed the conversion of crystalline SAT to amorphous form. In vitro release of SAT-NP in gradually pH changing media showed 95%) in pH7.4 in next 3h, indicative of burst release after a lag time. This investigation demonstrated effective application of risk management and QbD tools in developing site-specific release SAT-NP by nanoprecipitation. Copyright © 2016 Elsevier B.V. All
Estimating Peak Outflow of Earth Fill Dam Failures by Multivariable Statistical Models
Directory of Open Access Journals (Sweden)
Mahsa Noori
2016-02-01
Full Text Available Introduction: Dam failure and the resulting flooding is one of the destructive phenomena today. Therefore, estimating the peak outflow (Qp) with reasonable accuracy and determining the related flood zone can reduce risks. Qp of dam failure depends on important factors such as: depth above breach (Hw), volume of water above breach bottom at failure (Vw), reservoir surface area (A), storage (S) and dam height (Hd). Various researchers have proposed equations to estimate Qp. They used the regression method to obtain an appropriate equation. Regression is a mathematical technique that requires initial test and diagnosis. These researchers present a new regression model for a better estimation of Qp. Materials and Methods: The data used in this study are related to 140 broken dams in the world, for 34 of which sufficient data are available for analysis. The dam failure phenomenon is a rapidly varied unsteady flow that is described by the shallow water equations. The equations in one-dimensional form are known as the Saint-Venant equations and are based on hydrostatic pressure distribution and uniform flow under rectangular steep assumption. Although hydraulic methods to predict the dam failure flood have been developed in different software, due to the complex nature of the problem and the impossibility of considering all parameters in hydraulic analysis, statistical methods have been developed in this field. Statistical methods determine the equations that can approximate the required factors from the observed parameters. Multiple regression is a useful technique to model the parameters affecting Qp, and it allows the statistical aspects of the model to be examined. This is done by different tests, such as the test of the necessity of the model coefficients and the analysis of variance table, and by constructing confidence intervals. Data analysis in this paper is done with SPSS 16 software. This software can provide the fitted model, various characteristics and related tests in tables. Results and Discussion
Energy Technology Data Exchange (ETDEWEB)
None, None
2012-12-31
This report evaluates the chemistry of seep water occurring in three desert drainages near Shiprock, New Mexico: Many Devils Wash, Salt Creek Wash, and Eagle Nest Arroyo. Through the use of geochemical plotting tools and multivariate statistical analysis techniques, analytical results of samples collected from the three drainages are compared with the groundwater chemistry at a former uranium mill in the Shiprock area (the Shiprock site), managed by the U.S. Department of Energy Office of Legacy Management. The objective of this study was to determine, based on the water chemistry of the samples, if statistically significant patterns or groupings are apparent between the sample populations and, if so, whether there are any reasonable explanations for those groupings.
Dai, H.; Thakur, J. S.; Serhatkulu, G. K.; Pandya, A. K.; Auner, G. W.; Naik, R.; Freeman, D. C.; Naik, V. M.; Cao, A.; Klein, M. D.; Rabah, R.
2006-03-01
Raman spectra (> 680) of normal mammary gland, malignant mammary gland tumors, and lymph node tissues from mice injected with 4T1 tumor cells have been recorded using a 785 nm excitation laser. The state of the tissues was confirmed by standard pathological tests. Multivariate statistical analysis methods (principal component analysis and discriminant function analysis) have been used to categorize the Raman spectra. The statistical algorithms, based on the Raman spectral peak heights, clearly separated tissues into six distinct classes, including mastitis, which is clearly separated from normal and tumor. This study suggests that Raman spectroscopy can possibly perform a real-time analysis of human mammary tissues for the detection of cancer.
Directory of Open Access Journals (Sweden)
Xiangyu Mu
2014-09-01
Full Text Available Natural factors and anthropogenic activities both contribute dissolved chemical loads to lakes and streams. Mineral solubility, geomorphology of the drainage basin, source strengths and climate all contribute to concentrations and their variability. Urbanization and agricultural waste-water in particular lead to aquatic environmental degradation. Major contaminant sources and controls on water quality can be assessed by analyzing the variability in proportions of major and minor solutes in water coupled to multivariate statistical methods. The demand for freshwater needed for increasing crop production, population and industrialization occurs almost everywhere in China, and these conflicting needs have led to widespread water contamination. Because of heavy nutrient loadings from all of these sources, Lake Taihu (eastern China) notably suffers periodic hyper-eutrophication and drinking water deterioration, which has led to shortages of freshwater for the City of Wuxi and other nearby cities. This lake, the third largest freshwater body in China, has historically been considered a cultural treasure of China, and has supported long-term fisheries. There is increasing pressure to remediate the present contamination, which compromises both aquaculture and the prior economic base centered on tourism. However, remediation cannot be effectively done without first characterizing the broad nature of the non-point source pollution. To this end, we investigated the hydrochemical setting of Lake Taihu to determine how different land use types influence the variability of surface water chemistry in different water sources to the lake. We found that waters broadly show wide variability ranging from calcium-magnesium-bicarbonate hydrochemical facies type to mixed sodium-sulfate-chloride type. Principal components analysis produced three principal components that explained 78% of the variance in the water quality and reflect three major types of water
Buttigieg, Pier Luigi; Ramette, Alban
2014-12-01
The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.
Paixão, Paulo; Gouveia, Luís F; Silva, Nuno; Morais, José A G
2017-03-01
A simulation study is presented, evaluating the performance of the f2, the model-independent multivariate statistical distance and the f2 bootstrap methods in the ability to conclude similarity between two dissolution profiles. Different dissolution profiles, based on the Noyes-Whitney equation and ranging from theoretical f2 values between 100 and 40, were simulated. Variability was introduced in the dissolution model parameters in an increasing order, ranging from a situation complying with the European guidelines requirements for the use of the f2 metric to several situations where the f2 metric could not be used anymore. Results have shown that the f2 is an acceptable metric when used according to the regulatory requirements, but loses its applicability when variability increases. The multivariate statistical distance presented contradictory results in several of the simulation scenarios, which makes it an unreliable metric for dissolution profile comparisons. The bootstrap f2, although conservative in its conclusions is an alternative suitable method. Overall, as variability increases, all of the discussed methods reveal problems that can only be solved by increasing the number of dosage form units used in the comparison, which is usually not practical or feasible. Additionally, experimental corrective measures may be undertaken in order to reduce the overall variability, particularly when it is shown that it is mainly due to the dissolution assessment instead of being intrinsic to the dosage form. Copyright © 2016. Published by Elsevier B.V.
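For reference, the f2 similarity factor compared in this study can be computed as below. The sketch uses the standard regulatory definition, f2 = 50 · log10(100 / sqrt(1 + mean squared difference)), applied to mean percent dissolved at matching time points. The function name is hypothetical, and neither the bootstrap variant nor the multivariate statistical distance is shown.

```python
import math


def f2_similarity(reference, test):
    """f2 similarity factor for two mean dissolution profiles (% dissolved)."""
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(reference)
    # Mean squared difference between the profiles across the n time points.
    msd = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + msd))
```

Identical profiles give f2 = 100, and a constant difference of 10 percentage points at every time point gives a value just below 50, the conventional similarity threshold. Because only the mean profiles enter the formula, the point estimate carries no information about unit-to-unit variability, which is consistent with the loss of applicability the abstract reports as variability grows.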
Liu, Jie; Wang, Weixin; Yang, Yaojun; Yan, Yuning; Wang, Wenyi; Wu, Haozhong; Ren, Zihe
2014-10-27
Radix Angelicae Sinensis, known as Danggui in China, is an effective and widely applied material in Traditional Chinese Medicine (TCM) and is used in more than 80 composite formulae. Danggui from Minxian County, Gansu Province is the best in quality. To rapidly and nondestructively discriminate Danggui from the authentic region of origin from that from an unauthentic region, an electronic nose coupled with multivariate statistical analyses was developed. Two different feature extraction methods were used to ensure that the authentic and unauthentic regions of Danggui origin could be discriminated. One feature extraction method captures the average value of the maximum response of the electronic nose sensors (feature extraction method 1). The other combines the maximum response of the sensors with their inter-ratios (feature extraction method 2). Multivariate statistical analyses, including principal component analysis (PCA), soft independent modeling of class analogy (SIMCA), and hierarchical clustering analysis (HCA), were employed. Nineteen samples were analyzed by PCA, SIMCA and HCA. Then the remaining samples (GZM1, SH) were projected onto the SIMCA model to validate the models. The results indicated that, with feature extraction method 2, Danggui from Yunnan Province and Danggui from Gansu Province could be successfully discriminated using the electronic nose coupled with PCA, SIMCA and HCA, which suggested that the electronic-nose system could be used as a simple and rapid technique for discriminating between Danggui from authentic and unauthentic regions of origin.
Directory of Open Access Journals (Sweden)
Chen-Lin Soo
2017-01-01
Studies of Sarawak coastal water quality are scarce, and applications of the multivariate statistical approach to investigate the spatial variation of water quality and to identify pollution sources in Sarawak coastal water are scarcer still. Hence, the present study aimed to evaluate the spatial variation of water quality along the coastline of the southwestern region of Sarawak using multivariate statistical techniques. Seventeen physicochemical parameters were measured at 11 stations along approximately 225 km of coastline. The coastal water quality showed spatial heterogeneity, with cluster analysis grouping the 11 stations into four different clusters. Deterioration in coastal water quality has been observed in different regions of Sarawak corresponding to land use patterns in the region. Nevertheless, nitrate-nitrogen exceeded the guideline value at all sampling stations along the coastline. The principal component analysis (PCA) determined a reduced number of five principal components that explained 89.0% of the data set variance. The first PC indicated that nutrients were the dominant polluting factors, which is attributed to domestic, agricultural, and aquaculture activities, followed by suspended solids in the second PC, which are related to logging activities.
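The statement that a handful of principal components explains a given share of the variance can be illustrated with a short sketch. It assumes NumPy is available, standardizes each parameter (an assumption, since the abstract does not state the scaling used), and runs on synthetic random numbers shaped like an 11-station by 17-parameter table; none of the values come from the study.

```python
import numpy as np


def explained_variance_ratio(data):
    """Fraction of total variance carried by each principal component.

    `data` has one row per station and one column per parameter. Columns are
    standardized so that parameters measured in different units weigh equally.
    """
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)      # remove tiny negative round-off
    return eigvals / eigvals.sum()


rng = np.random.default_rng(0)
# Synthetic stand-in for a stations x parameters table (not real measurements).
data = rng.normal(size=(11, 17))

ratios = explained_variance_ratio(data)
cumulative = np.cumsum(ratios)
# Smallest number of components whose cumulative share reaches 89%.
n_components = int(np.searchsorted(cumulative, 0.89) + 1)
```

With real monitoring data the leading components usually carry far more variance than they do for random noise, which is what makes a five-component summary possible.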
Reidy, Lorlyn; Bu, Kaixuan; Godfrey, Murrell; Cizdziel, James V
2013-12-10
Students in an instrumental analysis course with a forensic emphasis were presented with a mock scenario in which soil was collected from a murder suspect's car mat, from the crime scene, from adjacent areas, and from more distant locations. Students were then asked to conduct a comparative analysis using the soil's elemental distribution fingerprints. The soil was collected from Lafayette County, Mississippi, USA and categorized as sandy loam. Eight student groups determined twenty-two elements (Li, Be, Mg, Al, K, Ca, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Rb, Sr, Cs, Ba, Pb, U) in seven samples of soil and one sample of sediment by microwave-assisted acid digestion and inductively coupled plasma-mass spectrometry (ICP-MS). Data were combined and evaluated using multivariate statistical analyses. All eight student groups correctly classified their unknown among the different locations. Students learn, however, that whereas their results suggest that the elemental fingerprinting approach can be used to distinguish soils from different land-use areas and geographic locations, applying the methodology in forensic investigations is more complicated and has potential pitfalls. Overall, the inquiry-based pedagogy enthused the students and provided learning opportunities in analytical chemistry, including sample preparation, ICP-MS, figures-of-merit, and multivariate statistics. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Tien, Hai Minh; Le, Kien Anh; Le, Phung Thi Kim
2017-09-01
Bio hydrogen is a sustainable energy resource due to its potentially higher efficiency of conversion to usable power, high energy efficiency and non-polluting nature. In this work, experiments were carried out to assess the feasibility of generating bio hydrogen from cassava starch and to identify the influential factors and optimum conditions. Experimental design was used to investigate the effect of operating temperature (37-43 °C), pH (6-7), and inoculum ratio (6-10 %) on the hydrogen production yield, the COD reduction and the ratio of the volume of hydrogen produced to the COD reduction. The statistical analysis of the experiment indicated that the significant effects on the fermentation yield were the main effects of temperature, pH and inoculum ratio. The interaction effects between them do not appear to be significant. The central composite design showed that the polynomial regression models were in good agreement with the experimental results. This result will be applied to enhance the treatment of cassava starch processing wastewater.
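As an illustration of what a three-factor central composite design looks like, the sketch below generates coded design points and maps them to real units. It assumes a rotatable design with a single center point and assumes that the ranges quoted in the abstract are the -1/+1 factorial levels; the abstract confirms neither choice, and the function names are hypothetical.

```python
from itertools import product


def central_composite_design(k, n_center=1):
    """Coded points of a rotatable central composite design with k factors."""
    alpha = (2 ** k) ** 0.25  # axial distance satisfying the rotatability condition
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for level in (-alpha, alpha):
            point = [0.0] * k
            point[i] = level
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return alpha, factorial + axial + center


def decode(point, lows, highs):
    """Map coded levels to real units, with -1 -> low and +1 -> high."""
    return [(lo + hi) / 2 + c * (hi - lo) / 2
            for c, lo, hi in zip(point, lows, highs)]


# Assumption: the quoted ranges are the -1/+1 levels of each factor.
lows = [37.0, 6.0, 6.0]     # temperature (deg C), pH, inoculum ratio (%)
highs = [43.0, 7.0, 10.0]

alpha, design = central_composite_design(k=3)
runs = [decode(p, lows, highs) for p in design]
```

Under these assumptions the design has 8 factorial, 6 axial and 1 center run; the axial runs fall outside the quoted ranges, which is one reason the level assignment matters in practice.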
Le, Thi Thu Huyen; Zeunert, Stephanie; Lorenz, Malte; Meon, Günter
2017-05-01
A large, complex water quality data set of a polluted river, the Tay Ninh River, was evaluated to identify its water quality problems, to assess spatial variation, to determine the main pollution sources, and to detect relationships between parameters. This river is highly polluted with organic substances, nutrients, and total iron. An important problem of the river is the inhibition of nitrification. For the evaluation, different statistical techniques including cluster analysis (CA), discriminant analysis (DA), and principal component analysis (PCA) were applied. CA clustered 10 water quality stations into three groups corresponding to extreme, high, and moderate pollution. DA used only seven parameters to differentiate the defined clusters. The PCA resulted in four principal components. The first PC is related to conductivity, NH4-N, PO4-P, and TP and determines nutrient pollution. The second PC represents the organic pollution. The iron pollution is illustrated in the third PC, which has strong positive loadings for TSS and total Fe. The fourth PC explains the dependence of DO on the nitrate production. The nitrification inhibition was further investigated by PCA. The results showed a clear negative correlation between DO and NH4-N and a positive correlation between DO and NO3-N. The influence of pH on the NH4-N oxidation could not be detected by PCA because of the very low nitrification rate due to the constantly low pH of the river and because of the effect of wastewater discharge with very high NH4-N concentrations. The results deepen the understanding of the governing water quality processes and hence help to manage river basins sustainably.
Energy Technology Data Exchange (ETDEWEB)
Park, Jinyong [Univ. of Arizona, Tucson, AZ (United States); Balasingham, P [Univ. of Arizona, Tucson, AZ (United States); McKenna, Sean Andrew [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Kulatilake, Pinnaduwa H.S.W. [Univ. of Arizona, Tucson, AZ (United States)
2004-09-01
Sandia National Laboratories, under contract to Nuclear Waste Management Organization of Japan (NUMO), is performing research on regional classification of given sites in Japan with respect to potential volcanic disruption using multivariate statistics and geo-statistical interpolation techniques. This report provides results obtained for hierarchical probabilistic regionalization of volcanism for the Sengan region in Japan by applying multivariate statistical techniques and geostatistical interpolation techniques on the geologic data provided by NUMO. A workshop report produced in September 2003 by Sandia National Laboratories (Arnold et al., 2003) on volcanism lists a set of most important geologic variables as well as some secondary information related to volcanism. Geologic data extracted for the Sengan region in Japan from the data provided by NUMO revealed that data are not available at the same locations for all the important geologic variables. In other words, the geologic variable vectors were found to be incomplete spatially. However, it is necessary to have complete geologic variable vectors to perform multivariate statistical analyses. As a first step towards constructing complete geologic variable vectors, the Universal Transverse Mercator (UTM) zone 54 projected coordinate system and a 1 km square regular grid system were selected. The data available for each geologic variable on a geographic coordinate system were transferred to the aforementioned grid system. Also the recorded data on volcanic activity for Sengan region were produced on the same grid system. Each geologic variable map was compared with the recorded volcanic activity map to determine the geologic variables that are most important for volcanism. In the regionalized classification procedure, this step is known as the variable selection step. The following variables were determined as most important for volcanism: geothermal gradient, groundwater temperature, heat discharge, groundwater
Willard, Melissa A Bodnar; McGuffin, Victoria L; Smith, Ruth Waddell
2012-01-01
Salvia divinorum is a plant material that is of forensic interest due to the hallucinogenic nature of the active ingredient, salvinorin A. In this study, S. divinorum was extracted and spiked onto four different plant materials (S. divinorum, Salvia officinalis, Cannabis sativa, and Nicotiana tabacum) to simulate an adulterated sample that might be encountered in a forensic laboratory. The adulterated samples were extracted and analyzed by gas chromatography-mass spectrometry, and the resulting total ion chromatograms were subjected to a series of pretreatment procedures that were used to minimize non-chemical sources of variance in the data set. The data were then analyzed using principal components analysis (PCA) to investigate association of the adulterated extracts to unadulterated S. divinorum. While association was possible based on visual assessment of the PCA scores plot, additional procedures including Euclidean distance measurement, hierarchical cluster analysis, Student's t tests, Wilcoxon rank-sum tests, and Pearson product moment correlation were also applied to the PCA scores to provide a statistical evaluation of the association observed. The advantages and limitations of each statistical procedure in a forensic context were compared and are presented herein.
Energy Technology Data Exchange (ETDEWEB)
Mayer, B. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Valdez, C. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); DeHope, A. J. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Spackman, P. E. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Sanner, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Martinez, H. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Williams, A. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-11-28
Critical to many modern forensic investigations is the chemical attribution of the origin of an illegal drug. This process greatly relies on identification of compounds indicative of its clandestine or commercial production. The results of these studies can yield detailed information on method of manufacture, sophistication of the synthesis operation, starting material source, and final product. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic 3-methylfentanyl, N-(3-methyl-1-phenethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods were studied in an effort to identify and classify route-specific signatures. These methods were chosen to minimize the use of scheduled precursors, complicated laboratory equipment, number of overall steps, and demanding reaction conditions. Using gas and liquid chromatographies combined with mass spectrometric methods (GC-QTOF and LC-QTOF) in conjunction with inductively coupled plasma mass spectrometry (ICP-MS), over 240 distinct compounds and elements were monitored. As seen in our previous work with CAS of fentanyl synthesis, the complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 62 statistically significant, route-specific CAS were identified. Statistical classification models using a variety of machine learning techniques were then developed with the ability to predict the method of 3-methylfentanyl synthesis from three blind crude samples generated by synthetic chemists without prior experience with these methods.
Directory of Open Access Journals (Sweden)
T. Wahl
2012-02-01
This paper presents an advanced approach to statistically analyse storm surge events. In former studies, the highest water level during a storm surge event was usually the only parameter used for the statistical assessment. This is not always sufficient, especially when statistically analysing storm surge scenarios for event-based risk analyses. Here, Archimedean Copula functions are applied and allow for the consideration of further important parameters in addition to the highest storm surge water levels. First, a bivariate model is presented and used to estimate exceedance probabilities of storm surges (for two tide gauges in the German Bight) by jointly analysing the important storm surge parameters "highest turning point" and "intensity". Second, another dimension is added and a trivariate fully nested Archimedean Copula model is applied to additionally incorporate the significant wave height as an important wave parameter. With the presented methodology, reliable and realistic exceedance probabilities are derived and can be considered (among others) for integrated flood risk analyses, helping to improve the overall results. It is highlighted that the concept of Copulas represents a promising alternative for facing multivariate problems in coastal engineering.
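To make the idea of a bivariate joint exceedance probability concrete, here is a minimal sketch using the Gumbel-Hougaard copula, one member of the Archimedean family. The choice of this particular family, the parameter values and the function names are assumptions for illustration; the abstract does not say which Archimedean copula was fitted.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel-Hougaard copula C(u, v); theta >= 1, with theta = 1 meaning independence."""
    if theta < 1.0:
        raise ValueError("theta must be >= 1")
    if not (0.0 < u <= 1.0 and 0.0 < v <= 1.0):
        raise ValueError("u and v must lie in (0, 1]")
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def joint_exceedance(u, v, theta):
    """P(U > u, V > v) = 1 - u - v + C(u, v) for marginal non-exceedance levels u, v."""
    return 1.0 - u - v + gumbel_copula(u, v, theta)


# Hypothetical case: both parameters (e.g. highest turning point and intensity)
# at their marginal 99% non-exceedance levels.
u = v = 0.99
p_independent = joint_exceedance(u, v, theta=1.0)  # no dependence
p_dependent = joint_exceedance(u, v, theta=3.0)    # strong upper-tail dependence
```

Under independence the joint exceedance probability is simply 0.01 × 0.01, whereas positive dependence makes the simultaneous occurrence of two extreme values far more likely, which is why a univariate analysis of water levels alone can be misleading.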
Directory of Open Access Journals (Sweden)
Mario Miguel Ojeda Ramírez
2017-01-01
Currently some teachers implement different methods in order to promote education linked to reality and to provide more effective training and meaningful learning. Active methods aim to increase motivation and create scenarios in which student participation is central to achieving more meaningful learning. This paper reports on the implementation of a process of educational innovation in the course Topics of Multivariate Statistics, offered in the degree in Statistical Sciences and Techniques at the Universidad Veracruzana (Mexico). The strategies used, including data collection, project design and development, and individual and group presentations, are described. The information and communication technologies (ICT) used are: EMINUS, the distributed education platform of the Universidad Veracruzana, and file management with Dropbox, plus communication via WhatsApp. The R software was used for statistical analysis and for making presentations in academic forums. To explore students' perceptions, in-depth interviews were conducted and indicators for evaluating student satisfaction were defined; the results show positive evidence, leading to the conclusion that students were satisfied with the way the course was designed and implemented. They also stated that they feel able to apply what they have learned, and reported that these strategies made them feel prepared for their professional life. Finally, some suggestions for improving the course in future editions are included.
Wallace, Jack; Champagne, Pascale; Hall, Geof
2016-06-01
The wastewater stabilization ponds (WSPs) at a wastewater treatment facility in eastern Ontario, Canada, have experienced excessive algae growth and high pH levels in the summer months. A full range of parameters were sampled from the system and the chemical dynamics in the three WSPs were assessed through multivariate statistical analysis. The study presents a novel approach for exploratory analysis of a comprehensive water chemistry dataset, incorporating principal components analysis (PCA) and principal components (PC) and partial least squares (PLS) regressions. The analyses showed strong correlations between chl-a and sunlight, temperature, organic matter, and nutrients, and weak and negative correlations between chl-a and pH and chl-a and DO. PCA reduced the data from 19 to 8 variables, with a good fit to the original data matrix (similarity measure of 0.73). Multivariate regressions to model system pH in terms of these key parameters were performed on the reduced variable set and the PCs generated, for which strong fits (R² > 0.79 with all data) were observed. The methodologies presented in this study are applicable to a wide range of natural and engineered systems where a large number of water chemistry parameters are monitored, resulting in the generation of large data sets. Copyright © 2016 Elsevier Ltd. All rights reserved.
Lifshits, A M
1979-01-01
General characteristics of multivariate statistical analysis (MSA) are given. Methodical premises and criteria for the selection of an adequate MSA method applicable to pathoanatomic investigations of the epidemiology of multicausal diseases are presented. The experience of using MSA with computers and standard computing programs in studies of coronary artery atherosclerosis based on material from 2060 autopsies is described. The combined use of four MSA methods (sequential, correlational, regressional, and discriminant) made it possible to quantify the contribution of each of the 8 examined risk factors to the development of atherosclerosis. The most important factors were found to be age, arterial hypertension, and heredity. Occupational hypodynamia and increased fatness were more important in men, whereas diabetes mellitus was more important in women. The registration of this combination of risk factors by MSA methods provides a more reliable prognosis of the likelihood of coronary heart disease with a fatal outcome than of the degree of coronary atherosclerosis.
Guo, Jing; Yuan, Yahong; Dou, Pei; Yue, Tianli
2017-10-01
Fifty-one kiwifruit juice samples of seven kiwifruit varieties from five regions in China were analyzed to determine their polyphenol contents and to trace fruit varieties and geographical origins by multivariate statistical analysis. Twenty-one polyphenols belonging to four compound classes were determined by ultra-high-performance liquid chromatography coupled with ultra-high-resolution TOF mass spectrometry. (-)-Epicatechin, (+)-catechin, procyanidin B1 and caffeic acid derivatives were the predominant phenolic compounds in the juices. Principal component analysis (PCA) allowed a clear separation of the juices according to kiwifruit varieties. Stepwise linear discriminant analysis (SLDA) yielded satisfactory categorization of samples, with a 100% success rate according to kiwifruit varieties and a 92.2% success rate according to geographical origins. The results showed that polyphenolic profiles of kiwifruit juices contain enough information to trace fruit varieties and geographical origins. Copyright © 2017 Elsevier Ltd. All rights reserved.
Brady, John J.; Farrell, Mikella E.; Pellegrino, Paul M.
2014-02-01
Multiplex coherent anti-Stokes Raman scattering (MCARS) is used to detect several chemical warfare simulants, such as dimethyl methylphosphonate and 2-chloroethyl ethyl sulfide, with high specificity. The spectral bandwidth of the femtosecond laser pulse used in these studies is sufficient to coherently and simultaneously drive all the vibrational modes in the molecule of interest. Evidence shows that MCARS is capable of overcoming common sensitivity limitations of spontaneous Raman scattering, thus allowing for the detection of the target material in milliseconds with standard, uncooled universal serial bus spectrometers as opposed to seconds with cooled, intensified CCD-based spectrometers. In addition, the obtained MCARS spectrum of the investigated sample provides multiple unique signatures. These signatures are used in an off-line multivariate statistical analysis allowing for the material's discrimination with high fidelity.
Mallamace, Domenico; Corsaro, Carmelo; Salvo, Andrea; Cicero, Nicola; Macaluso, Andrea; Giangrosso, Giuseppe; Ferrantelli, Vincenzo; Dugo, Giacomo
2014-05-01
We have studied, by means of High Resolution Magic Angle Spinning Nuclear Magnetic Resonance, the metabolic profile of the famous Sicilian cherry tomato of Pachino. Thanks to its organoleptic and healthy properties, this particular foodstuff was the first tomato accredited by the European PGI (Protected Geographical Indication) certification of quality. Owing to the relatively high price of the final product, commercial frauds have emerged in the Italian and international markets. Hence, there is a growing interest in developing analytical techniques able to predict the origin of a tomato sample, indicating whether or not it originates from the area of Pachino, Sicily (Italy). In this paper we have determined the molar concentration of the metabolites constituting the PGI cherry tomato of Pachino. Furthermore, by means of a multivariate statistical analysis we have identified which metabolites are relevant for sample differentiation.
Kamal, Ghulam Mustafa; Wang, Xiaohua; Bin Yuan; Wang, Jie; Sun, Peng; Zhang, Xu; Liu, Maili
2016-09-01
Soy sauce, a well-known seasoning all over the world, especially in Asia, is available on the global market in a wide range of types based on its purpose and processing methods. Its composition varies with the fermentation process and the addition of additives, preservatives and flavor enhancers. Comprehensive ¹H NMR based study of the metabonomic variations of soy sauce, to differentiate among the types available on the global market, has been limited by the complexity of the mixture. In the present study, ¹³C NMR spectroscopy coupled with multivariate statistical data analysis, namely principal component analysis (PCA) and orthogonal partial least squares-discriminant analysis (OPLS-DA), was applied to investigate metabonomic variations among different types of soy sauce, namely super light, super dark, red cooking and mushroom soy sauce. The main additives in soy sauce, such as glutamate, sucrose and glucose, were easily distinguished and quantified using ¹³C NMR spectroscopy; they were otherwise difficult to assign and quantify due to serious signal overlaps in ¹H NMR spectra. The significantly higher concentration of sucrose in dark, red cooking and mushroom flavored soy sauce can be linked directly to the addition of caramel. Similarly, the significantly higher level of glutamate in super light as compared to super dark and mushroom flavored soy sauce may come from the addition of monosodium glutamate. The study highlights the potential of ¹³C NMR based metabonomics coupled with multivariate statistical data analysis in differentiating between the types of soy sauce on the basis of the level of additives, raw materials and fermentation procedures. Copyright © 2016 Elsevier B.V. All rights reserved.
Silva, A F; Sarraguça, M C; Fonteyne, M; Vercruysse, J; De Leersnyder, F; Vanhoorne, V; Bostijn, N; Verstraeten, M; Vervaet, C; Remon, J P; De Beer, T; Lopes, J A
2017-08-07
A multivariate statistical process control (MSPC) strategy was developed for the monitoring of the ConsiGma™-25 continuous tablet manufacturing line. Thirty-five logged variables encompassing three major units, being a twin screw high shear granulator, a fluid bed dryer and a product control unit, were used to monitor the process. The MSPC strategy was based on principal component analysis of data acquired under normal operating conditions using a series of four process runs. Runs with imposed disturbances in the dryer air flow and temperature, in the granulator barrel temperature, speed and liquid mass flow and in the powder dosing unit mass flow were utilized to evaluate the model's monitoring performance. The impact of the imposed deviations to the process continuity was also evaluated using Hotelling's T2 and Q residuals statistics control charts. The influence of the individual process variables was assessed by analyzing contribution plots at specific time points. Results show that the imposed disturbances were all detected in both control charts. Overall, the MSPC strategy was successfully developed and applied. Additionally, deviations not associated with the imposed changes were detected, mainly in the granulator barrel temperature control. Copyright © 2017 Elsevier B.V. All rights reserved.
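The two monitoring statistics named in this abstract can be sketched as follows. This is a minimal illustration assuming NumPy, a PCA model fitted by singular value decomposition on autoscaled normal-operating-condition (NOC) data, and empirical 99th-percentile control limits (an assumption made to keep the sketch short; distribution-based limits are more common). The data are synthetic and the function names hypothetical.

```python
import numpy as np


def fit_pca_monitor(x_noc, n_components):
    """Fit a PCA monitoring model on normal-operating-condition (NOC) data."""
    x_noc = np.asarray(x_noc, dtype=float)
    mean = x_noc.mean(axis=0)
    std = x_noc.std(axis=0, ddof=1)
    z = (x_noc - mean) / std
    # SVD of the autoscaled data: rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_components].T                    # (variables, components)
    score_var = s[:n_components] ** 2 / (len(z) - 1)  # variance of each score
    return mean, std, loadings, score_var


def t2_and_q(x_new, model):
    """Hotelling's T2 and Q (squared prediction error) for each observation."""
    mean, std, loadings, score_var = model
    z = (np.atleast_2d(np.asarray(x_new, dtype=float)) - mean) / std
    scores = z @ loadings
    t2 = np.sum(scores ** 2 / score_var, axis=1)   # distance inside the model plane
    residual = z - scores @ loadings.T
    q = np.sum(residual ** 2, axis=1)              # distance from the model plane
    return t2, q


rng = np.random.default_rng(1)
# Synthetic NOC data: 200 observations of 5 variables driven by 2 latent factors.
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 1.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0, 1.0]])
x_noc = latent @ mixing + 0.1 * rng.normal(size=(200, 5))
model = fit_pca_monitor(x_noc, n_components=2)

t2_noc, q_noc = t2_and_q(x_noc, model)
t2_limit = np.percentile(t2_noc, 99)   # empirical limits, an assumption
q_limit = np.percentile(q_noc, 99)

# Imposed disturbance: shift one variable alone, breaking the correlation structure.
x_fault = x_noc[:1].copy()
x_fault[0, 0] += 5.0 * x_noc[:, 0].std(ddof=1)
t2_fault, q_fault = t2_and_q(x_fault, model)
```

A disturbance that moves one variable without its correlated partners shows up mainly in Q, while a shift along the usual correlation pattern shows up mainly in T2; contribution plots then break either statistic down by variable.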
Catelani, Tiago A; Santos, João Rodrigo; Páscoa, Ricardo N M J; Pezza, Leonardo; Pezza, Helena R; Lopes, João A
2018-03-01
This work proposes the use of near infrared (NIR) spectroscopy in diffuse reflectance mode and multivariate statistical process control (MSPC) based on principal component analysis (PCA) for real-time monitoring of the coffee roasting process. The main objective was the development of an MSPC methodology able to detect disturbances to the roasting process early, using real-time acquisition of NIR spectra. A total of fifteen roasting batches were defined according to an experimental design to develop the MSPC models. This methodology was tested on a set of five batches where disturbances of different nature were imposed to simulate real faulty situations. Some of these batches were used to optimize the model while the remainder were used to test the methodology. A modelling strategy based on a time sliding window provided the best results in terms of distinguishing batches with and without disturbances, using typical MSPC charts: Hotelling's T² and squared prediction error statistics. A PCA model encompassing a time window of four minutes with three principal components was able to efficiently detect all disturbances assayed. NIR spectroscopy combined with the MSPC approach proved to be an adequate auxiliary tool for coffee roasters to detect faults in a conventional roasting process in real time. Copyright © 2017 Elsevier B.V. All rights reserved.
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-04-01
North-East Sicily is strongly exposed to shallow landslide events. On October 1, 2009 a severe rainstorm (225.5 mm of cumulative rainfall in 9 hours) caused flash floods and more than 1000 landslides, which struck several small villages such as Giampilieri, Altolia, Molino, Pezzolo, Scaletta Zanclea and Itala, with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructure. The landslides, mainly consisting of earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved the colluvium and regolith materials which cover the underlying metamorphic bedrock of the Peloritani Mountains. In this area catchments are small (about 10 square kilometres) and elongated, with steep slopes, low order streams and a short time of concentration, and discharge directly into the sea. In the past, landslides occurred at Altolia in 1613 and 2000, at Molino in 1750, 1805 and 2000, and at Giampilieri in 1791, 1918, 1929, 1932, 2000 and on October 25, 2007. The aim of this work is to define susceptibility models for shallow landslides using multivariate statistical analyses in the Giampilieri area (25 square kilometres). As the first step, a detailed landslide inventory map has been produced through field surveys coupled with the observation of high resolution aerial colour orthophotos taken immediately after the event. 1,490 initiation zones have been identified; most of them have planimetric dimensions ranging from tens to a few hundreds of square metres. The spatial hazard assessment has been focused on the detachment areas. Susceptibility models, performed in a GIS environment, took into account several parameters. The morphometric and hydrologic parameters have been derived from detailed LiDAR data (1×1 m). Square grid cells of 4×4 m were adopted as mapping units, on the basis of the area-frequency distribution of the detachment zones and the optimal representation of the local morphometric conditions (e.g. slope angle, plan curvature). A
Litvinenko, Alexander
2017-12-10
Matrices began in the 2nd century BC with the Chinese, although one can find traces going back to the Babylonians in the 4th century BC. The text "Nine Chapters of the Mathematical Art", written during the Han Dynasty in China, gave the first known example of matrix methods. They were used to solve simultaneous linear equations (more in http://math.nie.edu.sg/bwjyeo/it/MathsOnline_AM/livemath/the/IT3AMMatricesHistory.html). The first ideas of maximum likelihood estimation (MLE) were introduced by Laplace (1749-1827) and by Gauss (1777-1855); the likelihood was defined by Thorvald Thiele (1838-1910). Why do we still use matrices? The matrix data format is more than 2200 years old. Our world is multi-dimensional! Why not introduce a more appropriate data format, and why not reformulate the MLE method for it? In this work we utilize low-rank tensor formats for multi-dimensional functions, which appear in spatial statistics.
Directory of Open Access Journals (Sweden)
M. R. Guggenmos
2011-11-01
Identifying areas of interaction between groundwater and surface water is crucial for effective environmental management, because this interaction is known to influence water quantity and quality. This paper applies hydrochemistry and multivariate statistics to identify locations and mechanisms of groundwater-surface water interaction in the pastorally dominated Wairarapa Valley, New Zealand. Hierarchical Cluster Analysis (HCA) and Principal Components Analysis (PCA) were conducted using site-specific median values of Ca, Mg, Na, K, HCO3, Cl, SO4 and electrical conductivity from 22 surface water sites and 246 groundwater sites. Surface water and groundwater monitoring sites were grouped together in three of the seven clusters identified by HCA, with the inference made that similarities in hydrochemistry indicate groundwater-surface water interaction. PCA indicated that the clusters were largely differentiated by total dissolved solids concentration, redox condition and ratio of major ions. Shallow aerobic groundwaters, located in close proximity to losing reaches of rivers, were grouped with similar Ca-HCO3 type surface waters, indicating potential recharge to aquifers from these river systems. Groundwaters that displayed a rainfall-recharged chemical signature with higher Na relative to Ca, higher Cl relative to HCO3 and an accumulation of NO3 were grouped with neighbouring surface waters, suggesting the provision of groundwater base flow to these river systems and the transfer of this chemical signature from underlying aquifers. The hydrochemical techniques used in this study did not reveal groundwater-surface water interaction in some parts of the study area, specifically where deep anoxic groundwaters, high in total dissolved solids with a distinct Na-Cl signature, showed no apparent link to surface water. The drivers of hydrochemistry inferred from HCA and PCA are consistent with previous
Bilgraer, Raphaël; Gillet, Sylvie; Gil, Sophie; Evain-Brion, Danièle; Laprévote, Olivier
2014-11-01
While acting upon chromatin compaction, histone post-translational modifications (PTMs) are involved in modulating gene expression through histone-DNA affinity and protein-protein interactions. These dynamic and environment-sensitive modifications are constitutive of the histone code that reflects the transient transcriptional state of the chromatin. Here we describe a global screening approach for revealing epigenetic disruption at the histone level. This original approach enables fast and reliable relative abundance comparison of histone PTMs and variants in human cells within a single LC-MS experiment. As a proof of concept, we exposed BeWo human choriocarcinoma cells to sodium butyrate (SB), a universal histone deacetylase (HDAC) inhibitor. Histone acid-extracts (n = 45) equally representing 3 distinct classes, Control, 1 mM and 2.5 mM SB, were analysed using ultra-performance liquid chromatography coupled with a hybrid quadrupole time-of-flight mass spectrometer (UPLC-QTOF-MS). Multivariate statistics allowed us to discriminate control from treated samples based on differences in their mass spectral profiles. Several acetylated and methylated forms of core histones emerged as markers of sodium butyrate treatment. Indeed, this untargeted histonomic approach could be a useful exploratory tool in many cases of xenobiotic exposure when histone code disruption is suspected.
Dong, Jian-Jun; Li, Qing-Liang; Yin, Hua; Zhong, Cheng; Hao, Jun-Guang; Yang, Pan-Fei; Tian, Yu-Hong; Jia, Shi-Ru
2014-10-15
Sensory evaluation is regarded as a necessary procedure to ensure a reproducible quality of beer. Meanwhile, high-throughput analytical methods provide a powerful tool to analyse various flavour compounds, such as higher alcohols and esters. In this study, the relationship between flavour compounds and sensory evaluation was established by linear and non-linear models such as partial least squares (PLS), genetic algorithm back-propagation neural network (GA-BP) and support vector machine (SVM). It was shown that SVM with a Radial Basis Function (RBF) kernel achieved better prediction accuracy for both the calibration set (94.3%) and the validation set (96.2%) than the other models. Relatively lower prediction abilities were observed for GA-BP (52.1%) and PLS (31.7%). In addition, the kernel function of SVM played an essential role in model training: the prediction accuracy of SVM with a polynomial kernel function was only 32.9%. As a powerful multivariate statistics method, SVM holds great potential to assess beer quality. Copyright © 2014 Elsevier Ltd. All rights reserved.
Energy Technology Data Exchange (ETDEWEB)
Freitas, Renato [Instituto Federal de Educacao, Ciencia e Tecnologia do Rio de Janeiro (CPAR/IFRJ), RJ (Brazil). Curso de Licenciatura em Matematica; Calza, Cristiane Ferreira; Lopes, Ricardo Tadeu [Coordenacao dos Programas de Pos-Graduacao de Engenharia (COPPE/UFRJ), RJ (Brazil); Rabello, Angela; Lima, Tania [Museu Nacional (MN/UFRJ), Rio de Janeiro, RJ (Brazil)
2011-07-01
In this work, the elemental composition of 102 fragments of Marajoara pubic covers, belonging to the National Museum collection, was characterized using EDXRF and multivariate statistical analysis. The objective was to identify possible groups of samples that presented similar characteristics. This information will be useful in the development of a systematic classification of these artifacts. Provenance studies of ancient ceramics are based on the assumption that pottery produced from a specific clay will present a similar chemical composition, which will distinguish it from pottery produced from a different clay. In this way, the pottery is assigned to particular production groups, which are then correlated with their respective origins. EDXRF measurements were carried out with a portable system, developed in the Nuclear Instrumentation Laboratory, consisting of an Oxford TF3005 X-ray tube with a tungsten (W) anode, operating at 25 kV and 100 μA, and a Si-PIN XR-100CR detector from Amptek. In each of the 102 fragments, six points were analyzed (three on the front and three on the reverse) with an acquisition time of 600 s and a beam collimation of 2 mm. The spectra were processed and analyzed using the QXAS-AXIL software from the IAEA. PCA was applied to the XRF results, revealing a clear cluster separation among the samples.
Directory of Open Access Journals (Sweden)
Jiabo Chen
2016-10-01
Source apportionment of river water pollution is critical in water resource management and aquatic conservation. Comprehensive application of various GIS-based multivariate statistical methods was performed to analyze datasets (2009–2011) on water quality in the Liao River system (China). Cluster analysis (CA) classified the 12 months of the year into three groups (May–October, February–April and November–January) and the 66 sampling sites into three groups (groups A, B and C) based on similarities in water quality characteristics. Discriminant analysis (DA) determined that temperature, dissolved oxygen (DO), pH, chemical oxygen demand (CODMn), 5-day biochemical oxygen demand (BOD5), NH4+–N, total phosphorus (TP) and volatile phenols were significant variables affecting temporal variations, with 81.2% correct assignments. Principal component analysis (PCA) and positive matrix factorization (PMF) identified eight potential pollution factors for each part of the data structure, explaining more than 61% of the total variance. Oxygen-consuming organics from cropland and woodland runoff were the main latent pollution factor for group A. For group B, the main pollutants were oxygen-consuming organics, oil, nutrients and fecal matter. For group C, the evaluated pollutants primarily included oxygen-consuming organics, oil and toxic organics.
Directory of Open Access Journals (Sweden)
Chung-En Chung
2011-04-01
Concerns about the water quality in Yuan-Yang Lake (YYL), a shallow, subtropical alpine lake located in north-central Taiwan, have been rapidly increasing recently due to natural and anthropogenic pollution. In order to understand the underlying physical and chemical processes as well as their associated spatial distribution in YYL, this study analyzes fourteen physico-chemical water quality parameters recorded at eight sampling stations during 2008–2010 by using multivariate statistical techniques and a geostatistical method. Hierarchical clustering analysis (CA) is first applied to distinguish the three general water quality patterns among the stations, followed by the use of principal component analysis (PCA) and factor analysis (FA) to extract and recognize the major underlying factors contributing to the variations among the water quality measures. The spatial distribution of the identified major contributing factors is obtained by using a kriging method. Results show that four principal components, i.e., nitrogen nutrients, meteorological factor, turbidity and nitrate factors, account for 65.52% of the total variance among the water quality parameters. The spatial distribution of principal components further confirms that nitrogen sources constitute an important pollutant contribution in the YYL.
Directory of Open Access Journals (Sweden)
Xuedi Zhang
2014-07-01
This hydrogeological study assessed the quality of phreatic water supplies across the semi-arid, traditional agricultural region of Yinchuan in northwest China, near the upper reaches of the Yellow River. We analyzed the chemical characteristics of water collected from 39 sampling stations before the 2011 summer-autumn irrigation period, using multivariate statistical analysis and geostatistical methods. We determined which factors influence the composition of groundwater, using principal component analysis (PCA) and two modes of cluster analysis. PCA showed that the most important variables in the study area were the strong evaporation effect caused by the dry climate, dissolution of carbonate minerals and those containing F− and K+, and human activity including the treatment of domestic sewage and chemical fertilization. The Q-mode of cluster analysis identified three distinct water types that were distinguished by different chemical compositions, while the R-mode of analysis revealed two distinct clusters of sampling stations that appeared to be influenced by distinct sets of natural and/or anthropogenic factors.
Okiongbo, K. S.; Douglas, R. K.
2015-03-01
To achieve a better understanding of the nature of the factors influencing groundwater composition as well as to specify them quantitatively, conventional graphical and multivariate statistical analysis (principal component analysis) were applied to hydrochemical data consisting of 51 groundwater samples collected from domestic boreholes in Yenagoa city, Bayelsa State, Nigeria. The study included analysis of major ion contents and other chemical parameters such as pH, total dissolved solids and electrical conductivity of the groundwater samples. The PCA yielded three principal components explaining 78.38 % of the total variance of the 11 parameters. The three components are interpreted as controlled by the natural weathering of existing silicate rocks, reverse ion-exchange processes and oxidation reactions, which are further supported by the scatter diagrams, ionic signatures and diagrams of the mechanisms controlling the water chemistry as the common factors influencing the groundwater hydrogeochemical character. Limited anthropogenic influence on the groundwater composition has also been noticed in the study area. The groundwater poses no threat to human health because the concentrations of physico-chemical parameters that can be used to evaluate drinking water quality are within World Health Organisation standard specification. The groundwater in the area is fresh, high salinity and low sodium in nature.
Wu, Xia; Zheng, Kang; Zhao, Fengjia; Zheng, Yongjun; Li, Yantuan
2014-08-01
Meretricis concha is a kind of marine traditional Chinese medicine (TCM), and has been commonly used for the treatment of asthma and scald burns. In order to investigate the relationship between the inorganic elemental fingerprint and the geographical origin identification of Meretricis concha, the elemental contents of M. concha from five sampling points in Rushan Bay have been determined by means of inductively coupled plasma optical emission spectrometry (ICP-OES). Based on the contents of 14 inorganic elements (Al, As, Cd, Co, Cr, Cu, Fe, Hg, Mn, Mo, Ni, Pb, Se, and Zn), the inorganic elemental fingerprint which well reflects the elemental characteristics was constructed. All the data from the five sampling points were discriminated with accuracy through hierarchical cluster analysis (HCA) and principal component analysis (PCA), indicating that a four-factor model which could explain approximately 80% of the detection data was established, and the elements Al, As, Cd, Cu, Ni and Pb could be viewed as the characteristic elements. This investigation suggests that the inorganic elemental fingerprint combined with multivariate statistical analysis is a promising method for verifying the geographical origin of M. concha, and this strategy should be valuable for the authenticity discrimination of some marine TCM.
Kauer, Agnes; Dorigo, Wouter; Bauer-Marschallinger, Bernhard
2017-04-01
Global warming is expected to change ocean-atmosphere oscillation patterns, e.g. the El Nino Southern Oscillation, and may thus have a substantial impact on water resources over land. Yet, the link between climate oscillations and terrestrial hydrology has large uncertainties. In particular, the climate in the Mediterranean basin is expected to be sensitive to global warming as it may increase insufficient and irregular water supply and lead to more frequent and intense droughts and heavy precipitation events. The ever increasing need for water in tourism and agriculture reinforces the problem. Therefore, the monitoring and better understanding of the hydrological cycle are crucial for this area. This study seeks to quantify the effect of regional climate modes, e.g. the Northern Atlantic Oscillation (NAO), on the hydrological cycle in the Mediterranean. We apply Empirical Orthogonal Functions (EOF) to a wide range of hydrological datasets to extract the major modes of variation over the study period. We use more than ten datasets describing precipitation, soil moisture, evapotranspiration, and changes in water mass with study periods ranging from one to three decades depending on the dataset. The resulting EOFs are then examined for correlations with regional climate modes using Spearman rank correlation analysis. This is done for the entire time span of the EOFs and for monthly and seasonally sampled data. We find relationships between the hydrological datasets and the climate modes NAO, Arctic Oscillation (AO), Eastern Atlantic (EA), and Tropical Northern Atlantic (TNA). Analyses of monthly and seasonally sampled data reveal high correlations especially in the winter months. However, the spatial extent of the data cube considered for the analyses has a large impact on the results. Our statistical analyses suggest an impact of regional climate modes on the hydrological cycle in the Mediterranean area and may provide valuable input for evaluating process
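The Spearman rank correlation step mentioned in the abstract above can be illustrated with a minimal standard-library sketch. It is not the authors' code: the two short series (`pc_series`, standing in for an EOF principal-component time series, and `nao_index`) are made-up values, and the helper names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical winter values, for illustration only.
pc_series = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1]
nao_index = [0.5, -0.9, 1.1, 2.0, -0.2, 0.4]
rho = spearman(pc_series, nao_index)
```

Because the coefficient depends only on ranks, it captures any monotonic relationship between a mode of hydrological variability and a climate index, not just a linear one.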
National Research Council Canada - National Science Library
Herojeet, Rajkumar; Rishi, Madhuri S; Lata, Renu; Dolma, Konchok
2017-01-01
.... The present study envisages the application of multivariate analysis, water utility class and conventional graphical representation to reveal the hidden factor responsible for deterioration of water...
Directory of Open Access Journals (Sweden)
Weili Duan
2016-01-01
Multivariate statistical methods including cluster analysis (CA), discriminant analysis (DA) and principal component analysis/factor analysis (PCA/FA) were applied to explore the surface water quality datasets including 14 parameters at 28 sites of the Eastern Poyang Lake Basin, Jiangxi Province of China, from January 2012 to April 2015, characterize spatiotemporal variation in pollution and identify potential pollution sources. The 28 sampling stations were divided into two periods (wet season and dry season) and two regions (low pollution and high pollution), respectively, using the hierarchical CA method. Four parameters (temperature, pH, ammonia-nitrogen (NH4-N), and total nitrogen (TN)) were identified using DA to distinguish temporal groups with close to 97.86% correct assignations. Again using DA, five parameters (pH, chemical oxygen demand (COD), TN, fluoride (F), and sulphide (S)) led to 93.75% correct assignations for distinguishing spatial groups. Five potential pollution sources including nutrients pollution, oxygen consuming organic pollution, fluorine chemical pollution, heavy metals pollution and natural pollution, were identified using PCA/FA techniques for both the low pollution region and the high pollution region. Heavy metals (copper (Cu), chromium (Cr) and zinc (Zn)), fluoride and sulfide are of particular concern in the study region because of many open-pit copper mines such as Dexing Copper Mine. Results obtained from this study offer a reasonable classification scheme for low-cost monitoring networks. The results also inform understanding of spatio-temporal variation in water quality as these topics relate to water resources management.
Keita, Souleymane; Zhonghua, Tang
2017-10-01
Sustainable management of groundwater resources is a major issue for developing countries, especially in Mali. The multiple uses of groundwater have led countries to promote sound management policies for sustainable use of the groundwater resources. For this reason, each country needs data enabling it to monitor and predict the changes of the resources. Also, given the importance of groundwater quality changes often marked by the recurrence of droughts, the potential impacts of the regional and geological setting of groundwater resources require careful study. Unfortunately, recent decades have seen a considerable reduction of national capacities to ensure hydrogeological monitoring and the production of quality data for decision making. The purpose of this work is to use the groundwater data and translate them into useful information that can improve water resources management capacity in Mali. In this paper, we used groundwater analytical data from accredited laboratories in Mali to carry out a national scale assessment of the groundwater types and their distribution. We adapted multivariate statistical methods to classify 2035 groundwater samples into seven main groundwater types and built a national scale map from the results. We used a two-level K-means clustering technique to examine the hydro-geochemical records as percentages of the total concentrations of major ions, namely sodium (Na), magnesium (Mg), calcium (Ca), chloride (Cl), bicarbonate (HCO3), and sulphate (SO4). The first step of clustering formed 20 groups, and these groups were then re-clustered to produce the final seven groundwater types. The results were verified and confirmed using Principal Component Analysis (PCA) and RockWare (Aq.QA) software. We found that HCO3 was the most dominant anion throughout the country and that Cl and SO4 were only important in some local zones. The dominant cations were Na and Mg. Also, major ion ratios changed with geographical location and geological, and climatic
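The two-level clustering idea in the abstract above (cluster the samples into many groups, then re-cluster the group centres into a few types) can be sketched as follows. This is an illustrative assumption of how such a procedure might look, not the authors' implementation: the plain Lloyd's k-means, the farthest-point initialisation, the function names and the eight synthetic samples (percentages of Na, Mg, Ca, Cl, HCO3, SO4) are all hypothetical, and the group counts are 4 and 2 here instead of the 20 and 7 used in the study.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means with a deterministic farthest-point start."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point that is farthest from its nearest existing centre.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    def assign():
        # Label each point with the index of its nearest centre.
        return [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]

    for _ in range(iterations):
        labels = assign()
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign(), centers


def two_level_kmeans(samples, k_first, k_final):
    """Cluster samples into k_first groups, then the group centres into k_final types."""
    first_labels, first_centers = kmeans(samples, k_first)
    center_labels, _ = kmeans(first_centers, k_final)
    return [center_labels[lab] for lab in first_labels]


# Synthetic ion percentages (Na, Mg, Ca, Cl, HCO3, SO4); each row sums to 100.
samples = [
    (20, 15, 10, 5, 45, 5), (22, 14, 9, 6, 44, 5),
    (18, 16, 12, 4, 46, 4), (25, 12, 8, 7, 42, 6),
    (35, 5, 5, 40, 10, 5), (33, 6, 6, 38, 11, 6),
    (30, 5, 5, 30, 10, 20), (28, 6, 6, 28, 12, 20),
]
water_types = two_level_kmeans(samples, k_first=4, k_final=2)
```

In this toy data the first four samples are bicarbonate-dominated and the last four chloride-dominated, so the second level merges the four fine groups into those two broad types.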
Rakotondrabe, Felaniaina; Ndam Ngoupayou, Jules Remy; Mfonka, Zakari; Rasolomanana, Eddy Harilala; Nyangono Abolo, Alexis Jacob; Ako Ako, Andrew
2018-01-01
The influence of gold mining activities on the water quality in the Mari catchment in Bétaré-Oya (East Cameroon) was assessed in this study. Sampling was performed within the period of one hydrological year (2015 to 2016), with 22 sampling sites consisting of groundwater (06) and surface water (16). In addition to measuring the physicochemical parameters, such as pH, electrical conductivity, alkalinity, turbidity, suspended solids and CN-, eleven major elements (Na+, K+, Ca2+, Mg2+, NH4+, Cl-, NO3-, HCO3-, SO42-, PO43- and F-) and eight heavy metals (Pb, Zn, Cd, Fe, Cu, As, Mn and Cr) were also analyzed using conventional hydrochemical methods, Multivariate Statistical Analysis and the Heavy metal Pollution Index (HPI). The results showed that the water from Mari catchment and Lom River was acidic to basic (5.4050mg NO3-/L. This water was found as two main types: calcium magnesium bicarbonate (CaMg-HCO3), which was the most represented, and sodium bicarbonate potassium (NaK-HCO3). As for trace elements in surface water, the contents of Pb, Cd, Mn, Cr and Fe were higher than recommended by the WHO guidelines, and therefore, the surface water was unsuitable for human consumption. Three phenomena were responsible for controlling the quality of the water in the study area: hydrolysis of silicate minerals of plutono-metamorphic rocks, which constitute the geological basement of this area; vegetation and soil leaching; and mining activities. The high concentrations of TSS and trace elements found in this basin were mainly due to gold mining activities (exploration and exploitation) as well as digging of rivers beds, excavation and gold amalgamation. Copyright © 2017 Elsevier B.V. All rights reserved.
Heidema, A.G.; Thissen, U.; Boer, J.M.; Bouwman, F.G.; Feskens, E.J.M.; Mariman, E.C.
2009-01-01
In this study, we applied the multivariate statistical tool Partial Least Squares (PLS) to analyze the relative importance of 83 plasma proteins in relation to coronary heart disease (CHD) mortality and the intermediate end points body mass index, HDL-cholesterol and total cholesterol. From a Dutch
Affum, Andrews Obeng; Osae, Shiloh Dede; Nyarko, Benjamin Jabez Botwe; Afful, Samuel; Fianko, Joseph Richmond; Akiti, Tetteh Thomas; Adomako, Dickson; Acquaah, Samuel Osafo; Dorleku, Micheal; Antoh, Emmanuel; Barnes, Felix; Affum, Enoch Acheampong
2015-02-01
In recent times, surface water resources in the Western Region of Ghana have been found to be inadequate in supply and polluted by various anthropogenic activities. As a result of these problems, the demand for groundwater by the human populations in the peri-urban communities for domestic, municipal and irrigation purposes has increased without prior knowledge of its water quality. Water samples were collected from 14 public hand-dug wells during the rainy season in 2013 and investigated for total coliforms, Escherichia coli, mercury (Hg), arsenic (As), cadmium (Cd) and physicochemical parameters. Multivariate statistical analysis of the dataset and a linear stoichiometric plot of major ions were applied to group the water samples and to identify the main factors and sources of contamination. Hierarchical cluster analysis revealed four clusters from the hydrochemical variables (R-mode) and three clusters in the case of water samples (Q-mode) after z score standardization. Principal component analysis after a varimax rotation of the dataset indicated that the four factors extracted explained 93.3 % of the total variance, which highlighted salinity, toxic elements and hardness pollution as the dominant factors affecting groundwater quality. Cation exchange, mineral dissolution and silicate weathering influenced groundwater quality. The ranking order of major ions was Na+ > Ca2+ > K+ > Mg2+ and Cl- > SO42- > HCO3-. Based on the Piper plot and the hydrogeology of the study area, sodium chloride (86 %), sodium hydrogen carbonate and sodium carbonate (14 %) water types were identified. Although E. coli were absent in the water samples, 36 % of the wells contained total coliforms (Enterobacter species) which exceeded the WHO guidelines limit of zero colony-forming unit (CFU)/100 mL of drinking water. With the exception of Hg, the concentration of As and Cd in 79 and 43 % of the water samples exceeded the WHO guideline limits of 10 and 3
Masoud, Alaa A.
2014-07-01
Extensive urban, agricultural and industrial expansions on the western fringe of the Nile Delta of Egypt have exerted much load on the water needs and led to groundwater quality deterioration. Documenting the spatial variation of the groundwater quality and its controlling factors is vital to ensure sustainable water management and safe use. A comprehensive dataset of 451 shallow groundwater samples was collected in 2011 and 2012. On-site field measurements of the total dissolved solids (TDS), electric conductivity (EC), pH and temperature, as well as lab-based analysis of the ionic composition of the major and trace components, were performed. Groundwater types were derived and the suitability for irrigation use was evaluated. Multivariate statistical techniques of factor analysis and K-means clustering were integrated with geostatistical semi-variogram modeling for evaluating the spatial hydrochemical variations and the driving factors as well as for hydrochemical pattern recognition. Most hydrochemical parameters showed very wide ranges: TDS (201-24,400 mg/l), pH (6.72-8.65), Na+ (28.30-7774 mg/l), and Cl- (7-12,186 mg/l), suggesting complex hydrochemical processes of multiple sources. TDS violated the limit (1200 mg/l) of the Egyptian standards for drinking water quality in many localities. Extreme concentrations of Fe2+, Mn2+, Zn2+, Cu2+ and Ni2+ are mostly related to their natural content in the water-bearing sediments and/or to contamination from industrial leakage. Very high nitrate concentrations exceeding the permissible limit (50 mg/l) were potentially maximized toward hydrologic discharge zones and related to wastewater leakage. Three main water types, NaCl (29%), Na2SO4 (26%), and NaHCO3 (20%), formed 75% of the groundwater and dominated in the saline depressions, sloping sides of the coastal ridges of the depressions, and in the cultivated/newly reclaimed lands intensely covered by irrigation canals, respectively. Water suitability for irrigation use clarified that the
Energy Technology Data Exchange (ETDEWEB)
Chen, Hao [School of Tourism and Environment, Shaanxi Normal University, Xi'an 710062 (China); Lu, Xinwei, E-mail: luxinwei@snnu.edu.cn [School of Tourism and Environment, Shaanxi Normal University, Xi'an 710062 (China); Li, Loretta Y., E-mail: lli@civil.ubc.ca [Department of Civil Engineering, University of British Columbia, Vancouver V6T 1Z4 (Canada); Gao, Tianning; Chang, Yuyu [School of Tourism and Environment, Shaanxi Normal University, Xi'an 710062 (China)
2014-06-01
The concentrations of As, Ba, Co, Cr, Cu, Mn, Ni, Pb, V and Zn in campus dust from kindergartens, elementary schools, middle schools and universities of Xi'an, China were determined by X-ray fluorescence spectrometry. Correlation coefficient analysis, principal component analysis (PCA) and cluster analysis (CA) were used to analyze the data and to identify possible sources of these metals in the dust. The spatial distributions of metals in urban dust of Xi'an were analyzed based on the metal concentrations in campus dusts using the geostatistics method. The results indicate that dust samples from campuses have elevated metal concentrations, especially for Pb, Zn, Co, Cu, Cr and Ba, with the mean values of 7.1, 5.6, 3.7, 2.9, 2.5 and 1.9 times the background values for Shaanxi soil, respectively. The enrichment factor results indicate that Mn, Ni, V, As and Ba in the campus dust were deficiently to minimally enriched, mainly affected by nature and partly by anthropogenic sources, while Co, Cr, Cu, Pb and Zn in the campus dust and especially Pb and Zn were mostly affected by human activities. As and Cu, Mn and Ni, Ba and V, and Pb and Zn had similar distribution patterns. The southwest high-tech industrial area and south commercial and residential areas have relatively high levels of most metals. Three main sources were identified based on correlation coefficient analysis, PCA, CA, as well as spatial distribution characteristics. As, Ni, Cu, Mn, Pb, Zn and Cr have mixed sources — nature, traffic, as well as fossil fuel combustion and weathering of materials. Ba and V are mainly derived from nature, but partly also from industrial emissions, as well as construction sources, while Co principally originates from construction. - Highlights: • Metal content in dust from schools was determined by XRF. • Spatial distribution of metals in urban dust was focused on campus samples. • Multivariate statistic and spatial distribution were used to identify metal
Leary, James F.; McLaughlin, Scott R.; Reece, Lisa M.; Rosenblatt, Judah I.; Hokanson, James A.
1999-06-01
Multivariate statistics can be used for visualization of cell subpopulations in multidimensional data space and for classification of cells within that data space. New data mining techniques we have developed, such as subtractive clustering, can be used to find the differences between test and control multiparameter flow cytometric data, e.g. in the problem of human stem cell isolation with tumor purging. They also can provide training data for subsequent multivariate statistical classification techniques such as discriminant function or logistic regression analyses. Using lookup tables, these multivariate statistical calculations can be performed in real-time, and can even include probabilities of misclassification. Thus, the only distinction between off-line classification of cells in data analysis and real-time statistical decision-making for cell sorting is the time limit in which a classification decision must be made. For real-time cell sorting we presently are able to perform these classifications in less than 625 microseconds, corresponding to the time that it takes the cell to travel from the laser intersection point to the sort decision point in a flow cytometer/cell sorter. Statistical decision making and the ability to include the costs of misclassification into that decision process will become important as flow cytometry/cell sorting moves from diagnostics to therapeutics.
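The lookup-table idea in the abstract above can be sketched as follows: the statistical model is evaluated offline for every possible pair of binned parameter values, so the real-time decision reduces to a table read. This is a hypothetical illustration, not the authors' system: the two-parameter logistic model, its coefficients, the 64-bin resolution and all names are assumptions made for the example.

```python
import math

BINS = 64  # hypothetical resolution of each measured parameter channel


def logistic_probability(x, y, weights=(-6.0, 0.12, 0.08)):
    """Model probability that an event with binned parameters (x, y) is a target cell.

    The coefficients are made up for illustration.
    """
    b0, b1, b2 = weights
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x + b2 * y)))


# Offline step: evaluate the model once for every pair of binned values.
LOOKUP = [[logistic_probability(x, y) for y in range(BINS)] for x in range(BINS)]


def sort_decision(x, y, threshold=0.5):
    """Real-time step: one table read followed by a comparison."""
    p_target = LOOKUP[x][y]
    is_target = p_target >= threshold
    # Under the model, the chance that this decision is wrong.
    p_misclassified = 1.0 - p_target if is_target else p_target
    return is_target, p_misclassified
```

A usage example: `sort_decision(63, 63)` returns a positive sort decision together with a small model-based misclassification probability, while `sort_decision(0, 0)` rejects the event. A cost-weighted decision would replace the fixed `threshold` with one derived from the relative costs of the two kinds of error.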
Directory of Open Access Journals (Sweden)
Asim Ilyas
2016-01-01
Conclusion: Overall, the distribution, correlation and multivariate apportionment of selected metals in atherosclerosis patients and healthy donors are significantly divergent. Hence, present findings suggest that the trace and redox metals accumulated in the body may pose a high risk for atherosclerosis development.
DEFF Research Database (Denmark)
Birch, Thomas; Martinón-Torres, Marcos
2015-01-01
An assemblage of post-medieval iron bars was found with the Princes Channel wreck, salvaged from the Thames Estuary in 2003. They were recorded and studied, with a focus on metallography and slag inclusion analysis. The investigation provided an opportunity to explore the use of multivariate stat...
Fagir, Wael; Hathout, Rania M; Sammour, Omaima A; ElShafeey, Ahmed H
2015-11-01
To develop Finasteride-loaded self micro-emulsifying drug delivery systems (SMEDDS) for the treatment of hormone-associated problems. Ternary phase diagrams were constructed to obtain self-emulsification regions. Multivariate statistical methods, viz. principal component analysis and agglomerative hierarchical clustering analysis, were used to evaluate the stability of the microemulsions. An in vitro redispersibility study was adopted and two formulations were selected for spray-drying. Further investigations were performed (Fourier transform infrared, x-ray diffraction and transmission electron microscopy). Finally, the in vivo performance was tested in human volunteers. Multivariate statistical methods selected stable SMEDDS. Spray-drying utilizing a maltodextrin/leucine carrier system yielded a flowable product. Selected solid SMEDDS scored 129.35% relative bioavailability compared with a commercial tablet. The developed SMEDDS poses a successful platform for glucosteroid analogs oral delivery.
Energy Technology Data Exchange (ETDEWEB)
Heyen, H. [GKSS-Forschungszentrum Geesthacht GmbH (Germany). Inst. fuer Gewaesserphysik
1998-12-31
A multivariate statistical approach is presented that allows a systematic search for relationships between the interannual variability in climate records and ecological time series. Statistical models are built between climatological predictor fields and the variables of interest. Relationships are sought on different temporal scales and for different seasons and time lags. The possibilities and limitations of this approach are discussed in four case studies dealing with salinity in the German Bight, abundance of zooplankton at Helgoland Roads, macrofauna communities off Norderney and the arrival of migratory birds on Helgoland. [Translated from German] A multivariate statistical model is presented that allows a systematic search for potential relationships between variability in climate and ecological time series. Using four application examples, the influence of climate on salinity in the German Bight, zooplankton off Helgoland, macrofauna off Norderney, and the arrival of migratory birds on Helgoland is examined.
Directory of Open Access Journals (Sweden)
Cláudio Roberto Rosário
2012-07-01
The purpose of this research is to improve the practice of customer satisfaction analysis. The article presents a model for analyzing the answers to a customer satisfaction evaluation in a systematic way with the aid of multivariate statistical techniques, specifically exploratory analysis with PCA (Principal Component Analysis) and HCA (Hierarchical Cluster Analysis). The study evaluated the applicability of the model as a tool to help the company in question identify the value chain perceived by the customer when the customer satisfaction questionnaire is applied. The multivariate statistical analysis revealed similar behavior among customers. It also allowed the company to review the questions in the questionnaires by analyzing the degree of correlation between the questions, which was not a company practice before this research.
Vetrimurugan Elumalai; K. Brindha; Elango Lakshmanan
2017-01-01
Heavy metals in surface and groundwater were analysed and their sources were identified using multivariate statistical tools for two towns in South Africa. Human exposure risk through the drinking water pathway was also assessed. Electrical conductivity values showed that groundwater is desirable to permissible for drinking except at six locations. Concentrations of aluminium, lead and nickel were above the permissible limit for drinking at all locations. Boron, cadmium, iron and manganese ex...
Souza, Iara da Costa; Morozesk, Mariana; Duarte, Ian Drumond; Bonomo, Marina Marques; Rocha, Lívia Dorsch; Furlan, Larissa Maria; Arrivabene, Hiulana Pereira; Monferrán, Magdalena Victoria; Matsumoto, Silvia Tamie; Milanez, Camilla Rozindo Dias; Wunderlin, Daniel Alberto; Fernandes, Marisa Narciso
2014-08-01
Roots of mangrove trees have an important role in depurating water and sediments by retaining metals that may accumulate in different plant tissues, affecting physiological processes and anatomy. The present study aimed to evaluate adaptive changes in the roots of Rhizophora mangle in response to different levels of chemical elements (metals/metalloids) in interstitial water and sediments from four neotropical mangroves in Brazil. What sets this study apart from other studies is that we investigate not only adaptive modifications in R. mangle but also changes in the environments where this plant grows, evaluating correspondence between physical, chemical and biological issues with a combined set of multivariate statistical methods (pattern recognition). Thus, we sought to match changes in the environment with adaptations in plants. Multivariate statistics highlighted that the lignified periderm and the air gaps are directly related to environmental contamination. The current results provide new evidence of root anatomical strategies for dealing with contaminated environments. Multivariate statistics greatly helps in extrapolating results from the complex data matrices obtained when analyzing environmental issues, pointing out the parameters involved in environmental changes and also evidencing the adaptive response of the exposed biota. Copyright © 2014 Elsevier Ltd. All rights reserved.
A Multivariate Solution of the Multivariate Ranking and Selection Problem
1980-02-01
theory in determining optimal test-treatment strategies for streptococcal sore throat and rheumatic fever. Operations Research, 24, 933-949. Gibbons, J. D...24, 92-103. Krischer, J. P. (1976). Utility structure of a medical decision-making problem. Operations Research, 24, 951-971. Krishnaiah, P. R. and
Erdtman, Elias; Jönsson, Carl
2012-01-01
This master's thesis addresses numerical methods of computing the typical ranks of tensors over the real numbers and explores some properties of tensors over finite fields. We present three numerical methods to compute typical tensor rank. Two of these have already been published and can be used to calculate the lowest typical ranks of tensors and an approximate percentage of how many tensors have the lowest typical ranks (for some tensor formats), respectively. The third method was developed...
Gershenson, Carlos
Studies of rank distributions have been popular for decades, especially since the work of Zipf. For example, if we rank the words of a given language by frequency of use (the most used word in English is 'the', rank 1; the second most common word is 'of', rank 2), the distribution can be approximated roughly with a power law. The same applies to cities (the most populated city in a country ranks first), earthquakes, metabolism, the Internet, and dozens of other phenomena. We recently proposed "rank diversity" to measure how ranks change in time, using the Google Books Ngram dataset. Studying six languages between 1800 and 2009, we found that the rank diversity curves of languages are universal, adjusted with a sigmoid on log-normal scale. We are studying several other datasets (sports, economies, social systems, urban systems, earthquakes, artificial life). Rank diversity seems to be universal, independently of the shape of the rank distribution. I will present our work in progress towards a general description of the features of rank change in time, along with simple models which reproduce it.
Herojeet, Rajkumar; Rishi, Madhuri S.; Lata, Renu; Dolma, Konchok
2017-09-01
The Sirsa River flows through the central part of the Nalagarh valley, which belongs to the rapidly industrializing belt of Baddi, Barotiwala and Nalagarh (BBN). Appraising surface water quality to ascertain its utility in such ecologically sensitive areas is the need of the hour. The present study applies multivariate analysis, water utility classes and conventional graphical representation to reveal the hidden factors responsible for the deterioration of water quality and to determine the hydrochemical facies and the evolution processes of water types in the Nalagarh valley, India. The quality assessment is made by estimating pH, electrical conductivity (EC), total dissolved solids (TDS), total hardness (TH), major ions (Na+, K+, Ca2+, Mg2+, HCO3-, Cl-, SO4 2-, NO3- and PO4 3-), dissolved oxygen (DO), biological oxygen demand (BOD) and total coliform (TC) to determine suitability for drinking and domestic purposes. Parameters such as pH, TDS, TH, Ca2+, HCO3-, Cl-, SO4 2- and NO3- are within the desirable limits of the Bureau of Indian Standards (Indian Standard Drinking Water Specification (Second Edition) IS:10500. Indian Standard Institute, New Delhi, pp 1-18, 2012). Mg2+, Na+ and K+ in the pre-monsoon season, EC in both the pre- and post-monsoon seasons at a few sites, and approximately 40% of the BOD and TC samples in both seasons exceed the permissible limits, indicating organic contamination from human activities. Water quality classification for designated use indicates that most surface water samples are not suitable as a drinking water source without conventional treatment. The Piper trilinear and Chadha's diagrams show that the majority of surface water samples in both seasons fall in the field of the Ca2+-Mg2+-HCO3- water type, indicating temporary hardness. PCA and CA reveal that the surface water chemistry is influenced by natural factors such as weathering of minerals and ion exchange processes, and by anthropogenic factors. Thus, the present paper illustrates the importance of
Directory of Open Access Journals (Sweden)
Shashank Vyas
2016-01-01
Integration of solar photovoltaic (PV) generation with power distribution networks leads to many operational challenges and complexities. Unintentional islanding is one of them, and it is of rising concern given the steady increase in grid-connected PV power. This paper builds on an exploratory study of unintentional islanding on a modeled radial feeder with large PV penetration. Dynamic simulations, also run in real time, revealed unique potential causes of accidental islands. The resulting voltage and current data underwent dimensionality reduction using principal component analysis (PCA), which formed the basis for applying Q statistic control charts to detect the anomalous currents that could island the system. To reduce the false alarm rate of anomaly detection, Kullback-Leibler (K-L) divergence was applied to the principal component projections, which led to the conclusion that the Q statistic based approach alone is not reliable for detecting the symptoms liable to cause unintentional islanding. The obtained data were labeled, and a K-nearest neighbor (K-NN) binomial classifier was then trained to identify potential islanding precursors and distinguish them from other power system transients. The three-phase short-circuit fault case was successfully identified as statistically different from islanding symptoms.
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regression, for binary outcomes, and multiple linear regression, for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups according to a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have been published to date. Two previously published examples of CART analysis in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, which facilitates medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
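The "repeated partitioning of a sample into subgroups" described in this abstract can be illustrated with a minimal sketch of a single classification-tree split. It assumes Gini impurity as the splitting criterion, which is a common choice for CART but is not stated in the abstract. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'.

    Returns (None, impurity of the unsplit node) if no split improves on it.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2.0, score)
    return best

# Hypothetical toy data: one continuous predictor and a two-class outcome.
predictor = [1.0, 2.0, 3.0, 10.0, 12.0, 15.0]
outcome = ["A", "A", "A", "B", "B", "B"]
threshold, impurity = best_split(predictor, outcome)
```

A full tree would apply `best_split` recursively to each resulting subgroup, over all predictors, until a stopping rule is met. A regression tree would replace the Gini impurity with a variance-based criterion.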
Wang, Zaosheng; Wang, Yushao; Chen, Liuqin; Yan, Changzhou; Yan, Yijun; Chi, Qiaoqiao
2015-10-15
Total concentrations and chemical forms of heavy metals in surface sediments of Maluan Bay were determined, and multiple geochemical indices and guidelines were applied to assess potential contamination and environmental risks. Metal concentrations exhibited significant spatial variation. Cr was present predominantly in the residual fraction, while Cd was found mostly in the non-residual fraction and is thus of high potential bioavailability. Cluster analysis separated four subgroups of sampling sites with different levels of contamination. Further, a multivariate method offered a specific interpretation of possible contaminant sources and/or pathways. Factor scores characterized the sampling locations and elucidated the pollution status, pointing out the impact of multiple "hidden hotspots" of contaminants and providing further evidence of clear pollution-risk gradients in lagoon areas. The study supports the integrative approach as a powerful tool for scientifically diagnosing pollution status to inform management decisions on coastal sediments in complex environments. Copyright © 2015 Elsevier Ltd. All rights reserved.
Chang, Pao-Erh Paul; Yang, Jen-Chih Rena; Den, Walter; Wu, Chang-Fu
2014-09-01
Emissions of volatile organic compounds (VOCs) are the most frequent environmental nuisance complaints in urban areas, especially where industrial districts are nearby. Unfortunately, identifying the responsible emission sources of VOCs is an inherently difficult task. In this study, we proposed a dynamic approach to gradually confine the location of potential VOC emission sources in an industrial complex by combining multi-path open-path Fourier transform infrared spectrometry (OP-FTIR) measurement with the statistical method of principal component analysis (PCA). Closed-cell FTIR was further used to verify the VOC emission sources by measuring VOCs emitted from selected exhaust stacks at factories in the confined areas. Multiple open-path monitoring lines were deployed during a 3-month monitoring campaign in a complex industrial district. The emission patterns were identified, and the locations of emissions were confined using wind data collected simultaneously. N,N-Dimethyl formamide (DMF), 2-butanone, toluene, and ethyl acetate, with mean concentrations of 80.0 ± 1.8, 34.5 ± 0.8, 103.7 ± 2.8, and 26.6 ± 0.7 ppbv, respectively, were identified as the major VOC mixture at all times of the day around the receptor site. The concentrations of DMF, a toxic air pollutant, in air samples were found to exceed the ambient standard despite the path-averaging effect of OP-FTIR on concentration levels. The PCA identified three major emission sources: PU coating, chemical packaging, and lithographic printing industries. By applying instrumental measurement and statistical modeling, this study has established a systematic approach for locating emission sources. Statistical modeling (PCA) plays an important role in reducing the dimensionality of a large measured dataset and identifying underlying emission sources, while instrumental measurement helps verify the outcomes of the statistical modeling. The field study has demonstrated the feasibility of
Energy Technology Data Exchange (ETDEWEB)
Aguado Garcia, D.; Ferrer Riquelme, A. J.; Seco Torrecillas, A.; Ferrer Polo, J.
2006-07-01
Due to the increasingly stringent effluent quality requirements imposed by regulations, monitoring wastewater treatment plants (WWTPs) has become extremely important for achieving efficient process operation. At modern WWTPs, a large number of online process variables are collected, and these variables are usually highly correlated. Therefore, appropriate techniques are required to extract information from the huge amount of collected data. In this work, the application of multivariate statistical projection techniques is presented as an effective strategy for monitoring a sequencing batch reactor (SBR) operated for enhanced biological phosphorus removal. (Author)
Liu, Ya-Juan; André, Silvère; Saint Cristau, Lydia; Lagresle, Sylvain; Hannas, Zahia; Calvosa, Éric; Devos, Olivier; Duponchel, Ludovic
2017-02-01
Multivariate statistical process control (MSPC) is increasingly popular as a response to the challenge posed by large multivariate datasets from analytical instruments such as Raman spectroscopy for the monitoring of complex cell cultures in the biopharmaceutical industry. However, Raman spectroscopy for in-line monitoring often produces unsynchronized data sets, resulting in time-varying batches. Moreover, unsynchronized data sets are common in cell culture monitoring because spectroscopic measurements are generally recorded in alternation, with more than one optical probe connected in parallel to the same spectrometer. Synchronized batches are a prerequisite for applying multivariate analysis such as multi-way principal component analysis (MPCA) to MSPC monitoring. Correlation optimized warping (COW) is a popular method for data alignment with satisfactory performance; however, it has never before been applied to synchronize the acquisition time of spectroscopic datasets in an MSPC application. In this paper we propose, for the first time, to use COW to synchronize batches of varying duration analyzed with Raman spectroscopy. In a second step, we developed MPCA models at different time intervals based on the normal operation condition (NOC) batches synchronized by COW. New batches are finally projected onto the corresponding MPCA model. We monitored the evolution of the batches using two multivariate control charts based on Hotelling's T2 and Q. As the results illustrate, the MSPC model was able to identify abnormal operation conditions, including contaminated batches, which is of prime importance in cell culture monitoring. We showed that Raman-based MSPC monitoring can be used to diagnose batches deviating from the normal condition, with higher efficacy than traditional diagnosis, which would save time and money in the biopharmaceutical industry. Copyright © 2016 Elsevier B.V. All rights reserved.
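The two monitoring statistics named in this abstract, Hotelling's T2 and Q, can be sketched for the simplest possible case. The sketch assumes an ordinary PCA model with a single retained component, fitted by power iteration on hypothetical two-variable reference data. The multi-way unfolding, the COW alignment and the control limits used in the paper are not reproduced, and all function names are illustrative.

```python
def center(rows):
    """Return the column means and the mean-centered rows."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return means, [[r[j] - means[j] for j in range(m)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, m = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
            for i in range(m)]

def leading_component(cov, iters=500):
    """Leading eigenvector and eigenvalue of cov, found by power iteration."""
    m = len(cov)
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m)) for i in range(m))
    return v, eigval

def t2_and_q(x, means, loading, eigval):
    """T2 measures distance within the model plane; Q is the squared residual."""
    xc = [xi - mu for xi, mu in zip(x, means)]
    score = sum(a * b for a, b in zip(xc, loading))
    t2 = score ** 2 / eigval
    residual = [a - score * b for a, b in zip(xc, loading)]
    return t2, sum(r * r for r in residual)

# Hypothetical reference (normal operation) data: two strongly correlated variables.
noc = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
means, centered = center(noc)
loading, eigval = leading_component(covariance(centered))

t2_far, q_far = t2_and_q([9.0, 9.0], means, loading, eigval)  # follows the correlation, but extreme
t2_off, q_off = t2_and_q([5.0, 1.0], means, loading, eigval)  # breaks the correlation structure
```

In this toy case the first test point produces a large T2 with a near-zero Q, while the second produces a large Q with a near-zero T2. This is why the two charts are typically read together.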
El Alfy, Mohamed; Lashin, Aref; Abdalla, Fathy; Al-Bassam, Abdulaziz
2017-10-01
Rapid economic expansion poses serious problems for groundwater resources in arid areas, which typically have high rates of groundwater depletion. In this study, hydrochemical investigations integrating chemical and statistical analyses are conducted to assess the factors controlling hydrochemistry and potential pollution in an arid region. Fifty-four groundwater samples were collected from the Dhurma aquifer in Saudi Arabia, and twenty-one physicochemical variables were examined for each sample. Spatial patterns of salinity and nitrate were mapped using fitted variograms. The nitrate spatial distribution shows that nitrate pollution is a persistent problem affecting a wide area of the aquifer. The hydrochemical investigations and cluster analysis reveal four significant clusters of groundwater zones. Five main factors were extracted, which explain >77% of the total data variance. These factors indicated that the chemical characteristics of the groundwater were influenced by rock-water interactions and anthropogenic factors. The identified clusters and factors were validated with hydrochemical investigations. The geogenic factors include the dissolution of various minerals (calcite, aragonite, gypsum, anhydrite, halite and fluorite) and ion exchange processes. The anthropogenic factors include the impact of irrigation return flows and the application of potassium, nitrate, and phosphate fertilizers. Over time, these anthropogenic factors will most likely contribute to further declines in groundwater quality. Copyright © 2017 Elsevier Ltd. All rights reserved.
Longobardi, Francesco; Innamorato, Valentina; Di Gioia, Annalisa; Ventrella, Andrea; Lippolis, Vincenzo; Logrieco, Antonio F; Catucci, Lucia; Agostiano, Angela
2017-12-15
Lentil samples coming from two different countries, i.e. Italy and Canada, were analysed using untargeted 1H NMR fingerprinting in combination with chemometrics in order to build models able to classify them according to their geographical origin. For such aim, Soft Independent Modelling of Class Analogy (SIMCA), k-Nearest Neighbor (k-NN), Principal Component Analysis followed by Linear Discriminant Analysis (PCA-LDA) and Partial Least Squares-Discriminant Analysis (PLS-DA) were applied to the NMR data and the results were compared. The best combination of average recognition (100%) and cross-validation prediction abilities (96.7%) was obtained for the PCA-LDA. All the statistical models were validated both by using a test set and by carrying out a Monte Carlo Cross Validation: the obtained performances were found to be satisfying for all the models, with prediction abilities higher than 95% demonstrating the suitability of the developed methods. Finally, the metabolites that mostly contributed to the lentil discrimination were indicated. Copyright © 2017 Elsevier Ltd. All rights reserved.
Directory of Open Access Journals (Sweden)
A. Mrutu
2013-12-01
Mangrove wetlands are important biological systems that usually filter out organic and inorganic contaminants from wastewaters before they enter the ocean. Our previous work showed that sediments of the Msimbazi Creek wetland are contaminated with heavy metals and that the amounts decreased with increasing depth. However, the hidden relationships between the heavy metals and clay particles were not fully understood from the numerical data. Therefore, this work used data from the literature and the Statistical Package for Social Sciences (SPSS) software to study how significant the relationships are and to predict the sources of heavy metals and clays. The results showed that Cd is the only metal that showed insignificant correlations with other heavy metals (with Pb and Zn), while the rest of the heavy metals exhibited significant positive correlations (except Pb vs. Ni). Cluster analysis classified the heavy metals based on concentration, and the first 50 cm cores (0-50 cm) had higher heavy metals and % clay than the second 50 cm cores (51-100 cm). The results from the factor analysis suggest that Pb, Cd, Ni, and clay originate mostly from anthropogenic activities, while Fe, Co, Cr, Zn and sand come from both anthropogenic and natural sources. These results support our previous suggestion that heavy metals and clays found in this wetland are mostly of anthropogenic origin. However, we recommend isotopic tracing studies in order to accurately identify the origins of the heavy metals and clays in sediments of the Msimbazi Creek mangrove wetland.
African Journals Online (AJOL)
maths/stats
INTRODUCTION. PageRank is Google's system for ranking web pages. A page with a higher PageRank is deemed more important and is more likely to be listed above a ... Felix U. Ogban, Department of Mathematics/Statistics and Computer Science, Faculty of Science, University of ..... probability, 2004, 41, (3): 721-734.
Sequential rank agreement methods for comparison of ranked lists
DEFF Research Database (Denmark)
Ekstrøm, Claus Thorn; Gerds, Thomas Alexander; Jensen, Andreas Kryger
2015-01-01
The comparison of alternative rankings of a set of items is a general and prominent task in applied statistics. Predictor variables are ranked according to magnitude of association with an outcome, prediction models rank subjects according to the personalized risk of an event, and genetic studies...... are illustrated using gene rankings, and using data from two Danish ovarian cancer studies where we assess the within and between agreement of different statistical classification methods.
Guimarães Nobre, Gabriela; Arnbjerg-Nielsen, Karsten; Rosbjerg, Dan; Madsen, Henrik
2016-04-01
Traditionally, flood risk assessment studies have been carried out from a univariate frequency analysis perspective. However, statistical dependence between hydrological variables, such as extreme rainfall and extreme sea surge, plausibly exists, since both variables are to some extent driven by common meteorological conditions. To overcome this limitation, multivariate statistical techniques have the potential to combine different sources of flooding in the investigation. The aim of this study was to apply a range of statistical methodologies for analyzing combined extreme hydrological variables that can lead to coastal and urban flooding. The study area is the Elwood Catchment, a highly urbanized catchment located in the city of Port Phillip, Melbourne, Australia. The first part of the investigation dealt with the marginal extreme value distributions. Two approaches to extracting extreme value series were applied (Annual Maximum and Partial Duration Series), and different probability distribution functions were fitted to the observed sample. Results obtained using the Generalized Pareto distribution demonstrate the ability of the Pareto family to model the extreme events. Advancing into multivariate extreme value analysis, an investigation of the asymptotic properties of extremal dependence was carried out first. As a weak positive asymptotic dependence between the bivariate extreme pairs was found, the Conditional method proposed by Heffernan and Tawn (2004) was chosen. This approach is suitable for modelling bivariate extreme values that are relatively unlikely to occur together. The results show that the probability of an extreme sea surge occurring during a one-hour intensity extreme precipitation event (or vice versa) can be twice as great as it would be under the assumption of independent events. Therefore, presuming independence between these two variables would result in severe underestimation of the flooding risk in the study area.
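The closing claim of this abstract can be made concrete with a small arithmetic sketch. The marginal exceedance probabilities below are hypothetical round numbers, not values from the study. The factor of 2 is the upper figure quoted in the abstract. The function name is illustrative, and the sketch does not implement the Heffernan and Tawn conditional model itself.

```python
def joint_exceedance(p_surge, p_rain, dependence_ratio=1.0):
    """Joint probability of both extremes; a ratio of 1.0 corresponds to independence."""
    return dependence_ratio * p_surge * p_rain

# Hypothetical marginals: each extreme is exceeded with probability 0.01 per event window.
p_independent = joint_exceedance(0.01, 0.01)
p_dependent = joint_exceedance(0.01, 0.01, dependence_ratio=2.0)

# Expected number of event windows between joint occurrences.
windows_independent = 1.0 / p_independent
windows_dependent = 1.0 / p_dependent
```

Under these assumed numbers, the joint event is expected about once in 10,000 windows if independence is presumed but about once in 5,000 windows with the dependence factor. Halving the expected recurrence interval in this way is the underestimation of risk the abstract warns about.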
Soltani, Shahla; Asghari Moghaddam, Asghar; Barzegar, Rahim; Kazemian, Naeimeh; Tziritis, Evangelos
2017-08-18
Kordkandi-Duzduzan plain is one of the fertile plains of East Azarbaijan Province, NW of Iran. Groundwater is an important resource for drinking and agricultural purposes due to the lack of surface water resources in the region. The main objectives of the present study are to identify the hydrogeochemical processes and the potential sources of major, minor, and trace metals and metalloids such as Cr, Mn, Cd, Fe, Al, and As by using joint hydrogeochemical techniques and multivariate statistical analysis and to evaluate groundwater quality deterioration with the use of PoS environmental index. To achieve these objectives, 23 groundwater samples were collected in September 2015. Piper diagram shows that the mixed Ca-Mg-Cl is the dominant groundwater type, and some of the samples have Ca-HCO3, Ca-Cl, and Na-Cl types. Multivariate statistical analyses indicate that weathering and dissolution of different rocks and minerals, e.g., silicates, gypsum, and halite, ion exchange, and agricultural activities influence the hydrogeochemistry of the study area. The cluster analysis divides the samples into two distinct clusters which are completely different in EC (and its dependent variables such as Na+, K+, Ca2+, Mg2+, SO42-, and Cl-), Cd, and Cr variables according to the ANOVA statistical test. Based on the median values, the concentrations of pH, NO3-, SiO2, and As in cluster 1 are elevated compared with those of cluster 2, while their maximum values occur in cluster 2. According to the PoS index, the dominant parameter that controls quality deterioration is As, with 60% of contribution. Samples of lowest PoS values are located in the southern and northern parts (recharge area) while samples of the highest values are located in the discharge area and the eastern part.
Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.
2017-05-01
The braking system is one of the most important and complex subsystems of railway vehicles, especially where safety is concerned. Installing efficient, safe brakes on modern railway vehicles is therefore essential. Attention is now being devoted to problems connected with the use of high-performance brake materials and their impact on the thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, because the interaction between the brake block and the wheel produces complex thermo-mechanical phenomena. In this work, the subjects investigated are cast-iron brake shoes, which are still widely used on freight wagons. Cast-iron brake shoes with lamellar graphite and a high phosphorus content (0.8-1.1%) therefore need special investigation. In order to establish the optimal conditions for the cast-iron brake shoes, we proposed a mathematical modelling study using statistical analysis and multiple regression equations. Multivariate research is important in cast-iron brake shoe manufacturing because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulty comprehending many dimensions at one time. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. In order to establish the multiple correlation between the hardness of the cast-iron brake shoes and the elements of their chemical composition, several types of regression equation model have been proposed. Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representations of the regression equations in order to explain the interaction of the variables and locate the optimal level of each variable for
Directory of Open Access Journals (Sweden)
Fernando Velasco-Tapia
2014-01-01
Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view can be reached by applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features of the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in the majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate the tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks into geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures).
Multivariate analysis with LISREL
Jöreskog, Karl G; Y Wallentin, Fan
2016-01-01
This book traces the theory and methodology of multivariate statistical analysis and shows how it can be conducted in practice using the LISREL computer program. It presents not only the typical uses of LISREL, such as confirmatory factor analysis and structural equation models, but also several other multivariate analysis topics, including regression (univariate, multivariate, censored, logistic, and probit), generalized linear models, multilevel analysis, and principal component analysis. It provides numerous examples from several disciplines and discusses and interprets the results, illustrated with sections of output from the LISREL program, in the context of the example. The book is intended for masters and PhD students and researchers in the social, behavioral, economic and many other sciences who require a basic understanding of multivariate statistical theory and methods for their analysis of multivariate data. It can also be used as a textbook on various topics of multivariate statistical analysis.
Mihajilov-Krstev, Tatjana M; Denić, Marija S; Zlatković, Bojan K; Stankov-Jovanović, Vesna P; Mitić, Violeta D; Stojanović, Gordana S; Radulović, Niko S
2015-04-01
In Serbia, delicatessen fruit alcoholic drinks are produced from autochthonous fruit-bearing species such as cornelian cherry, blackberry, elderberry, wild strawberry, European wild apple, European blueberry and blackthorn fruits. There are no chemical data on many of these and herein we analysed volatile minor constituents of these rare fruit distillates. Our second goal was to determine possible chemical markers of these distillates through a statistical/multivariate treatment of the herein obtained and previously reported data. Detailed chemical analyses revealed a complex volatile profile of all studied fruit distillates with 371 identified compounds. A number of constituents were recognised as marker compounds for a particular distillate. Moreover, 33 of them represent newly detected flavour constituents in alcoholic beverages or, in general, in foodstuffs. With the aid of multivariate analyses, these volatile profiles were successfully exploited to infer the origin of raw materials used in the production of these spirits. It was also shown that all fruit distillates possessed weak antimicrobial properties. It seems that the aroma of these highly esteemed wild-fruit spirits depends on the subtle balance of various minor volatile compounds, whereby some of them are specific to a certain type of fruit distillate and enable their mutual distinction. © 2014 Society of Chemical Industry.
Robel, Martin; Kristo, Michael J
2008-11-01
The problem of identifying the provenance of unknown nuclear material in the environment by multivariate statistical analysis of its uranium and/or plutonium isotopic composition is considered. Such material can be introduced into the environment as a result of nuclear accidents, inadvertent processing losses, illegal dumping of waste, or deliberate trafficking in nuclear materials. Various combinations of reactor type and fuel composition were analyzed using Principal Components Analysis (PCA) and Partial Least Squares Discriminant Analysis (PLSDA) of the concentrations of nine U and Pu isotopes in fuel as a function of burnup. Real-world variation in the concentrations of (234)U and (236)U in the fresh (unirradiated) fuel was incorporated. The U and Pu were also analyzed separately, with results that suggest that, even after reprocessing or environmental fractionation, Pu isotopes can be used to determine both the source reactor type and the initial fuel composition with good discrimination.
Kumar, Manoj; Ramanathan, A L; Tripathi, Ritu; Farswan, Sandhya; Kumar, Devendra; Bhattacharya, Prosun
2017-01-01
This study investigates the spatio-chemical characteristics, contamination sources (using multivariate statistics), and health risk arising from the consumption of groundwater contaminated with trace and toxic elements in the Chhaprola Industrial Area, Gautam Buddha Nagar, Uttar Pradesh, India. In this study 33 tubewell water samples were analyzed for 28 elements using ICP-OES. The concentrations of some trace and toxic elements, such as Al, As, B, Cd, Cr, Mn, Pb and U, exceeded their corresponding WHO (2011) guidelines and BIS (2012) standards, while the other analyzed elements remained below those values. Background γ and β radiation levels were observed and found to be within their acceptable limits. Multivariate statistics, namely PCA (six factors explaining a cumulative 82.07% of the variance) and CA, indicated a mixed origin: natural and anthropogenic activities such as industrial effluent and agricultural runoff are responsible for the degradation of groundwater quality in the research area. In this study area, an adult consumes 3.0 L (median value) of water per day and therefore ingests 39, 1.94, 1461, 0.14, 11.1, 292.6, 13.6 and 23.5 μg of Al, As, B, Cd, Cr, Mn, Pb and U, respectively, from drinking water per day. The hazard quotient (HQ) exceeded the safe limit of 1 for As, B, Al, Cr, Mn, Cd, Pb and U at a few locations, while a hazard index (HI) > 5 was observed in about 30% of the samples, indicating a potential health risk from these tubewells for the local population if the groundwater is consumed. Copyright © 2016 Elsevier Ltd. All rights reserved.
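The intake and hazard-quotient arithmetic described in this abstract can be sketched in a few lines. This is a minimal illustration and not the authors' procedure: the daily intakes are the figures quoted in the abstract, while the 70 kg body weight and the oral reference doses for As and Cd are assumptions introduced here purely for illustration (the abstract does not say which values were used).

```python
# Daily intakes in micrograms per day, as quoted in the abstract for a
# median water consumption of 3.0 L per adult per day.
DAILY_INTAKE_UG = {
    "Al": 39.0, "As": 1.94, "B": 1461.0, "Cd": 0.14,
    "Cr": 11.1, "Mn": 292.6, "Pb": 13.6, "U": 23.5,
}
WATER_L_PER_DAY = 3.0

# ASSUMPTIONS (not taken from the abstract): adult body weight and
# illustrative oral reference doses in micrograms per kg body weight per day.
BODY_WEIGHT_KG = 70.0
REFERENCE_DOSE_UG_PER_KG_DAY = {"As": 0.3, "Cd": 0.5}


def implied_concentration(element):
    """Median concentration (ug/L) implied by the quoted daily intake."""
    return DAILY_INTAKE_UG[element] / WATER_L_PER_DAY


def hazard_quotient(element):
    """HQ = daily dose per kg body weight divided by the reference dose."""
    dose = DAILY_INTAKE_UG[element] / BODY_WEIGHT_KG
    return dose / REFERENCE_DOSE_UG_PER_KG_DAY[element]


def hazard_index(elements):
    """HI = sum of the hazard quotients of the selected elements."""
    return sum(hazard_quotient(element) for element in elements)


if __name__ == "__main__":
    for element in ("As", "Cd"):
        print(element, implied_concentration(element), hazard_quotient(element))
    print("HI (As + Cd):", hazard_index(["As", "Cd"]))
```

Under these assumed parameters the median intakes give quotients well below 1; an HQ above 1, as reported for a few locations, would correspond to wells with concentrations far above the median.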
Mujica Ascencio, Saul; Choe, ChunSik; Meinke, Martina C; Müller, Rainer H; Maksimov, George V; Wigger-Alberti, Walter; Lademann, Juergen; Darvin, Maxim E
2016-07-01
Propylene glycol is one of the known substances added in cosmetic formulations as a penetration enhancer. Recently, nanocrystals have been employed also to increase the skin penetration of active components. Caffeine is a component with many applications and its penetration into the epidermis is controversially discussed in the literature. In the present study, the penetration ability of two components - caffeine nanocrystals and propylene glycol, applied topically on porcine ear skin in the form of a gel, was investigated ex vivo using two confocal Raman microscopes operated at different excitation wavelengths (785nm and 633nm). Several depth profiles were acquired in the fingerprint region and different spectral ranges, i.e., 526-600cm(-1) and 810-880cm(-1) were chosen for independent analysis of caffeine and propylene glycol penetration into the skin, respectively. Multivariate statistical methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) combined with Student's t-test were employed to calculate the maximum penetration depths of each substance (caffeine and propylene glycol). The results show that propylene glycol penetrates significantly deeper than caffeine (20.7-22.0μm versus 12.3-13.0μm) without any penetration enhancement effect on caffeine. The results confirm that different substances, even if applied onto the skin as a mixture, can penetrate differently. The penetration depths of caffeine and propylene glycol obtained using two different confocal Raman microscopes are comparable showing that both types of microscopes are well suited for such investigations and that multivariate statistical PCA-LDA methods combined with Student's t-test are very useful for analyzing the penetration of different substances into the skin. Copyright © 2016 Elsevier B.V. All rights reserved.
Matiatos, Ioannis
2016-01-15
Nitrate (NO3) is one of the most common contaminants in aquatic environments and groundwater. Nitrate concentrations and environmental isotope data (δ(15)N-NO3 and δ(18)O-NO3) from groundwater of Asopos basin, which has different land-use types, i.e., a large number of industries (e.g., textile, metal processing, food, fertilizers, paint), urban and agricultural areas and livestock breeding facilities, were analyzed to identify the nitrate sources of water contamination and N-biogeochemical transformations. A Bayesian isotope mixing model (SIAR) and multivariate statistical analysis of hydrochemical data were used to estimate the proportional contribution of different NO3 sources and to identify the dominant factors controlling the nitrate content of the groundwater in the region. The comparison of SIAR and Principal Component Analysis showed that wastes originating from urban and industrial zones of the basin are mainly responsible for nitrate contamination of groundwater in these areas. Agricultural fertilizers and manure likely contribute to groundwater contamination away from urban fabric and industrial land-use areas. Soil contribution to nitrate contamination due to organic matter is higher in the south-western part of the area far from the industries and the urban settlements. The present study aims to highlight the use of environmental isotopes combined with multivariate statistical analysis in locating sources of nitrate contamination in groundwater leading to a more effective planning of environmental measures and remediation strategies in river basins and water bodies as defined by the European Water Frame Directive (Directive 2000/60/EC).
Hamchevici, Carmen; Udrea, Ion
2013-11-01
The concept of basin-wide Joint Danube Survey (JDS) was launched by the International Commission for the Protection of the Danube River (ICPDR) as a tool for investigative monitoring under the Water Framework Directive (WFD), with a frequency of 6 years. The first JDS was carried out in 2001 and its success in providing key information for characterisation of the Danube River Basin District as required by WFD led to the organisation of the second JDS in 2007, which was the world's biggest river research expedition in that year. The present paper presents an approach for improving the survey strategy for the next planned survey JDS3 (2013) by means of several multivariate statistical techniques. In order to design the optimum structure in terms of parameters and sampling sites, principal component analysis (PCA), factor analysis (FA) and cluster analysis were applied to JDS2 data for 13 selected physico-chemical and one biological element measured in 78 sampling sites located on the main course of the Danube. Results from PCA/FA showed that most of the dataset variance (above 75%) was explained by five varifactors loaded with 8 out of 14 variables: physical (transparency and total suspended solids), relevant nutrients (N-nitrates and P-orthophosphates), feedback effects of primary production (pH, alkalinity and dissolved oxygen) and algal biomass. Taking into account the representation of the factor scores given by FA versus sampling sites and the major groups generated by the clustering procedure, the spatial network of the next survey could be carefully tailored, leading to a reduction in the number of sampling sites by more than 30%. The approach of target oriented sampling strategy based on the selected multivariate statistics can provide a strong reduction in dimensionality of the original data and corresponding costs as well, without any loss of information.
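The "share of variance explained by the first few components" reported in abstracts like this one can be illustrated with a short sketch. It assumes NumPy is available and uses synthetic data with the same shape as the JDS2 table (78 sites, 14 variables); the data, the five latent factors, and the function names are all hypothetical and are not the survey measurements or the authors' code.

```python
import numpy as np


def explained_variance(data):
    """PCA on the correlation matrix: eigenvalues and cumulative share."""
    data = np.asarray(data, dtype=float)
    # Standardize each variable so that differing units do not dominate.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Squared singular values of z, divided by (n - 1), are the
    # eigenvalues of the correlation matrix, in descending order.
    singular_values = np.linalg.svd(z, compute_uv=False)
    eigenvalues = singular_values ** 2 / (z.shape[0] - 1)
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    return eigenvalues, cumulative


def components_needed(cumulative, threshold):
    """Smallest number of components whose cumulative share reaches threshold."""
    index = int(np.searchsorted(cumulative, threshold))
    return min(index + 1, len(cumulative))


# Synthetic stand-in data: 5 hypothetical latent factors plus noise.
rng = np.random.default_rng(0)
n_sites, n_variables, n_latent = 78, 14, 5
latent = rng.normal(size=(n_sites, n_latent))
loadings = rng.normal(size=(n_latent, n_variables))
synthetic = latent @ loadings + 0.5 * rng.normal(size=(n_sites, n_variables))

eigenvalues, cumulative = explained_variance(synthetic)
k = components_needed(cumulative, 0.75)
print(f"{k} components explain {cumulative[k - 1]:.1%} of the variance")
```

Working on the correlation matrix rather than the covariance matrix is the usual choice when variables are measured in different units, as water-quality parameters are; the varimax rotation that yields "varifactors" is a further step not shown here.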
Ogwueleka, Toochukwu Chibueze
2015-03-01
Multivariate statistical techniques, such as cluster analysis (CA) and principal component analysis/factor analysis (PCA/FA), were used to investigate the temporal and spatial variations and to interpret large and complex water quality data sets collected from the Kaduna River. The Kaduna River is the main tributary of the Niger River in Nigeria and represents the common situation of most natural rivers including spatial patterns of pollutants. The water samples were collected monthly for 5 years (2008-2012) from eight sampling stations located along the river. In all samples, 17 parameters of water quality were determined: total dissolved solids (TDS), pH, Thard, dissolved oxygen (DO), 5-day biochemical oxygen demand (BOD5), chemical oxygen demand (COD), NH4-N, Cl, SO4, Ca, Mg, total coliform (TColi), turbidity, electrical conductivity (EC), HCO3 (-), NO3 (-), and temperature (T). Hierarchical CA grouped 12 months into two seasons (dry and wet seasons) and classified eight sampling stations into two groups (low- and high-pollution regions) based on seasonal differences and different levels of pollution, respectively. PCA/FA for each group formed by CA helped to identify spatiotemporal dynamics of water quality in Kaduna River. CA illustrated that water quality progressively deteriorated from headwater to downstream areas. The results of PCA/FA determined that 78.7 % of the total variance in low pollution region was explained by five factors, that is, natural and organic, mineral, microbial, organic, and nutrient, and 87.6 % of total variance in high pollution region was explained by six factors, that is, microbial, organic, mineral, natural, nutrient, and organic. Varifactors obtained from FA indicated that the parameters responsible for water quality variations result from agricultural runoff, natural pollution, domestic, municipal, and industrial wastewater. Mann-Whitney U test results revealed that TDS, pH, DO, T, EC, TColi, turbidity, total hardness (THard), Mg
Generalized Reduced Rank Tests using the Singular Value Decomposition
Kleibergen, F.R.; Paap, R.
2006-01-01
We propose a novel statistic to test the rank of a matrix. The rank statistic overcomes deficiencies of existing rank statistics, like: a Kronecker covariance matrix for the canonical correlation rank statistic of Anderson [Annals of Mathematical Statistics (1951), 22, 327-351] sensitivity to the
Gao, Yongnian; Gao, Junfeng; Yin, Hongbin; Liu, Chuansheng; Xia, Ting; Wang, Jing; Huang, Qi
2015-03-15
Remote sensing has been widely used for water quality monitoring, but most of these monitoring studies have only focused on a few water quality variables, such as chlorophyll-a, turbidity, and total suspended solids, which have typically been considered optically active variables. Remote sensing presents a challenge in estimating the phosphorus concentration in water. The total phosphorus (TP) in lakes has been estimated from remotely sensed observations, primarily using the simple individual band ratio or their natural logarithm and the statistical regression method based on the field TP data and the spectral reflectance. In this study, we investigated the possibility of establishing a spatial modeling scheme to estimate the TP concentration of a large lake from multi-spectral satellite imagery using band combinations and regional multivariate statistical modeling techniques, and we tested the applicability of the spatial modeling scheme. The results showed that HJ-1A CCD multi-spectral satellite imagery can be used to estimate the TP concentration in a lake. The correlation and regression analysis showed a highly significant positive relationship between the TP concentration and certain remotely sensed combination variables. The proposed modeling scheme had a higher accuracy for the TP concentration estimation in the large lake compared with the traditional individual band ratio method and the whole-lake scale regression-modeling scheme. The TP concentration values showed a clear spatial variability and were high in western Lake Chaohu and relatively low in eastern Lake Chaohu. The northernmost portion, the northeastern coastal zone and the southeastern portion of western Lake Chaohu had the highest TP concentrations, and the other regions had the lowest TP concentration values, except for the coastal zone of eastern Lake Chaohu. These results strongly suggested that the proposed modeling scheme, i.e., the band combinations and the regional multivariate
J Olive, David
2017-01-01
This text presents methods that are robust to the assumption of a multivariate normal distribution or methods that are robust to certain types of outliers. Instead of using exact theory based on the multivariate normal distribution, the simpler and more applicable large sample theory is given. The text develops among the first practical robust regression and robust multivariate location and dispersion estimators backed by theory. The robust techniques are illustrated for methods such as principal component analysis, canonical correlation analysis, and factor analysis. A simple way to bootstrap confidence regions is also provided. Much of the research on robust multivariate analysis in this book is being published for the first time. The text is suitable for a first course in Multivariate Statistical Analysis or a first course in Robust Statistics. This graduate text is also useful for people who are familiar with the traditional multivariate topics, but want to know more about handling data sets with...
Dragan, Oana; Tomuta, Ioan; Casoni, Dorina; Sarbu, Costel; Campian, Radu; Frentiu, Tiberiu
2017-11-16
This article reports for the first time the effects of multiple additives (polyethylene glycol 400, Triton X-100, benzalkonium chloride, and ethyl formate) on the surface tension, pH, and viscosity of 5.25% sodium hypochlorite (NaOCl) irrigant solution. Advanced statistical approaches based on unsupervised multivariate analysis (cluster analysis and principal component analysis) were used to quantify the variability of the physicochemical properties of the modified NaOCl solution for the first time in dentistry. Solutions of 5.25% NaOCl were modified with multiple additives in various concentrations, physicochemical parameters were measured at 22°C and 37°C, and the results were statistically analyzed to group the solutions and reveal the effects of additives. Cluster analysis and principal component analysis revealed that pH and surface tension were the significant parameters (P < .05) for grouping the modified solutions. Four principal components, accounting for 90.6% of the total variance, were associated with flow characteristics (37.3%) determined by polyethylene glycol; the wetting property (22.5% and 10.5%), which was dependent on cationic and nonionic surfactant; and the antimicrobial effect (20.3%) influenced by ethyl formate. Varimax rotation of the principal components showed that the cationic surfactant (benzalkonium chloride) had significantly decreased surface tension compared with the nonionic surfactant (Triton-X). Although ethyl formate was introduced as an odor modifier, it had a significant effect on pH decrease and the occurrence of effervescence with O2 and hypochlorous acid release. The statistical results revealed that the 5.25% NaOCl irrigant solution should be modified with a mixture of 0.1% benzalkonium chloride, 1% ethyl formate, and 7% polyethylene glycol for obtaining a low pH and low surface tension. Copyright © 2017 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
Energy Technology Data Exchange (ETDEWEB)
Matiatos, Ioannis, E-mail: i.matiatos@iaea.org
2016-01-15
Nitrate (NO3) is one of the most common contaminants in aquatic environments and groundwater. Nitrate concentrations and environmental isotope data (δ15N–NO3 and δ18O–NO3) from groundwater of Asopos basin, which has different land-use types, i.e., a large number of industries (e.g., textile, metal processing, food, fertilizers, paint), urban and agricultural areas and livestock breeding facilities, were analyzed to identify the nitrate sources of water contamination and N-biogeochemical transformations. A Bayesian isotope mixing model (SIAR) and multivariate statistical analysis of hydrochemical data were used to estimate the proportional contribution of different NO3 sources and to identify the dominant factors controlling the nitrate content of the groundwater in the region. The comparison of SIAR and Principal Component Analysis showed that wastes originating from urban and industrial zones of the basin are mainly responsible for nitrate contamination of groundwater in these areas. Agricultural fertilizers and manure likely contribute to groundwater contamination away from urban fabric and industrial land-use areas. Soil contribution to nitrate contamination due to organic matter is higher in the south-western part of the area far from the industries and the urban settlements. The present study aims to highlight the use of environmental isotopes combined with multivariate statistical analysis in locating sources of nitrate contamination in groundwater leading to a more effective planning of environmental measures and remediation strategies in river basins and water bodies as defined by the European Water Frame Directive (Directive 2000/60/EC). - Highlights: • More enriched N-isotope values were observed in the industrial/urban areas. • A Bayesian isotope mixing model was applied in a multiple land-use area. • A 3-component model explained the factors controlling nitrate content in groundwater. • Industrial
Zhu, Guangxu; Guo, Qingjun; Xiao, Huayun; Chen, Tongbin; Yang, Jun
2017-06-01
Heavy metals are considered toxic to humans and ecosystems. In the present study, heavy metal concentration in soil was investigated using the single pollution index (PIi), the integrated Nemerow pollution index (PIN), and the geoaccumulation index (Igeo) to determine metal accumulation and its pollution status at the abandoned site of the Capital Iron and Steel Factory in Beijing and its surrounding area. Multivariate statistical analysis (principal component analysis and correlation analysis) and geostatistical analysis (ArcGIS tool), combined with stable Pb isotopic ratios, were applied to explore the characteristics of heavy metal pollution and the possible sources of pollutants. The results indicated that heavy metal elements show different degrees of accumulation in the study area; the observed trend of the enrichment factors and the geoaccumulation index was Hg > Cd > Zn > Cr > Pb > Cu ≈ As > Ni. Hg, Cd, Zn, and Cr were the dominant elements that influenced soil quality in the study area. The Nemerow index method indicated that all of the heavy metals caused serious pollution except Ni. Multivariate statistical analysis indicated that Cd, Zn, Cu, and Pb show obvious correlation and have higher loads on the same principal component, suggesting that they had the same sources, which are related to industrial activities and vehicle emissions. The spatial distribution maps based on ordinary kriging showed that high concentrations of heavy metals were located in the local factory area and in the southeast-northwest part of the study region, corresponding with the predominant wind directions. Analyses of lead isotopes confirmed that Pb in the study soils is predominantly derived from three Pb sources: dust generated during steel production, coal combustion, and the natural background. Moreover, the ternary mixture model based on lead isotope analysis indicates that lead in the study soils originates mainly from anthropogenic sources, which contribute much more
Hayslett, H T
1991-01-01
Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the
Blokland, M H; Van Tricht, E F; Van Rossum, H J; Sterk, S S; Nielen, M W F
2012-01-01
For years it has been suspected that natural hormones are illegally used as growth promoters in cattle in the European Union. Unfortunately there is a lack of methods and criteria that can be used to detect the abuse of natural hormones and distinguish treated from non-treated animals. Pattern recognition of steroid profiles is a promising approach for tracing/detecting the abuse of natural hormones administered to cattle. Traditionally steroids are analysed in urine as free steroid after deconjugation of the glucuronide (and sulphate) conjugates. The disadvantage of this deconjugation is that valuable information about the steroid profile in the sample is lost. In this study we develop a method to analyse steroids at very low concentration levels (ng l(-1)) for the free steroid, glucuronide and sulphate conjugates in urine samples. This method was used to determine concentrations of natural (pro)hormones in a large population (n = 620) of samples from male and female bovine animals and from bovine animals treated with testosterone-cypionate, estradiol-benzoate, dihydroepiandrosterone and pregnenolone. The data acquired were used to build a statistical model applying the multivariate technique 'Soft Independent Modeling of Class Analogy' (SIMCA). It is demonstrated that by using this model the results of the urine analysis can indicate which animal may have had illegal treatment with natural (pro)hormones.
Gogna, Navdeep; Hamid, Neda; Dorai, Kavita
2015-11-10
Extracts from the Carica papaya L. plant are widely reported to contain metabolites with antibacterial, antioxidant and anticancer activity. This study aims to analyze the metabolic profiles of papaya leaves and seeds in order to gain insights into their phytomedicinal constituents. We performed metabolite fingerprinting using 1D and 2D 1H NMR experiments and used multivariate statistical analysis to identify those plant parts that contain the highest concentrations of metabolites of phytomedicinal value. Secondary metabolites such as phenyl propanoids, including flavonoids, were found in greater concentrations in the leaves as compared to the seeds. UPLC-ESI-MS verified the presence of significant metabolites in the papaya extracts suggested by the NMR analysis. Interestingly, the concentrations of eleven secondary metabolites, namely caffeic, cinnamic, chlorogenic, quinic, coumaric, vanillic, and protocatechuic acids, naringenin, hesperidin, rutin, and kaempferol, were higher in young as compared to old papaya leaves. The results of the NMR analysis were corroborated by estimating the total phenolic and flavonoid content of the extracts. Estimation of antioxidant activity in leaves and seed extracts by DPPH and ABTS in-vitro assays and antioxidant capacity in C2C12 cell line also showed that papaya extracts exhibit high antioxidant activity. Copyright © 2015 Elsevier B.V. All rights reserved.
Directory of Open Access Journals (Sweden)
Vetrimurugan Elumalai
2017-04-01
Heavy metals in surface and groundwater were analysed and their sources were identified using multivariate statistical tools for two towns in South Africa. Human exposure risk through the drinking water pathway was also assessed. Electrical conductivity values showed that groundwater is desirable to permissible for drinking except for six locations. Concentrations of aluminium, lead and nickel were above the permissible limit for drinking at all locations. Boron, cadmium, iron and manganese exceeded the limit at a few locations. Heavy metal pollution index based on ten heavy metals indicated that 85% of the area had good quality water, but 15% was unsuitable. Human exposure dose through the drinking water pathway indicated no risk due to boron, nickel and zinc, moderate risk due to cadmium and lithium and high risk due to silver, copper, manganese and lead. Hazard quotients were high in all sampling locations for humans of all age groups, indicating that groundwater is unsuitable for drinking purposes. Highly polluted areas were located near the coast, close to industrial operations and at a landfill site representing human-induced pollution. Factor analysis identified the four major pollution sources as: (1) industries; (2) mining and related activities; (3) mixed sources (geogenic and anthropogenic); and (4) fertilizer application.
Krupa, S; Nosal, M; Ferdinand, J A; Stevenson, R E; Skelly, J M
2003-01-01
A multi-variate, non-linear statistical model is described to simulate passive O3 sampler data to mimic the hourly frequency distributions of continuous measurements using climatologic O3 indicators and passive sampler measurements. The main meteorological parameters identified by the model were air temperature, relative humidity, solar radiation and wind speed, although other parameters were also considered. Together, air temperature, relative humidity and passive sampler data by themselves could explain 62.5-67.5% (R(2)) of the corresponding variability of the continuously measured O3 data. The final correlation coefficients (r) between the predicted hourly O3 concentrations from the passive sampler data and the true, continuous measurements were 0.819-0.854, with an accuracy of 92-94% for the predictive capability. With the addition of soil moisture data, the model can lead to the first order approximation of atmospheric O3 flux and plant stomatal uptake. Additionally, if such data are coupled to multi-point plant response measurements, meaningful cause-effect relationships can be derived in the future.
Energy Technology Data Exchange (ETDEWEB)
Bakraji, E.H., E-mail: cscientificl@aec.org.sy [Archaeometry Laboratory, Chemistry Department, Atomic Energy Commission of Syria, P. O. Box 6091, Damascus (Syrian Arab Republic); Rihawy, M.S. [Archaeometry Laboratory, Chemistry Department, Atomic Energy Commission of Syria, P. O. Box 6091, Damascus (Syrian Arab Republic); Castel, C. [CNRS – Maison de l’Orient et de la Méditerranée, Laboratoire “Archéorient”, CNRS/Université Lumière-Lyon 2 (France); Abboud, R. [Archaeometry Laboratory, Chemistry Department, Atomic Energy Commission of Syria, P. O. Box 6091, Damascus (Syrian Arab Republic)
2015-03-15
Highlights: •PIXE and OSL methods were used to classify and date pottery from Tell Al-Rawda site. •Three groups were classified using PIXE, which suggest different sources of the clay. •OSL was used for dating the site and the date found was consistent with typology. -- Abstract: Particle Induced X-ray Emission (PIXE) technique has been utilised to study 48 Syrian ancient pottery fragments taken from excavations at Tell Al-Rawda site. Eighteen elements (Mg, Al, Si, P, S, K, Ca, Ti, Mn, Fe, Ni, Zn, As, Br, Rb, Sr, Y, and Pb) were determined. The element concentrations have been processed using two multivariate statistical methods to classify the pottery; one main group and two smaller groups were defined. In addition, four samples from different places on the site were subjected to optically stimulated luminescence (OSL) dating. The average age obtained using a single aliquot regeneration (SAR) protocol was found to be 4350 ± 240 years.
Directory of Open Access Journals (Sweden)
Pyvavar Iryna V.
2017-03-01
The article aims to further develop the theoretical and methodological support for, and to elaborate practical recommendations on, the State regulation of housing and communal services in Ukraine. The stages of formation of the mechanism of State regulation in the sphere of housing and communal services have been considered, and its main principles, functions, and models have been identified; methods for reforming housing and communal services have been implemented through the use of multivariate statistical analysis techniques (integral rating evaluation and cluster analysis). Groups of administrative territories with common problems were defined, and each region was evaluated and rated using an integrated indicator of the level of development of its housing and communal services. The most effective levers of State regulation have been determined. A matrix of management strategies has been developed, allowing adaptation to the level of development in an individual region. Some basic strategies for the management of housing and communal services have been proposed, on the basis of which the corresponding priority directions of regulation can be elaborated from an assessment of the state of development of the housing and communal services sphere, in accordance with the relevant components and the existing potential.
Directory of Open Access Journals (Sweden)
M. SureshGandhi
2014-01-01
The distribution of the natural gamma ray emitting radionuclides 238U, 232Th and 40K in beach sediments along the north east coast of Tamilnadu, India has been studied using a NaI(Tl) gamma ray spectrometric technique. The total average concentrations of radionuclides 238U, 232Th, and 40K were 35.12, 713.16, and 349.60 Bq kg−1, respectively. Correlations made among these radionuclides prove the existence of secular equilibrium in the investigated sediments. The total average absorbed dose rate in the study areas is found to be 504.75 nGyh−1, whereas the annual effective dose rate has an average value of 0.62 mSvy−1. The mean activity concentrations of measured radionuclides were compared with other literature values. The ratios between the detected radioisotopes have been calculated for the spatial distribution of natural radionuclides in the studied area. The radiological hazard of the natural radionuclide content, the radium equivalent activity, and the external hazard index of the sediment samples in the area under consideration were also calculated. Multivariate statistical analyses (Pearson correlation, cluster and factor analysis) were carried out between the parameters obtained from radioactivity to identify the existing relations.
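The radiological quantities named in this abstract follow simple closed-form expressions, sketched below. The abstract does not state which coefficients the authors used, so everything here is an assumption: the widely used conversion factors (8760 h per year, outdoor occupancy 0.2, 0.7 Sv/Gy), the common radium-equivalent and external-hazard formulas, and the substitution of the 238U activity for 226Ra, which leans on the secular equilibrium the abstract mentions.

```python
HOURS_PER_YEAR = 8760
OUTDOOR_OCCUPANCY = 0.2   # assumed fraction of time spent outdoors
SV_PER_GY = 0.7           # assumed absorbed-to-effective dose conversion


def annual_effective_dose_msv(dose_rate_ngy_per_h):
    """Outdoor annual effective dose (mSv/y) from an absorbed dose rate (nGy/h)."""
    nsv_per_year = (dose_rate_ngy_per_h * HOURS_PER_YEAR
                    * OUTDOOR_OCCUPANCY * SV_PER_GY)
    return nsv_per_year * 1e-6  # nSv -> mSv


def radium_equivalent(a_ra, a_th, a_k):
    """Radium equivalent activity (Bq/kg) from Ra, Th and K activities."""
    return a_ra + 1.43 * a_th + 0.077 * a_k


def external_hazard_index(a_ra, a_th, a_k):
    """External hazard index; values above 1 indicate the limit is exceeded."""
    return a_ra / 370.0 + a_th / 259.0 + a_k / 4810.0


if __name__ == "__main__":
    # Mean activities quoted in the abstract (Bq/kg); 238U stands in for 226Ra.
    u238, th232, k40 = 35.12, 713.16, 349.60
    print("Annual effective dose (mSv/y):", annual_effective_dose_msv(504.75))
    print("Radium equivalent (Bq/kg):", radium_equivalent(u238, th232, k40))
    print("External hazard index:", external_hazard_index(u238, th232, k40))
```

With these assumed factors, the reported mean dose rate of 504.75 nGy/h converts to roughly 0.62 mSv/y, which agrees with the annual effective dose quoted in the abstract.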
Xiao, Wei; Peng, Yude; Tan, Zhexu; Lv, Qiuyue; Chan, Chi-On; Yang, Jingyu; Chen, Sibao
2017-12-01
Pyrrosiae Folium (PF) is a commonly used Chinese herbal medicine originating from three Pyrrosia species for the treatment of urinary infection and urolithiasis. According to Chinese medicine practice, different species origins lead to some variations in the therapeutic effects of PF. To ensure the safety and efficacy of PF in clinical practice, it is necessary to establish a reliable and integrative method to distinguish PF originating from the three species. In the present paper, a HPLC-DAD method was developed and applied to simultaneously analyze five major compounds in PF. Afterwards, multivariate statistical analyses including principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were applied for species discrimination and integrative quality evaluation based on quantitative data. The chemical determination and pattern recognition results of 35 batches of PF samples indicated that PF samples from three species showed different chemical profiles and could be discriminated clearly. In conclusion, the present method is rapid and reliable for the quality assessment and species discrimination of PF.
Directory of Open Access Journals (Sweden)
Armin Saed-Moucheshi
2013-01-01
Multivariate statistical techniques were used to compare the relationship between yield and its related traits in cultivars noninoculated and inoculated with a mycorrhizal fungus (Glomus intraradices); each treatment consisted of three wheat cultivars and four water regimes. Results showed that, under inoculation conditions, spike weight per plant and total chlorophyll content of the flag leaf were the most important variables contributing to wheat grain yield variation, while, under the noninoculated condition, in addition to the two mentioned traits, grain weight per spike and leaf area were also important variables accounting for wheat grain yield variation. Therefore, spike weight per plant and chlorophyll content of the flag leaf can be used as selection criteria in breeding programs for both inoculated and noninoculated wheat cultivars under different water regimes, and grain weight per spike and leaf area can also be considered for the noninoculated condition. Furthermore, inoculated wheat cultivars showed higher values in most of the measured traits, and the results indicated that the inoculation treatment could change the relationship among morphological traits of wheat cultivars under drought stress. The results also suggest that stepwise regression as a selection method, together with principal component and factor analysis, is a stronger approach for screening important traits in breeding programs.
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Wang, Jie; Liu, Guijian; Liu, Houqi; Lam, Paul K S
2017-04-01
A total of 211 water samples were collected from 53 key sampling points from 5 to 10 July 2013 at four different depths (0 m, 2 m, 4 m, 8 m) and at different sites in the Huaihe River, Anhui, China. These points were monitored for 18 parameters (water temperature, pH, TN, TP, TOC, Cu, Pb, Zn, Ni, Co, Cr, Cd, Mn, B, Fe, Al, Mg, and Ba). The spatial variability, contamination sources and health risk of trace elements as well as the river water quality were investigated. Our results were compared with national (CSEPA) and international (WHO, USEPA) drinking water guidelines, revealing that Zn, Cd and Pb were the dominant pollutants in the water body. Application of different multivariate statistical approaches, including correlation matrix and factor/principal component analysis (FA/PCA), to assess the origins of the elements in the Huaihe River identified three source types that accounted for 79.31% of the total variance. Anthropogenic activities were considered to contribute much of the Zn, Cd, Pb, Ni, Co, and Mn via industrial waste, coal combustion, and vehicle exhaust; Ba, B, Cr and Cu were controlled by mixed anthropogenic and natural sources; and Mg, Fe and Al had natural origins from weathered rocks and crustal materials. Cluster analysis (CA) was used to classify the 53 sample points into three water pollution groups (high pollution, moderate pollution, and low pollution), reflecting influences from tributaries, power plants and vehicle exhaust, and agricultural activities, respectively. The results of the water quality index (WQI) indicate that water in the Huaihe River is heavily polluted by trace elements, so approximately 96% of the water in the Huaihe River is unsuitable for drinking. A health risk assessment using the hazard quotient and index (HQ/HI) recommended by the USEPA suggests that Co, Cd and Pb in the river could cause non-carcinogenic harm to human health. Copyright © 2017 Elsevier B.V. All rights reserved.
Campanya, J. L.; Ogaya, X.; Jones, A. G.; Rath, V.; McConnell, B.; Haughton, P.; Prada, M.
2016-12-01
The Science Foundation Ireland funded IRECCSEM project (www.ireccsem.ie) aims to evaluate Ireland's potential for onshore carbon sequestration in saline aquifers by integrating new electromagnetic geophysical data with existing geophysical and geological data. One of the objectives of this component of IRECCSEM is to characterise the subsurface beneath the Loop Head Peninsula (part of Clare Basin, Co. Clare, Ireland), and identify major electrical resistivity structures that can guide an interpretation of the carbon sequestration potential of this area. During the summer of 2014, a magnetotelluric (MT) survey was carried out on the Loop Head Peninsula, and data from a total of 140 sites were acquired, including audio-magnetotelluric (AMT) and broadband magnetotelluric (BBMT) data. The dataset was used to generate shallow three-dimensional (3-D) electrical resistivity models constraining the subsurface to depths of up to 3.5 km. The 3-D joint inversions were performed using three different types of electromagnetic data: MT impedance tensor (Z), geomagnetic transfer functions (T), and inter-station horizontal magnetic transfer functions (H). The interpretation of the results was complemented with second-derivative models of the resulting electrical resistivity models, and a quantitative comparison with borehole data using multivariate statistical methods. Second-derivative models were used to define the main interfaces between the geoelectrical structures, facilitating superior comparison with geological and seismic results, and also reducing the influence of the colour scale when interpreting the results. Specific analysis was performed to compare the extant borehole data with the electrical resistivity model, identifying those structures that are better characterised by the resistivity model. Finally, the electrical resistivity model was also used to propagate some of the physical properties measured in the borehole, when a good relation was
Liu, Pu; Hoth, Nils; Drebenstedt, Carsten; Sun, Yajun; Xu, Zhimin
2017-12-01
Groundwater is an important drinking water resource that requires protection in North China. The coal mining industry in the area may influence the evolution of water quality. To provide a primary characterization of the hydrogeochemical processes and paths that control this evolution, a complex multi-layer groundwater system in a coal mining area is investigated. Multivariate statistical methods involving hierarchical cluster analysis (HCA) and principal component analysis (PCA) are applied; 6 zones and 3 new principal components are classified as major reaction zones and reaction factors. By integrating HCA and PCA with hydrogeochemical correlation analysis, potential phases, reactions and connections between various zones are presented. Carbonate minerals, gypsum, clay minerals, and the atmospheric gases CO2, H2O and NH3 are recognized as major reactants. Mixing, evaporation, dissolution/precipitation of minerals and cation exchange are potential reactions. Inverse modeling is finally used, and it verifies the detailed processes and diverse paths. Consequently, 4 major paths are found to control the variations of groundwater chemical properties. Shallow and deep groundwater is connected primarily by the flow of deep groundwater up through fractures and faults into the shallow aquifers. Mining does not impact the underlying aquifers that represent the most critical groundwater resource, but controls should be put in place to block mixing with highly polluted mine water. The paper highlights the complex hydrogeochemical evolution of a multi-layer groundwater system under mining impact, which could be applied to further groundwater quality management in the study area, as well as in most of the other coalfields in North China. Copyright © 2017 Elsevier B.V. All rights reserved.
Chen, Yasheng; Zhu, Hongtu; An, Hongyu; Armao, Diane; Shen, Dinggang; Gilmore, John H; Lin, Weili
2014-03-01
The aim of this study was to characterize the maturational changes of the three eigenvalues (λ1 ≥ λ2 ≥ λ3) of diffusion tensor imaging (DTI) during early postnatal life for more insights into early brain development. In order to overcome the limitations of using presumed growth trajectories for regression analysis, we employed Multivariate Adaptive Regression Splines (MARS) to derive data-driven growth trajectories for the three eigenvalues. We further employed Generalized Estimating Equations (GEE) to carry out statistical inferences on the growth trajectories obtained with MARS. With a total of 71 longitudinal datasets acquired from 29 healthy, full-term pediatric subjects, we found that the growth velocities of the three eigenvalues were highly correlated, but significantly different from each other. This paradox suggested the existence of mechanisms coordinating the maturations of the three eigenvalues even though different physiological origins may be responsible for their temporal evolutions. Furthermore, our results revealed the limitations of using the average of λ2 and λ3 as the radial diffusivity in interpreting DTI findings during early brain development because these two eigenvalues had significantly different growth velocities even in central white matter. In addition, based upon the three eigenvalues, we have documented the growth trajectory differences between central and peripheral white matter, between anterior and posterior limbs of internal capsule, and between inferior and superior longitudinal fasciculus. Taken together, we have demonstrated that more insights into early brain maturation can be gained through analyzing eigen-structural elements of DTI.
Kamtchueng, Brice T; Fantong, Wilson Y; Wirmvem, Mengnjo J; Tiodjio, Rosine E; Takounjou, Alain F; Ndam Ngoupayou, Jules R; Kusakabe, Minoru; Zhang, Jing; Ohba, Takeshi; Tanyileke, Gregory; Hell, Joseph V; Ueda, Akira
2016-09-01
With the use of conventional hydrogeochemical techniques, multivariate statistical analysis, and stable isotope approaches, this paper investigates for the first time surface water and groundwater from the surrounding areas of Lake Monoun (LM), West Cameroon. The results reveal that waters are generally slightly acidic to neutral. The relative abundances of major dissolved species are Ca(2+) > Mg(2+) > Na(+) > K(+) for cations and HCO3(-) ≫ NO3(-) > Cl(-) > SO4(2-) for anions. The main water type is Ca-Mg-HCO3. Observed salinity is related to water-rock interaction, ion exchange processes, and anthropogenic activities. Nitrate and chloride have been identified as the most common pollutants. These pollutants are attributed to the chlorination of wells and leaching from pit latrines and refuse dumps. The stable isotopic compositions of the investigated water sources suggest evaporation before recharge. Four major groups of waters were identified by salinity and NO3 concentrations using Q-mode hierarchical cluster analysis (HCA). Consistent with the isotopic results, group 1 represents fresh unpolluted water occurring near the recharge zone in the general flow regime; groups 2 and 3 are mixed waters whose composition is controlled by both weathering of rock-forming minerals and anthropogenic activities; group 4 represents water highly vulnerable to anthropogenic pollution. Moreover, the isotopic results and the HCA showed that the CO2-rich bottom water of LM belongs to an isolated hydrological system within the Foumbot plain. Except for some springs, groundwater in the area is inappropriate for drinking and domestic purposes but good to excellent for irrigation.
Hong, Haoyuan; Pourghasemi, Hamid Reza; Pourtaghi, Zohre Sadat
2016-04-01
Landslides are an important natural hazard that causes a great amount of damage around the world every year, especially during the rainy season. The Lianhua area is located in the middle of China's southern mountainous area, west of Jiangxi Province, and is known to be an area prone to landslides. The aim of this study was to evaluate and compare landslide susceptibility maps produced using the random forest (RF) data mining technique with those produced by bivariate (evidential belief function and frequency ratio) and multivariate (logistic regression) statistical models for Lianhua County, China. First, a landslide inventory map was prepared using aerial photograph interpretation, satellite images, and extensive field surveys. In total, 163 landslide events were recognized in the study area, with 114 landslides (70%) used for training and 49 landslides (30%) used for validation. Next, the landslide conditioning factors-including the slope angle, altitude, slope aspect, topographic wetness index (TWI), slope-length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, distance to roads, annual precipitation, land use, normalized difference vegetation index (NDVI), and lithology-were derived from the spatial database. Finally, the landslide susceptibility maps of Lianhua County were generated in ArcGIS 10.1 based on the random forest (RF), evidential belief function (EBF), frequency ratio (FR), and logistic regression (LR) approaches and were validated using a receiver operating characteristic (ROC) curve. The ROC plot assessment results showed that for landslide susceptibility maps produced using the EBF, FR, LR, and RF models, the area under the curve (AUC) values were 0.8122, 0.8134, 0.7751, and 0.7172, respectively. Therefore, we can conclude that all four models have an AUC of more than 0.70 and can be used in landslide susceptibility mapping in the study area; meanwhile, the EBF and FR models had the best performance for Lianhua
Directory of Open Access Journals (Sweden)
Amin Hossein Morshedy
2017-07-01
Introduction: Nowadays, exploration of rare earth element (REE) resources is considered one of the strategic priorities, with a special position in the advanced and intelligent industries (Castor and Hedrick, 2006). Significant resources of REEs are found in a wide range of geological settings, including primary deposits associated with igneous and hydrothermal processes (e.g. carbonatite, (per)alkaline igneous rocks, iron-oxide breccia complexes, skarns, fluorapatite veins and pegmatites), and secondary deposits concentrated by sedimentary processes and weathering (e.g. heavy-mineral sand deposits, fluviatile sandstones, unconformity-related uranium deposits, and lignites) (Jaireth et al., 2014). Recent studies on various parts of Iran have identified promising potential for these elements, including Central Iran, alkaline rocks in the Eslami Peninsula, iron and apatite in the Hormuz Island, the Kahnouj titanium deposit, granitoid bodies in Yazd, Azerbaijan, and Mashhad and associated dikes, and finally placers related to the Shemshak formation in Marvast, Kharanagh, and Ardekan; they indicate high concentrations of REE in magmatogenic iron–apatite deposits in Central Iran and placers in the Marvast area in Yazd (Ghorbani, 2013). Materials and methods: In the present study, the geochemical behavior of rare earth elements is modeled using multivariate statistical methods in the eastern part of the Marvast placer. Marvast is located 185 km south of the city of Yazd in central Iran between Yazd and Mehriz. This area lies within the southeastern part of the Sanandaj-Sirjan Zone (Alipour-Asll et al., 2012). Samples from 53 wells were analyzed for whole-rock trace-element concentrations (including REE) by inductively coupled plasma-mass spectrometry (ICP-MS) (GSI, 2004). Clustering techniques, a family of multivariate statistical analysis techniques, can be employed to find appropriate groups in data sets. One of the main objectives of data clustering
Wikipedia ranking of world universities
Lages, José; Patt, Antoine; Shepelyansky, Dima L.
2016-03-01
We use the directed networks between articles of 24 Wikipedia language editions to produce the Wikipedia Ranking of World Universities (WRWU) using the PageRank, 2DRank and CheiRank algorithms. This approach allows the incorporation of various cultural views on world universities using a mathematical statistical analysis independent of cultural preferences. The Wikipedia ranking of the top 100 universities provides about 60% overlap with the Shanghai university ranking, demonstrating the reliable features of this approach. At the same time, WRWU incorporates all knowledge accumulated in 24 Wikipedia editions, giving stronger highlights for historically important universities and leading to a different estimation of the efficiency of world countries in university education. The historical development of university ranking is analyzed over ten centuries of their history.
A Review of Ranking Models in Data Envelopment Analysis
Directory of Open Access Journals (Sweden)
F. Hosseinzadeh Lotfi
2013-01-01
In the course of improving various abilities of data envelopment analysis (DEA) models, many investigations have been carried out for ranking decision-making units (DMUs). This is an important issue both in theory and practice. There exist a variety of papers which apply different ranking methods to a real data set. Here the ranking methods are divided into seven groups. As each of the existing methods can be viewed from different aspects, these groups may overlap somewhat with one another. The first group conducts the evaluation by a cross-efficiency matrix where the units are self- and peer-evaluated. In the second one, the ranking of units is based on the optimal weights obtained from the multiplier model of the DEA technique. The third group deals with super-efficiency methods, which are based on the idea of excluding the unit under evaluation and analyzing the changes of the frontier. The fourth group involves methods based on benchmarking, which adopts the idea of being a useful target for the inefficient units. The fourth group uses the multivariate statistical techniques, usually applied after conducting the DEA classification. The fifth research area ranks inefficient units through proportional measures of inefficiency. The sixth approach involves multiple-criteria decision methodologies with the DEA technique. In the last group, some different methods of ranking units are mentioned.
Time evolution of Wikipedia network ranking
Eom, Young-Ho; Frahm, Klaus M.; Benczúr, András; Shepelyansky, Dima L.
2013-12-01
We study the time evolution of the ranking and spectral properties of the Google matrix of the English Wikipedia hyperlink network during the years 2003-2011. The statistical properties of the ranking of Wikipedia articles via PageRank and CheiRank probabilities, as well as the matrix spectrum, are shown to be stabilized for 2007-2011. Special emphasis is placed on the ranking of Wikipedia personalities and universities. We show that PageRank selection is dominated by politicians, while 2DRank, which combines PageRank and CheiRank, gives more weight to personalities of the arts. The Wikipedia PageRank of universities recovers 80% of the top universities of the Shanghai ranking during the considered time period.
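As a rough illustration of the two rankings named in the abstract above, the following sketch computes PageRank by power iteration on a small directed graph and obtains CheiRank by running the same procedure on the graph with all links reversed. The toy graph, the damping factor 0.85, and the function names are assumptions made for illustration; they are not taken from the paper, and 2DRank is not implemented here.

```python
def pagerank(links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a dict {node: [target, ...]}.

    alpha is the damping factor (0.85 is an assumed, conventional value).
    """
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Probability mass sitting on dangling nodes is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            out = links.get(v, [])
            for t in out:
                new[t] += alpha * rank[v] / len(out)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


def cheirank(links, **kwargs):
    """CheiRank: PageRank of the network with every link direction inverted."""
    reversed_links = {}
    for v, targets in links.items():
        reversed_links.setdefault(v, [])
        for t in targets:
            reversed_links.setdefault(t, []).append(v)
    return pagerank(reversed_links, **kwargs)


# Hypothetical four-article network: C is linked to by A, B and D.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pr = pagerank(toy_links)
cr = cheirank(toy_links)
print(sorted(pr, key=pr.get, reverse=True))  # nodes ordered by PageRank
print(sorted(cr, key=cr.get, reverse=True))  # nodes ordered by CheiRank
```

PageRank rewards articles that receive many links, while CheiRank rewards articles that send many links, which is why the two orderings can differ on the same network.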
Marengo, Emilio; Longo, Valentina; Bobba, Marco; Robotti, Elisa; Zerbinati, Orfeo; Di Martino, Silvana
2009-01-15
This paper reports the development of calibration models for quality control in the production of ethylene/propylene/1-butene terpolymers by the use of multivariate tools and FT-IR spectroscopy. 1-Butene concentration prediction is achieved in terpolymers by coupling FT-IR spectroscopy to multivariate regression tools. A dataset of 26 terpolymers (14 coming from a constrained experimental design for mixtures, plus 12 terpolymers used for external validation) was analysed by FT-IR spectroscopy. An internal method of the "Polimeri Europa" plant, based on 13C NMR spectroscopy, is used to determine the percentage of 1-butene in the samples. Then, different multivariate tools are used for 1-butene concentration prediction based on the FT-IR spectra recorded. Different multivariate calibration methods were explored: principal component regression (PCR), partial least squares (PLS), stepwise OLS regression (SWR) and artificial neural networks (ANNs). The model obtained by back-propagation neural networks turned out to be the best one. The performance of the BP-ANN model was further improved by variable selection procedures based on the calculation of the first derivative of the network. The proposed approach allows the monitoring in real time of the polymer synthesis and the estimation of the characteristics of the product attainable from the concentration of 1-butene.
Saw, J.G.
This paper deals with some tests of hypothesis frequently encountered in the analysis of multivariate data. The type of hypothesis considered is that which the statistician can answer in the negative or affirmative. The Doolittle method makes it possible to evaluate the determinant of a matrix of high order, to solve a matrix equation, or to…
Karunathilaka, Sanjeewa R; Kia, Ali-Reza Fardin; Srigley, Cynthia; Chung, Jin Kyu; Mossoba, Magdi M
2016-10-01
A rapid tool for evaluating authenticity was developed and applied to the screening of extra virgin olive oil (EVOO) retail products by using Fourier-transform near infrared (FT-NIR) spectroscopy in combination with univariate and multivariate data analysis methods. Using disposable glass tubes, spectra for 62 reference EVOO, 10 edible oil adulterants, 20 blends consisting of EVOO spiked with adulterants, 88 retail EVOO products and other test samples were rapidly measured in the transmission mode without any sample preparation. The univariate conformity index (CI) and the multivariate supervised soft independent modeling of class analogy (SIMCA) classification tool were used to analyze the various olive oil products which were tested for authenticity against a library of reference EVOO. Better discrimination between the authentic EVOO and some commercial EVOO products was observed with SIMCA than with CI analysis. Approximately 61% of all EVOO commercial products were flagged by SIMCA analysis, suggesting that further analysis be performed to identify quality issues and/or potential adulterants. Due to its simplicity and speed, FT-NIR spectroscopy in combination with multivariate data analysis can be used as a complementary tool to conventional official methods of analysis to rapidly flag EVOO products that may not belong to the class of authentic EVOO. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
Multivariable Chinese Remainder Theorem
Indian Academy of Sciences (India)
B Sury. Resonance – Journal of Science Education, Volume 20, Issue 3, March 2015, pp 206-216. General Article. Author affiliation: Stat-Math Unit, Indian Statistical Institute, 8th Mile Road, Bangalore 560 059, India.
Godelmann, Rolf; Fang, Fang; Humpfer, Eberhard; Schütz, Birk; Bansbach, Melanie; Schäfer, Hartmut; Spraul, Manfred
2013-06-12
The authenticity, the grape variety, the geographical origin, and the year of vintage of wines produced in Germany were investigated by 1H NMR spectroscopy in combination with several steps of multivariate data analysis including principal component analysis (PCA), linear discriminant analysis (LDA), and multivariate analysis of variance (MANOVA) together with cross-validation (CV) embedded in a Monte Carlo resampling approach (MC) and others. A total of about 600 wines were selected and carefully collected from five wine-growing areas in the southern and southwestern parts of Germany. Simultaneous saturation of the resonances of water and ethanol by application of a low-power eight-frequency band irradiation using shaped pulses allowed for high receiver gain settings and hence optimized signal-to-noise ratios. The grape varieties Pinot noir, Lemberger, Pinot blanc/Pinot gris, Müller-Thurgau, Riesling, and Gewürztraminer were classified correctly for 95% of the wine panel. The classification of the vintage of all analyzed wines resulted in correct predictions of 97 and 96%, respectively, for vintage 2008 (n = 318) and 2009 (n = 265). The geographic origin of all wines from the largest German wine-producing regions, Rheinpfalz, Rheinhessen, Mosel, Baden, and Württemberg, could be predicted 89% correctly on average. Each NMR spectrum could be regarded as the individual "fingerprint" of a wine sample, which includes information about variety, origin, vintage, physiological state, technological treatment, and others.
Multivariate bubbles and antibubbles
Fry, John
2014-08-01
In this paper we develop models for multivariate financial bubbles and antibubbles based on statistical physics. In particular, we extend a rich set of univariate models to higher dimensions. Changes in market regime can be explicitly shown to represent a phase transition from random to deterministic behaviour in prices. Moreover, our multivariate models are able to capture some of the contagious effects that occur during such episodes. We are able to show that declining lending quality helped fuel a bubble in the US stock market prior to 2008. Further, our approach offers interesting insights into the spatial development of UK house prices.
Genton, Marc G.
2017-09-07
We present a hierarchical decomposition scheme for computing the n-dimensional integral of multivariate normal probabilities that appear frequently in statistics. The scheme exploits the fact that the formally dense covariance matrix can be approximated by a matrix with a hierarchical low rank structure. It allows the reduction of the computational complexity per Monte Carlo sample from O(n^2) to O(mn + kn log(n/m)), where k is the numerical rank of off-diagonal matrix blocks and m is the size of small diagonal blocks in the matrix that are not well-approximated by low rank factorizations and treated as dense submatrices. This hierarchical decomposition leads to substantial efficiencies in multivariate normal probability computations and allows integrations in thousands of dimensions to be practical on modern workstations.
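For orientation, the sketch below shows the dense baseline that such a scheme improves on: a naive Monte Carlo estimate of a multivariate normal box probability, in which each sample costs O(n^2) because it multiplies a dense Cholesky factor by a standard normal vector. This is not the hierarchical method of the abstract above; the function names, sample size and test covariance are assumptions chosen for illustration.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def mvn_box_probability(cov, lower, upper, n_samples=20000, seed=0):
    """Naive Monte Carlo estimate of P(lower <= X <= upper), X ~ N(0, cov)."""
    rng = random.Random(seed)
    low = cholesky(cov)
    n = len(cov)
    hits = 0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Dense triangular matrix-vector product: the O(n^2) cost per sample.
        x = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        if all(lo <= xi <= hi for lo, xi, hi in zip(lower, x, upper)):
            hits += 1
    return hits / n_samples


# Assumed example: bivariate normal with correlation 0.5, negative orthant.
# The exact value is 1/4 + asin(0.5) / (2 * pi) = 1/3.
neg_inf = float("-inf")
p_hat = mvn_box_probability([[1.0, 0.5], [0.5, 1.0]], [neg_inf, neg_inf], [0.0, 0.0])
print(p_hat)
```

Replacing the dense factor with a structured one, as the abstract describes, leaves the estimator unchanged and only lowers the cost of producing each sample.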
Lewandowski, Dirk
2015-01-01
Purpose: This paper discusses ranking factors suitable for library materials and shows that ranking in general is a complex process and that ranking for library materials requires a variety of techniques. Design/methodology/approach: The relevant literature is reviewed to provide a systematic overview of suitable ranking factors. The discussion is based on an overview of ranking factors used in Web search engines. Findings: While there are a wide variety of ranking factors appl...
Su, Shiliang; Zhi, Junjun; Lou, Liping; Huang, Fang; Chen, Xia; Wu, Jiaping
Characterizing the spatio-temporal patterns and apportioning the pollution sources of water bodies are important for the management and protection of water resources. The main objective of this study is to describe the dynamics of water quality and provide references for improving river pollution control practices. Comprehensive application of neural-based modeling and different multivariate methods was used to evaluate the spatio-temporal patterns and source apportionment of pollution in Qiantang River, China. Measurement data were obtained and pretreated for 13 variables from 41 monitoring sites for the period of 2001-2004. A self-organizing map classified the 41 monitoring sites into three groups (Group A, B and C), representing different pollution characteristics. Four significant parameters (dissolved oxygen, biochemical oxygen demand, total phosphorus and total lead) were identified by discriminant analysis for distinguishing variations of different years, with about 80% correct assignment for temporal variation. Rotated principal component analysis (PCA) identified four potential pollution sources for Group A (domestic sewage and agricultural pollution, industrial wastewater pollution, mineral weathering, vehicle exhaust and sand mining), five for Group B (heavy metal pollution, agricultural runoff, vehicle exhaust and sand mining, mineral weathering, chemical plants discharge) and another five for Group C (vehicle exhaust and sand mining, chemical plants discharge, soil weathering, biochemical pollution, mineral weathering). The identified potential pollution sources explained 75.6% of the total variances for Group A, 75.0% for Group B and 80.0% for Group C, respectively. Receptor-based source apportionment was applied to further estimate source contributions for each pollution variable in the three groups, which facilitated and supported the PCA results. These results could assist managers to develop optimal strategies and determine priorities for river
Rankings Methodology Hurts Public Institutions
Van Der Werf, Martin
2007-01-01
In the 1980s, when the "U.S. News & World Report" rankings of colleges were based solely on reputation, the nation's public universities were well represented at the top. However, as soon as the magazine began including its "measures of excellence," statistics intended to define quality, public universities nearly disappeared from the top. As the…
Butaciu, Sinziana; Senila, Marin; Sarbu, Costel; Ponta, Michaela; Tanaselia, Claudiu; Cadar, Oana; Roman, Marius; Radu, Emil; Sima, Mihaela; Frentiu, Tiberiu
2017-04-01
The study proposes a combined model based on diagrams (Gibbs, Piper, Stuyfzand Hydrogeochemical Classification System) and unsupervised statistical approaches (Cluster Analysis, Principal Component Analysis, Fuzzy Principal Component Analysis, Fuzzy Hierarchical Cross-Clustering) to describe natural enrichment of inorganic arsenic and co-occurring species in groundwater in the Banat Plain, southwestern Romania. Speciation of inorganic As (arsenite, arsenate) was performed, and ion concentrations (Na(+), K(+), Ca(2+), Mg(2+), HCO3(-), Cl(-), F(-), SO4(2-), PO4(3-), NO3(-)), pH, redox potential, conductivity and total dissolved substances were determined. Classical diagrams provided the hydrochemical characterization, while statistical approaches were helpful to establish (i) the natural mechanism of occurrence of As and F(-) species and the anthropogenic one for NO3(-), SO4(2-), PO4(3-) and K(+), and (ii) a classification of groundwater based on the content of arsenic species. The HCO3(-) type of local groundwater and alkaline pH (8.31-8.49) were found to be responsible for the enrichment of arsenic species and the occurrence of F(-), but by different paths. PO4(3-)-AsO4(3-) ion exchange and water-rock interaction (silicate hydrolysis and desorption from clay) were associated with arsenate enrichment in the oxidizing aquifer. Fuzzy Hierarchical Cross-Clustering was the strongest tool for the rapid simultaneous classification of groundwaters as a function of arsenic content and hydrogeochemical characteristics. The approach indicated the Na(+)-F(-)-pH cluster as a marker for groundwater with naturally elevated As and highlighted which parameters need to be monitored. A chemical conceptual model illustrating the natural and anthropogenic paths and the enrichment of As and co-occurring species in the local groundwater, supported by mineralogical analysis of rocks, was established. Copyright © 2016 Elsevier Ltd. All rights reserved.
Yoshida, Hiroyuki; Shibata, Hiroko; Izutsu, Ken-Ichi; Goda, Yukihiro
2017-01-01
The current Japanese Ministry of Health, Labour and Welfare (MHLW) Guideline for Bioequivalence Studies of Generic Products uses averaged dissolution rates for the assessment of dissolution similarity between test and reference formulations. This study clarifies how the application of the model-independent multivariate confidence region procedure (Method B), described in the European Medicines Agency and U.S. Food and Drug Administration guidelines, affects similarity outcomes obtained empirically from dissolution profiles with large variations in individual dissolution rates. Sixty-one datasets of dissolution profiles for immediate release, oral generic, and corresponding innovator products that showed large variation in individual dissolution rates in generic products were assessed for similarity by using the f2 statistics defined in the MHLW guidelines (MHLW f2 method) and two different Method B procedures: a bootstrap method applied with f2 statistics (BS method) and a multivariate analysis method using the Mahalanobis distance (MV method). The MHLW f2 and BS methods provided similar dissolution similarities between reference and generic products. Although a small difference in the similarity assessment may be due to the decrease in the lower confidence interval for expected f2 values derived from the large variation in individual dissolution rates, the MV method provided results different from those obtained through the MHLW f2 and BS methods. Analysis of actual dissolution data for products with large individual variations would provide valuable information towards an enhanced understanding of these methods and their possible incorporation in the MHLW guidelines.
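The f2 statistic mentioned above is commonly written as f2 = 50 · log10(100 / sqrt(1 + (1/n) Σ (R_t − T_t)^2)), where R_t and T_t are the mean percentages dissolved for the reference and test products at time point t. The sketch below evaluates that formula and a simple bootstrap lower bound obtained by resampling individual units. The dissolution values, the 5% quantile and the number of resamples are illustrative assumptions; the guideline-specific rules on time-point selection are not reproduced.

```python
import math
import random


def f2(ref_mean, test_mean):
    """Similarity factor f2 from two mean dissolution profiles (percent dissolved)."""
    n = len(ref_mean)
    msd = sum((r - t) ** 2 for r, t in zip(ref_mean, test_mean)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + msd))


def mean_profile(units):
    """Average the per-unit profiles time point by time point."""
    return [sum(col) / len(col) for col in zip(*units)]


def bootstrap_f2_lower(ref_units, test_units, n_boot=500, q=0.05, seed=0):
    """Lower q-quantile of f2 over bootstrap resamples of the individual units."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        ref = [rng.choice(ref_units) for _ in ref_units]
        test = [rng.choice(test_units) for _ in test_units]
        stats.append(f2(mean_profile(ref), mean_profile(test)))
    stats.sort()
    return stats[int(q * n_boot)]


# Hypothetical profiles: three units per product, four sampling times.
ref_units = [[18, 42, 61, 79], [22, 38, 59, 81], [20, 40, 60, 80]]
test_units = [[25, 47, 66, 84], [27, 43, 64, 86], [23, 45, 65, 85]]
print(f2(mean_profile(ref_units), mean_profile(test_units)))
print(bootstrap_f2_lower(ref_units, test_units))
```

Identical profiles give f2 = 100, and a uniform 10-point difference gives a value just under 50, the threshold conventionally used for similarity. Large unit-to-unit variation widens the bootstrap distribution and pulls the lower bound down, which is the effect the abstract discusses.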
Directory of Open Access Journals (Sweden)
Mohammed M. Kamal
2012-09-01
MODerate Resolution Imaging Spectroradiometer (MODIS) aerosol retrievals over the North Atlantic spanning seven hurricane seasons are combined with the Statistical Hurricane Intensity Prediction Scheme (SHIPS) parameters. Differences between current and future intensity were selected as response variables. For 24 major hurricanes (category 3, 4 and 5) between 2003 and 2009, eight lead-time response variables were determined, between 6 and 48 h. By combining MODIS and SHIPS data, 56 variables were compiled and selected as predictors for this study. Variable reduction was performed in two steps: correlation coefficients (cc) first reduced the 56 variables to 31, and Principal Component Analysis (PCA) extraction techniques then reduced the 31 variables to 20. Five categories were established based on the PCA group variables exhibiting similar physical phenomena. Average aerosol retrievals from MODIS Level 2 data in the vicinity of 1200 and 1800 UTC were mapped to the SHIPS parameters to perform Multiple Linear Regression (MLR) of each response variable against six sets of predictors of 31, 30, 28, 27, 23 and 20 variables. The deviation in Root Mean Square Error (RMSE) among the predictor sets varied between 0.01 and 0.05, which implied that reducing the number of variables did not change the core physical information. Even when the parameters are reduced from 56 to 20, the correlation values exhibit a stronger relationship between the response and predictors. Therefore, the same phenomena can be explained with the reduced set of variables.
Cointegration rank testing under conditional heteroskedasticity
DEFF Research Database (Denmark)
Cavaliere, Giuseppe; Rahbek, Anders Christian; Taylor, Robert M.
2010-01-01
(martingale difference) innovations. We first demonstrate that the limiting null distributions of the rank statistics coincide with those derived by previous authors who assume either independent and identically distributed (i.i.d.) or (strict and covariance) stationary martingale difference innovations. We then propose wild bootstrap implementations of the cointegrating rank tests and demonstrate that the associated bootstrap rank statistics replicate the first-order asymptotic null distributions of the rank statistics. We show that the same is also true of the corresponding rank tests based on the i.i.d. bootstrap of Swensen (2006, Econometrica 74, 1699-1714). The wild bootstrap, however, has the important property that, unlike the i.i.d. bootstrap, it preserves in the resampled data the pattern of heteroskedasticity present in the original shocks. Consistent with this, numerical evidence suggests that...
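The contrast between the two resampling schemes can be sketched as follows. This is a generic illustration of wild versus i.i.d. resampling of residuals, not the authors' cointegration procedure; the Rademacher (plus or minus one) multipliers and the function names are assumptions chosen for simplicity.

```python
import random


def wild_bootstrap_shocks(residuals, rng):
    """One wild-bootstrap draw: e*_t = e_t * w_t, with w_t = +1 or -1.

    Each resampled shock keeps the magnitude of the original shock at the
    same time index, so periods of high or low volatility stay in place.
    """
    return [e * rng.choice((-1.0, 1.0)) for e in residuals]


def iid_bootstrap_shocks(residuals, rng):
    """One i.i.d. bootstrap draw: shocks are drawn with replacement from the
    pooled residuals, which scrambles any time pattern in their variance."""
    return [rng.choice(residuals) for _ in residuals]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical residuals: a calm first half followed by a volatile second half.
    residuals = [0.1, -0.2, 0.1, -0.1, 3.0, -4.0, 5.0, -3.5]
    print(wild_bootstrap_shocks(residuals, rng))
    print(iid_bootstrap_shocks(residuals, rng))
```

In the wild draw the large shocks remain in the second half of the sample, whereas the i.i.d. draw can place them anywhere.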
A Fast Algorithm for Generating Permutation Distribution of Ranks in ...
African Journals Online (AJOL)
... function of the distribution of the ranks. This further gives insight into the permutation distribution of a rank statistic. The algorithm is implemented with the aid of the computer algebra system Mathematica. Key words: Combinatorics, generating function, permutation distribution, rank statistics, partitions, computer algebra.
A tilting approach to ranking influence
Genton, Marc G.
2014-12-01
We suggest a new approach, which is applicable for general statistics computed from random samples of univariate or vector-valued or functional data, to assessing the influence that individual data have on the value of a statistic, and to ranking the data in terms of that influence. Our method is based on, first, perturbing the value of the statistic by ‘tilting’, or reweighting, each data value, where the total amount of tilt is constrained to be the least possible, subject to achieving a given small perturbation of the statistic, and, then, taking the ranking of the influence of data values to be that which corresponds to ranking the changes in data weights. It is shown, both theoretically and numerically, that this ranking does not depend on the size of the perturbation, provided that the perturbation is sufficiently small. That simple result leads directly to an elegant geometric interpretation of the ranks; they are the ranks of the lengths of projections of the weights onto a ‘line’ determined by the first empirical principal component function in a generalized measure of covariance. To illustrate the generality of the method we introduce and explore it in the case of functional data, where (for example) it leads to generalized boxplots. The method has the advantage of providing an interpretable ranking that depends on the statistic under consideration. For example, the ranking of data, in terms of their influence on the value of a statistic, is different for a measure of location and for a measure of scale. This is as it should be; a ranking of data in terms of their influence should depend on the manner in which the data are used. Additionally, the ranking recognizes, rather than ignores, sign, and in particular can identify left- and right-hand ‘tails’ of the distribution of a random function or vector.
Ranking Operations Management conferences
Steenhuis, H.J.; de Bruijn, E.J.; Gupta, Sushil; Laptaned, U
2007-01-01
Several publications have appeared in the field of Operations Management which rank Operations Management related journals. Several ranking systems exist for journals based on, for example, perceived relevance and quality, citation, and author affiliation. Many academics also publish at conferences
Ranking de universidades chilenas: un análisis multivariado
Directory of Open Access Journals (Sweden)
Firinguetti Limone, Luis
2015-06-01
In this work a ranking of Chilean universities on the basis of publicly available information is developed. This ranking takes into account the multivariate character of these institutions. Also, it is noted that the results are consistent with those of a well-known international ranking that uses a different set of data, as well as with several multivariate analyses of the data considered in this study.
Introduction to multivariate discrimination
Directory of Open Access Journals (Sweden)
Kégl Balázs
2013-07-01
Multivariate discrimination or classification is one of the best-studied problems in machine learning, with a plethora of well-tested and well-performing algorithms. There are also several good general textbooks [1–9] on the subject written for an average engineering, computer science, or statistics graduate student; most of them are also accessible to an average physics student with some background in computer science and statistics. Hence, instead of writing a generic introduction, we concentrate here on relating the subject to a practicing experimental physicist. After a short introduction to the basic setup (Section 1) we delve into the practical issues of complexity regularization, model selection, and hyperparameter optimization (Section 2), since it is this step that makes high-complexity non-parametric fitting so different from low-dimensional parametric fitting. To emphasize that this issue is not restricted to classification, we illustrate the concept on a low-dimensional but non-parametric regression example (Section 2.1). Section 3 describes the common algorithmic-statistical formal framework that unifies the main families of multivariate classification algorithms. We explain here the large-margin principle that partly explains why these algorithms work. Section 4 is devoted to the description of the three main (families of) classification algorithms: neural networks, the support vector machine, and AdaBoost. We do not go into the algorithmic details; the goal is to give an overview of the form of the functions these methods learn and of the objective functions they optimize. Besides their technical description, we also make an attempt to put these algorithms into a socio-historical context. We then briefly describe some rather heterogeneous applications to illustrate the pattern recognition pipeline and to show how widespread the use of these methods is (Section 5). We conclude the chapter with three essentially open research problems
Sparse structure regularized ranking
Wang, Jim Jing-Yan
2014-04-17
Learning ranking scores is critical for the multimedia database retrieval problem. In this paper, we propose a novel ranking score learning algorithm that explores the sparse structure and uses it to regularize ranking scores. To explore the sparse structure, we assume that each multimedia object can be represented as a sparse linear combination of all other objects; the combination coefficients are regarded as a similarity measure between objects and used to regularize their ranking scores. Moreover, we propose to learn the sparse combination coefficients and the ranking scores simultaneously. A unified objective function is constructed with regard to both the combination coefficients and the ranking scores, and is optimized by an iterative algorithm. Experiments on two multimedia database retrieval data sets demonstrate significant improvements of the proposed algorithm over state-of-the-art ranking score learning algorithms.
Radulović, Niko S; Blagojević, Polina D
2013-08-02
Plant volatiles have been repeatedly shown to provide valuable insight into the evolutionary relationships among plant taxa on various taxonomical levels. The number of variables available from GC-MS analyses of these plant metabolites usually represents a large data set. The comparison of such data sets requires the use of multivariate statistical analyses (MSA) but with several serious shortcomings. In order to make multivariate statistical comparison of essential oils more applicable, reliable and faster, this work was set to explore the suitability of a complementary use of relative abundances of m/z values of the average mass scan of the total GC chromatograms instead of the traditionally used variables-percentages (peak areas) of individual oil constituents. To achieve this, essential oils extracted from 12 different Artemisia species were analyzed using GC-FID and GC-MS. Almost 500 different constituents were successfully identified. Average mass scans of the total GC chromatograms (AMS) and chemical compositions (relative percentages) of the analyzed oils were separately compared using two MSA methods: agglomerative hierarchical cluster analysis and principal component analysis. This approach was applied to an additional set of essential oil compositional data (representatives of a number of different genera/families; data from the literature) using both types of variables. The obtained results strongly suggest that MSA of complex volatile mixtures, using the corresponding directly obtainable AMS, could be considered as a promising time saving tool for easy and reliable comparison purposes. The AMS approach gives comparable or even better results than the traditional method - it reflected the natural relationships between observations within both studied groups of oils. Copyright © 2013 Elsevier B.V. All rights reserved.
Multivariate strategies in functional magnetic resonance imaging
DEFF Research Database (Denmark)
Hansen, Lars Kai
2007-01-01
We discuss aspects of multivariate fMRI modeling, including the statistical evaluation of multivariate models and means for dimensional reduction. In a case study we analyze linear and non-linear dimensional reduction tools in the context of a 'mind reading' predictive multivariate fMRI model....
Multivariable modeling and multivariate analysis for the behavioral sciences
Everitt, Brian S
2009-01-01
Multivariable Modeling and Multivariate Analysis for the Behavioral Sciences shows students how to apply statistical methods to behavioral science data in a sensible manner. Assuming some familiarity with introductory statistics, the book analyzes a host of real-world data to provide useful answers to real-life issues. The author begins by exploring the types and design of behavioral studies. He also explains how models are used in the analysis of data. After describing graphical methods, such as scatterplot matrices, the text covers simple linear regression, locally weighted regression, multip
Multivariate normative comparisons.
Huizenga, Hilde M; Smeding, Harriet; Grasman, Raoul P P P; Schmand, Ben
2007-06-18
In neuropsychological evaluations and single case research, a number of tests are generally administered, since the interest is not in a single, but in multiple characteristics of a patient. The typical problem is to decide whether or not a patient is different from normal controls with respect to one or more of these characteristics. Consideration of each characteristic separately entails an increased risk of a false positive decision (a wrongful decision that the patient is abnormal, or a type I error). From a statistical point of view this calls for a multivariate analysis. In this paper, we propose two approaches to perform normative comparisons for such multivariate data: Bonferroni corrected univariate comparisons and a multivariate comparison. Both approaches allow for the testing of nondirectional (two-sided) as well as directional (one-sided) hypotheses, e.g. the hypothesis that a patient deviates in a negative sense from the norm. Monte Carlo simulations were performed to check whether the type I error of both approaches is adequately controlled, and to investigate the power of both approaches to detect deviation from the norm. The results indicate that the type I error rate of both approaches is correct, even in small samples. The results also indicate that the power is higher for the univariate approach if the normative sample size is very small (i.e. just exceeds the number of tests administered). In larger samples, the multivariate comparison in general has increased power. We illustrate both approaches with a clinical example of patients with Parkinson's disease, who received deep brain stimulation to alleviate motor symptoms, and who were neuropsychologically evaluated to detect possible cognitive side effects.
Second order analysis of two-stage rank tests for the one-sample problem
Albers, Willem/Wim
1991-01-01
In this paper we present a rank analogue to Stein's two-stage procedure. We analyze its behavior to second order using existing asymptotic expansions for fixed sample size rank tests and recent results on combinations of independent rank statistics.
Maximum Waring ranks of monomials
Holmes, Erik; Plummer, Paul; Siegert, Jeremy; Teitler, Zach
2013-01-01
We show that monomials and sums of pairwise coprime monomials in four or more variables have Waring rank less than the generic rank, with a short list of exceptions. We asymptotically compare their ranks with the generic rank.
Rank distributions: Frequency vs. magnitude.
Velarde, Carlos; Robledo, Alberto
2017-01-01
We examine the relationship between two different types of ranked data, frequencies and magnitudes. We consider data that can be sorted either way, by number of occurrences or by size of the measures, as is the case, say, for moon craters, earthquakes, billionaires, etc. We indicate that these two types of distributions are functional inverses of each other, and specify this link, first in terms of the assumed parent probability distribution that generates the data samples, and then in terms of an analog (deterministic) nonlinear iterated map that reproduces them. For the particular case of hyperbolic decay with rank the distributions are identical, that is, the classical Zipf plot, a pure power law. But their difference is largest when one displays logarithmic decay and its counterpart shows the inverse exponential decay, as is the case for Benford's law, or vice versa. For all intermediate decay rates generic differences appear not only between the power-law exponents for the midway rank decline but also for small and large rank. We extend the theoretical framework to include thermodynamic and statistical-mechanical concepts, such as entropies and configuration.
Bradshaw, Corey J A; Brook, Barry W
2016-01-01
There are now many methods available to assess the relative citation performance of peer-reviewed journals. Regardless of their individual faults and advantages, citation-based metrics are used by researchers to maximize the citation potential of their articles, and by employers to rank academic track records. The absolute value of any particular index is arguably meaningless unless compared to other journals, and different metrics result in divergent rankings. To provide a simple yet more objective way to rank journals within and among disciplines, we developed a κ-resampled composite journal rank incorporating five popular citation indices: Impact Factor, Immediacy Index, Source-Normalized Impact Per Paper, SCImago Journal Rank and Google 5-year h-index; this approach provides an index of relative rank uncertainty. We applied the approach to six sample sets of scientific journals from Ecology (n = 100 journals), Medicine (n = 100), Multidisciplinary (n = 50); Ecology + Multidisciplinary (n = 25), Obstetrics & Gynaecology (n = 25) and Marine Biology & Fisheries (n = 25). We then cross-compared the κ-resampled ranking for the Ecology + Multidisciplinary journal set to the results of a survey of 188 publishing ecologists who were asked to rank the same journals, and found a 0.68-0.84 Spearman's ρ correlation between the two rankings datasets. Our composite index approach therefore approximates relative journal reputation, at least for that discipline. Agglomerative and divisive clustering and multi-dimensional scaling techniques applied to the Ecology + Multidisciplinary journal set identified specific clusters of similarly ranked journals, with only Nature & Science separating out from the others. When comparing a selection of journals within or among disciplines, we recommend collecting multiple citation-based metrics for a sample of relevant and realistic journals to calculate the composite rankings and their relative uncertainty windows.
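The two building blocks named in this abstract, a composite rank across several citation metrics and a Spearman's ρ comparison between two rankings, can be sketched as below. This is a simplified illustration only: it averages per-metric ranks and omits the resampling step that yields the uncertainty windows, it assumes no tied scores, and the function names are hypothetical.

```python
def rank_descending(scores):
    """Rank 1 = highest score. Assumes no ties, for simplicity."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def composite_rank(metrics):
    """Mean rank per journal across metrics.

    metrics: dict mapping a metric name to a list of scores, one per journal,
    with journals in the same order in every list.
    """
    per_metric = [rank_descending(scores) for scores in metrics.values()]
    n_journals = len(per_metric[0])
    return [sum(r[j] for r in per_metric) / len(per_metric) for j in range(n_journals)]


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho via the sum of squared rank differences (valid without ties)."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

A lower mean rank indicates a journal that scores consistently high across the metrics; comparing such a composite ranking with a survey-based ranking through `spearman_rho` mirrors the kind of cross-check the abstract reports.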
Bornmann, Lutz; Anegon, Felix de Moya; Mutz, Ruediger
2014-01-01
Bornmann, Stefaner, de Moya Anegon, and Mutz (in press) have introduced a web application (www.excellencemapping.net) which is linked to both academic ranking lists published hitherto (e.g. the Academic Ranking of World Universities) as well as spatial visualization approaches. The web application visualizes institutional performance within specific subject areas as ranking lists and on custom tile-based maps. The new, substantially enhanced version of the web application and the multilevel logistic regression on which it is based are described in this paper. Scopus data were used which have been collected for the SCImago Institutions Ranking. Only those universities and research-focused institutions are considered that have published at least 500 articles, reviews and conference papers in the period 2006 to 2010 in a certain Scopus subject area. In the enhanced version, the effect of single covariates (such as the per capita GDP of a country in which an institution is located) on two performance metrics (bes...
Thomas, J R; Nelson, J K; Thomas, K T
1999-03-01
Frequent violations of the assumption that data are normally distributed occur in exercise science and other life and behavioral sciences. When this assumption is violated, parametric statistical analyses may be inappropriate for data analysis. We provide a rationale for using a generalized form of nonparametric analyses based on the Puri and Sen (1985) L statistic treated as a χ² approximation. If data do not meet the assumption of normality, this nonparametric approach has substantial power and is easy to use. An advantage of this generalized technique is that ranked data may be used in standard parametric statistical programs widely available on desktop and mainframe computers, for example, regression, analysis of variance (ANOVA), and multivariate analysis of variance (MANOVA) within BioMed, SAS, and SPSS. Once the data are ranked and analyzed with these programs, the only adjustment required is to use a standard formula to calculate the nonparametric test statistic, L, instead of the parametric test statistic (e.g., F). Thus, rank-order nonparametric models become parallel with their parametric counterparts, allowing the researcher to select between them based on characteristics of the data distribution. Examples of this approach are provided using data from exercise science for regression, ANOVA (including repeated measures) and MANOVA techniques from SPSSPC. Using these procedures, researchers can easily examine data distributions and make an appropriate decision about parametric or nonparametric analyses while continuing to use their regular statistical packages.
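The rank-then-analyze workflow described above can be sketched for the simplest case, a one-way layout. The sketch assumes the commonly cited form L = (N − 1)·r², where r² is the proportion of variance in the ranked data explained by the model and the degrees of freedom equal the number of groups minus one; the abstract does not state the formula, so this form is an assumption. In this one-way case the result coincides with the Kruskal-Wallis H statistic.

```python
def midranks(values):
    """Ranks 1..n (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_based_L_oneway(groups):
    """Return (L, df) for a one-way design, assuming L = (N - 1) * r^2 on ranks."""
    pooled = [x for group in groups for x in group]
    ranks = midranks(pooled)
    n_total = len(pooled)
    grand_mean = sum(ranks) / n_total
    ss_total = sum((r - grand_mean) ** 2 for r in ranks)
    ss_between = 0.0
    start = 0
    for group in groups:
        group_ranks = ranks[start:start + len(group)]
        start += len(group)
        group_mean = sum(group_ranks) / len(group_ranks)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
    r_squared = ss_between / ss_total
    return (n_total - 1) * r_squared, len(groups) - 1
```

For three fully separated groups of three observations, the function returns L = 7.2 with 2 degrees of freedom, which exceeds the χ² critical value of 5.99 at the 0.05 level.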
Academic rankings: an approach to a Portuguese ranking
Bernardino, Pedro; Marques, Rui
2009-01-01
The academic rankings are a controversial subject in higher education. However, despite all the criticism, academic rankings are here to stay and more and more different stakeholders use rankings to obtain information about the institutions’ performance. The two most well-known rankings, The Times and the Shanghai Jiao Tong University rankings have different methodologies. The Times ranking is based on peer review, whereas the Shanghai ranking has only quantitative indicators and is mainly ba...
A study of serial ranks via random graphs
Haeusler, Erich; Mason, David M.; Turova, Tatyana S.
2000-01-01
Serial ranks have long been used as the basis for nonparametric tests of independence in time series analysis. We shall study the underlying graph structure of serial ranks. This will lead us to a basic martingale which will allow us to construct a weighted approximation to a serial rank process. To show the applicability of this approximation, we will use it to prove two very general central limit theorems for Wald-Wolfowitz-type serial rank statistics.
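For orientation, the simplest member of this family, a lag-1 serial rank statistic with a Monte Carlo permutation reference distribution, can be sketched as follows. It is a basic illustration under the assumption of untied observations and does not reproduce the general class of statistics or the weighted approximation studied in the paper; the function names are hypothetical.

```python
import random


def ranks_of(series):
    """Ranks 1..n of the observations (1 = smallest); assumes no ties."""
    order = sorted(range(len(series)), key=lambda i: series[i])
    ranks = [0] * len(series)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def serial_rank_statistic(series, lag=1):
    """Sum over t of R_t * R_{t+lag}, a Wald-Wolfowitz-type statistic on ranks."""
    r = ranks_of(series)
    return sum(r[t] * r[t + lag] for t in range(len(r) - lag))


def permutation_p_value(series, n_perm=2000, seed=0):
    """Upper-tail Monte Carlo p-value under the hypothesis of independence,
    obtained by randomly permuting the ranks."""
    rng = random.Random(seed)
    observed = serial_rank_statistic(series)
    r = ranks_of(series)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(r)
        statistic = sum(r[t] * r[t + 1] for t in range(len(r) - 1))
        if statistic >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Large values of the statistic indicate that high ranks tend to follow high ranks, that is, positive serial dependence.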
ARWU vs. Alternative ARWU Ranking: What are the Consequences for Lower Ranked Universities?
Directory of Open Access Journals (Sweden)
Milica Maričić
2017-05-01
The ARWU ranking has been a source of academic debate since its development in 2003, but the same cannot be said of the Alternative ARWU ranking. The Alternative ARWU ranking attempts to reduce the influence of the prestigious indicators Alumni and Award, which are based on the number of Nobel Prizes and Fields Medals received by alumni or university staff. However, the consequences of reducing the influence of these two indicators have not been scrutinized in detail. Therefore, we propose a statistical approach to the comparison of the two rankings and an in-depth analysis of the Alternative ARWU groups. The obtained results, which are based on the official data, can provide new insights into the nature of the Alternative ARWU ranking. The presented approach might initiate further research on the Alternative ARWU ranking and on the impact of a university ranking's list length. JEL Classification: C10, C38, I23
Li, Siyue; Li, Jia; Zhang, Quanfa
2011-11-15
A total of 190 grab water samples were collected from 19 rivers along the water conveyance system of the Middle Route of China's interbasin South to North Water Transfer Project (MRSNWTP). Multivariate statistics including principal component/factor analysis (PCA/FA), analysis of variance (ANOVA), and cluster analysis (CA) were employed to assess water quality, and the receptor model of factor analysis-multiple linear regression (FA-MLR) was used for source identification/apportionment of pollutants from natural processes and anthropogenic activities to river waters. Our results revealed that river waters were primarily polluted by CODMn, BOD, NH4+-N, TN, TP, and Cd with remarkable spatio-temporal variability, and that industrial effluents in rivers increased northward. FA/PCA identified four classes of water quality parameters, i.e., mineral composition, toxic metals, nutrients, and organic pollutants. CA classified the 19 selected rivers into three groups reflecting their varying water pollution levels: moderate pollution, high pollution, and very high pollution. The FA-MLR receptor modeling revealed predominantly anthropogenic inputs to river solutes in Beijing and Tianjin, i.e., 77% of nitrogen and 90% of phosphorus from industry, and 80% of CODMn from domestic sources. This study is critical for water allocation and division in the water-receiving areas using the existing rivers for MRSNWTP. Copyright © 2011 Elsevier B.V. All rights reserved.
Ranking Economic History Journals
DEFF Research Database (Denmark)
Di Vaio, Gianfranco; Weisdorf, Jacob Louis
This study ranks - for the first time - 12 international academic journals that have economic history as their main topic. The ranking is based on data collected for the year 2007. Journals are ranked using standard citation analysis where we adjust for age, size and self-citation of journals. We also compare the leading economic history journals with the leading journals in economics in order to measure the influence on economics of economic history, and vice versa. With a few exceptions, our results confirm the general idea about what economic history journals are the most influential...
Ranking economic history journals
DEFF Research Database (Denmark)
Di Vaio, Gianfranco; Weisdorf, Jacob Louis
2010-01-01
This study ranks, for the first time, 12 international academic journals that have economic history as their main topic. The ranking is based on data collected for the year 2007. Journals are ranked using standard citation analysis where we adjust for age, size and self-citation of journals. We also compare the leading economic history journals with the leading journals in economics in order to measure the influence on economics of economic history, and vice versa. With a few exceptions, our results confirm the general idea about what economic history journals are the most influential for economic...
Recurrent fuzzy ranking methods
Hajjari, Tayebeh
2012-11-01
With the increasing development of fuzzy set theory in various scientific fields comes the need to compare fuzzy numbers in different areas. Ranking of fuzzy numbers therefore plays a very important role in linguistic decision-making, engineering, business and some other fuzzy application systems. Several strategies have been proposed for ranking fuzzy numbers. Each of these techniques has been shown to produce non-intuitive results in certain cases. In this paper, we review some recent ranking methods, which will be useful for researchers who are interested in this area.
Akbudak, Kadir
2017-05-11
Covariance matrices are ubiquitous in computational science and engineering. In particular, large covariance matrices arise from multivariate spatial data sets, for instance, in climate/weather modeling applications to improve prediction using statistical methods and spatial data. One of the most time-consuming computational steps consists in calculating the Cholesky factorization of the symmetric, positive-definite covariance matrix. The structure of such covariance matrices is also often data-sparse, in other words, effectively of low rank, though formally dense. While not typically globally of low rank, covariance matrices in which correlation decays with distance are nearly always hierarchically of low rank. While symmetry and positive definiteness should be, and nearly always are, exploited for performance purposes, exploiting low-rank character in this context is very recent, and will be a key to solving these challenging problems at large-scale dimensions. The authors design a new and flexible tile low-rank Cholesky factorization and propose a high-performance implementation using the OpenMP task-based programming model on various leading-edge manycore architectures. Performance comparisons and memory footprint savings on covariance matrices of size up to 200K×200K show a gain of more than an order of magnitude for both metrics against state-of-the-art open-source and vendor-optimized numerical libraries, while preserving the numerical accuracy of the original model. This research represents an important milestone in enabling large-scale simulations for covariance-based scientific applications.
Multivariate segmentation of fMRI for human brain mapping
Lei, Tianhu; Udupa, Jayaram K.
2000-04-01
fMRI has provided a new option to study cognitive phenomena. Recent developments in medical image processing and analysis allow researchers to study more elaborate cognitive tasks from a wide perspective. These techniques include Statistical Parametric Mapping, Subspace Modeling and Maximum Likelihood Estimation, and Spatio-temporal Analysis using Random Fields. Their common weakness is the assumption of statistical independence among the image pixels. We have developed a multivariate segmentation method for functional MRI analysis in human brain function studies, based on the second-order statistics of images. It consists of four steps: (1) detecting the number of distinctive image regions, (2) generating the scores and determining their rank, (3) forming score plots and clustering in the feature space, (4) projecting clusters from the feature space to the image space to generate object images. We have validated this method on simulated and fMRI images. The theoretical and experimental results obtained by using this method were in good agreement. The relations between this method and other multivariate image analysis methods are discussed.
Asset ranking manager (ranking index of components)
Energy Technology Data Exchange (ETDEWEB)
Maloney, S.M.; Engle, A.M.; Morgan, T.A. [Applied Reliability, Maracor Software and Engineering (United States)
2004-07-01
The Ranking Index of Components (RIC) is an Asset Reliability Manager (ARM), which itself is a web-enabled front end where plant database information fields from several disparate databases are combined. That information is used to create a specific weighted number (Ranking Index) relating to that component's health and risk to the site. The higher the number, the higher the priority that any work associated with that component receives. ARM provides site Engineering, Maintenance and Work Control personnel with a composite real-time (current condition) look at the component's 'risk of not working' to the plant. Information is extracted from the existing Computerized Maintenance Management System (CMMS) and specific site applications and processed nightly. ARM helps to ensure that the most important work is placed into the workweeks and that non-value-added work is either deferred, changed in frequency or deleted. This information is on the web, updated each night, and available for all employees to use. This effort assists the work management specialist when allocating limited resources to the most important work. The use of this tool has maximized resource usage, performing the most critical work with available resources. The ARM numbers are valued inputs into work scoping for the workweek managers. System and Component Engineers are using ARM to identify the components that are at 'risk of failure' and therefore should be placed into the appropriate workweek schedule.
Low Rank Approximation Algorithms, Implementation, Applications
Markovsky, Ivan
2012-01-01
Matrix low-rank approximation is intimately related to data modelling; a problem that arises frequently in many different fields. Low Rank Approximation: Algorithms, Implementation, Applications is a comprehensive exposition of the theory, algorithms, and applications of structured low-rank approximation. Local optimization methods and effective suboptimal convex relaxations for Toeplitz, Hankel, and Sylvester structured problems are presented. A major part of the text is devoted to application of the theory. Applications described include: system and control theory: approximate realization, model reduction, output error, and errors-in-variables identification; signal processing: harmonic retrieval, sum-of-damped exponentials, finite impulse response modeling, and array processing; machine learning: multidimensional scaling and recommender system; computer vision: algebraic curve fitting and fundamental matrix estimation; bioinformatics for microarray data analysis; chemometrics for multivariate calibration; ...
Rank range test for equality of dispersion | Odiase | Journal of ...
African Journals Online (AJOL)
This paper exploits the computational simplicity of the range of a set of data to formulate a two-sample scale test called the Rank Range test. The performance of the test statistic is compared with other tests of scale. The exact distribution of the Rank Range test statistic is generated empirically through the unconditional ...
Directory of Open Access Journals (Sweden)
Halu, Arda; Mondragón, Raúl J; Panzarasa, Pietro; Bianconi, Ginestra
2013-01-01
Many complex systems can be described as multiplex networks in which the same nodes can interact with one another in different layers, thus forming a set of interacting and co-evolving networks. Examples of such multiplex systems are social networks where people are involved in different types of relationships and interact through various forms of communication media. The ranking of nodes in multiplex networks is one of the most pressing and challenging tasks that research on complex networks is currently facing. When pairs of nodes can be connected through multiple links and in multiple layers, the ranking of nodes should necessarily reflect the importance of nodes in one layer as well as their importance in other interdependent layers. In this paper, we draw on the idea of biased random walks to define the Multiplex PageRank centrality measure in which the effects of the interplay between networks on the centrality of nodes are directly taken into account. In particular, depending on the intensity of the interaction between layers, we define the Additive, Multiplicative, Combined, and Neutral versions of Multiplex PageRank, and show how each version reflects the extent to which the importance of a node in one layer affects the importance the node can gain in another layer. We discuss these measures and apply them to an online multiplex social network. Findings indicate that taking the multiplex nature of the network into account helps uncover the emergence of rankings of nodes that differ from the rankings obtained from one single layer. Results provide support in favor of the salience of multiplex centrality measures, like Multiplex PageRank, for assessing the prominence of nodes embedded in multiple interacting networks, and for shedding a new light on structural properties that would otherwise remain undetected if each of the interacting networks were analyzed in isolation.
Ranking of Rankings: Benchmarking Twenty-Five Higher Education Ranking Systems in Europe
Stolz, Ingo; Hendel, Darwin D.; Horn, Aaron S.
2010-01-01
The purpose of this study is to evaluate the ranking practices of 25 European higher education ranking systems (HERSs). Ranking practices were assessed with 14 quantitative measures derived from the Berlin Principles on Ranking of Higher Education Institutions (BPs). HERSs were then ranked according to their degree of congruence with the BPs.…
Chan, Kar-Man; Yue, Grace Gar-Lee; Li, Ping; Wong, Eric Chun-Wai; Lee, Julia Kin-Ming; Kennelly, Edward J; Lau, Clara Bik-San
2017-03-03
According to the Chinese Pharmacopoeia 2015 edition, Ganoderma (Lingzhi) is a species complex that comprises Ganoderma lucidum and Ganoderma sinense. The bioactivity and chemical composition of G. lucidum have been studied extensively, and it was shown to possess antitumor activities in pharmacological studies. In contrast, G. sinense has not been studied in great detail. Our previous studies found that the stipe of G. sinense exhibited more potent antitumor activity than the pileus. To identify the antitumor compounds in the stipe of G. sinense, we studied its chemical components by merging the bioactivity results with liquid chromatography-mass spectrometry-based chemometrics. The stipe of G. sinense was extracted with water, followed by ethanol precipitation and liquid-liquid partition. The resulting residue was fractionated using column chromatography. The antitumor activity of these fractions was analysed using the MTT assay in murine breast tumor 4T1 cells, and their chemical components were studied using LC-QTOF-MS with multivariate statistical tools. The chemometric and MS/MS analysis correlated bioactivity with five known cytotoxic compounds, 4-hydroxyphenylacetate, 9-oxo-(10E,12E)-octadecadienoic acid, 3-phenyl-2-propenoic acid, 13-oxo-(9E,11E)-octadecadienoic acid and lingzhine C, from the stipe of G. sinense. To the best of our knowledge, 4-hydroxyphenylacetate, 3-phenyl-2-propenoic acid and lingzhine C are reported in G. sinense for the first time. These five compounds will be investigated for their antitumor activities in the future.
Kirch, Darrell G; Prescott, John E
2013-08-01
Since the 1980s, school ranking systems have been a topic of discussion among leaders of higher education. Various ranking systems are based on inadequate data that fail to illustrate the complex nature and special contributions of the institutions they purport to rank, including U.S. medical schools, each of which contributes uniquely to meeting national health care needs. A study by Tancredi and colleagues in this issue of Academic Medicine illustrates the limitations of rankings specific to primary care training programs. This commentary discusses, first, how each school's mission and strengths, as well as the impact it has on the community it serves, are distinct, and, second, how these schools, which are each unique, are poorly represented by overly subjective ranking methodologies. Because academic leaders need data that are more objective to guide institutional development, the Association of American Medical Colleges (AAMC) has been developing tools to provide valid data that are applicable to each medical school. Specifically, the AAMC's Medical School Admissions Requirements and its Missions Management Tool each provide a comprehensive assessment of medical schools that leaders are using to drive institutional capacity building. This commentary affirms the importance of mission while challenging the leaders of medical schools, teaching hospitals, and universities to use reliable data to continually improve the quality of their training programs to improve the health of all.
Sowter, Ben
2008-01-01
This paper presents key new developments in the THES - QS World University Rankings in 2007, related to enhancements to the "Peer Review", "Data Collection" and "Statistical Aggregation" utilised in this ranking as well as discussing the decision to utilise Full-Time Equivalent (FTE) figures for personnel statistics. Indicator correlation is also…
Global Low-Rank Image Restoration With Gaussian Mixture Model.
Zhang, Sibo; Jiao, Licheng; Liu, Fang; Wang, Shuang
2017-06-27
Low-rank restoration has recently attracted a lot of attention in computer vision research. Empirical studies show that exploring the low-rank property of patch groups can lead to superior restoration performance. However, progress on global low-rank restoration has been limited, because rank minimization at the image level is too strong a constraint for natural images, which seldom match the low-rank condition. In this paper, we describe a flexible global low-rank restoration model which introduces local statistical properties into the rank minimization. The proposed model can effectively recover the latent global low-rank structure via the nuclear norm, as well as the fine details via a Gaussian mixture model. An alternating scheme is developed to estimate the Gaussian parameters and the restored image, and it shows excellent convergence and stability. In addition, experiments on image and video sequence datasets show the effectiveness of the proposed method in image inpainting problems.
Algebraic and computational aspects of real tensor ranks
Sakata, Toshio; Miyazaki, Mitsuhiro
2016-01-01
This book provides comprehensive summaries of theoretical (algebraic) and computational aspects of tensor ranks, maximal ranks, and typical ranks, over the real number field. Although tensor ranks have often been discussed over the complex number field, it should be emphasized that this book treats real tensor ranks, which have direct applications in statistics. The book provides several interesting ideas, including determinant polynomials, determinantal ideals, absolutely nonsingular tensors, absolutely full column rank tensors, and their connection to bilinear maps and Hurwitz-Radon numbers. In addition to detailed reviews of methods to determine real tensor ranks, global theories such as the Jacobian method are also reviewed in detail. The book also includes an accessible and comprehensive introduction to the mathematical background, with the basics of positive polynomials and calculations using the Groebner basis. Furthermore, this book provides insights into numerical methods of finding tensor ranks through...
DEFF Research Database (Denmark)
Frandsen, Gudmund Skovbjerg; Frandsen, Peter Frands
2009-01-01
We consider maintaining information about the rank of a matrix under changes of the entries. For n×n matrices, we show an upper bound of O(n^1.575) arithmetic operations and a lower bound of Ω(n) arithmetic operations per element change. The upper bound is valid when changing up to O(n^0.575) entries in a single column of the matrix. We also give an algorithm that maintains the rank using O(n^2) arithmetic operations per rank one update. These bounds appear to be the first nontrivial bounds for the problem. The upper bounds are valid for arbitrary fields, whereas the lower bound is valid for algebraically closed fields. The upper bound for element updates uses fast rectangular matrix multiplication, and the lower bound involves further development of an earlier technique for proving lower bounds for dynamic computation of rational functions.
An Introduction to Applied Multivariate Analysis
Raykov, Tenko
2008-01-01
Focuses on the core multivariate statistics topics that are of fundamental relevance for understanding the subject. The book emphasizes topics that are critical to those in the behavioral, social, and educational sciences.
Diversifying customer review rankings.
Krestel, Ralf; Dokoohaki, Nima
2015-06-01
E-commerce Web sites owe much of their popularity to consumer reviews accompanying product descriptions. On-line customers spend hours and hours going through heaps of textual reviews to decide which products to buy. At the same time, each popular product has thousands of user-generated reviews, making it impossible for a buyer to read everything. Current approaches to displaying reviews to users or recommending an individual review for a product are based on the recency or helpfulness of each review. In this paper, we present a framework to rank product reviews by optimizing the coverage of the ranking with respect to sentiment or aspects, or by summarizing all reviews with the top-K reviews in the ranking. To accomplish this, we make use of the assigned star rating for a product as an indicator of a review's sentiment polarity and compare bag-of-words (language model) with topic models (latent Dirichlet allocation) as a means to represent aspects. Our evaluation on manually annotated review data from a commercial review Web site demonstrates the effectiveness of our approach, outperforming plain recency ranking by 30% and obtaining the best results by combining language and topic model representations.
DEFF Research Database (Denmark)
Müller, Emmanuel; Assent, Ira; Steinhausen, Uwe
2008-01-01
Outlier detection is an important data mining task for consistency checks, fraud detection, etc. Binary decision making on whether or not an object is an outlier is not appropriate in many applications and moreover hard to parametrize. Thus, recently, methods for outlier ranking have been proposed...
Multivariable Burchnall-Chaundy theory.
Previato, Emma
2008-03-28
Burchnall & Chaundy (Burchnall & Chaundy 1928 Proc. R. Soc. A 118, 557-583) classified the (rank 1) commutative subalgebras of the algebra of ordinary differential operators. To date, there is no such result for several variables. This paper presents the problem and the current state of the knowledge, together with an interpretation in differential Galois theory. It is known that the spectral variety of a multivariable commutative ring will not be associated to a KP-type hierarchy of deformations, but examples of related integrable equations were produced and are reviewed. Moreover, such an algebro-geometric interpretation is made to fit into A.N. Parshin's newer theory of commuting rings of partial pseudodifferential operators and KP-type hierarchies which uses higher local fields.
Ranking Workplace Competencies: Student and Graduate Perceptions.
Rainsbury, Elizabeth; Hodges, Dave; Burchell, Noel; Lay, Mark
2002-01-01
New Zealand business students and graduates made similar rankings of the five most important workplace competencies: computer literacy, customer service orientation, teamwork and cooperation, self-confidence, and willingness to learn. Graduates placed greater importance on most of the 24 competencies, resulting in a statistically significant…
Improving Ranking Using Quantum Probability
Melucci, Massimo
2011-01-01
The paper shows that ranking information units by quantum probability differs from ranking them by classical probability, provided the same data are used for parameter estimation. As probability of detection (also known as recall or power) and probability of false alarm (also known as fallout or size) measure the quality of ranking, we point out and show that ranking by quantum probability yields higher probability of detection than ranking by classical probability provided a given probability of ...
Switching Between Multivariable Controllers
DEFF Research Database (Denmark)
Niemann, Hans Henrik; Stoustrup, Jakob; Abrahamsen, Rune
2004-01-01
A concept for implementation of multivariable controllers is presented in this paper. The concept is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization of all stabilizing controllers. By using this architecture for implementation of multivariable controllers, it is shown how it is possible to smoothly switch between multivariable controllers with guaranteed closed-loop stability. This includes also the case where one or more controllers are unstable. The concept for smooth online changes of multivariable controllers based on the YJBK architecture can also handle the start up and shut down of multivariable systems. Furthermore, the start up of unstable multivariable controllers can be handled as well. Finally, implementation of (unstable) controllers as a stable Q parameter in a Q-parameterized controller can also be achieved.
On multivariate control charts
Frisén, Marianne
2011-01-01
Industrial production requires multivariate control charts to enable monitoring of several components. Recently there has also been increased interest in other areas such as detection of bioterrorism, spatial surveillance and transaction strategies in finance. In the literature, several types of multivariate counterparts to the univariate Shewhart, EWMA and CUSUM methods have been proposed. We review general approaches to multivariate control charts. Suggestions are made on the special chal...
Fractional cointegration rank estimation
DEFF Research Database (Denmark)
Lasak, Katarzyna; Velasco, Carlos
We consider cointegration rank estimation for a p-dimensional Fractional Vector Error Correction Model. We propose a new two-step procedure which allows testing for further long-run equilibrium relations with possibly different persistence levels. The first step consists in estimating the parame... to control for stochastic trend estimation effects from the first step. The critical values of the tests proposed depend only on the number of common trends under the null, p - r, and on the interval of the cointegration degrees b allowed, but not on the true cointegration degree b_0. Hence, no additional...
DEFF Research Database (Denmark)
Silvennoinen, Annastiina; Teräsvirta, Timo
This article contains a review of multivariate GARCH models. Most common GARCH models are presented and their properties considered. This also includes nonparametric and semiparametric models. Existing specification and misspecification tests are discussed. Finally, there is an empirical example in which several multivariate GARCH models are fitted to the same data set and the results compared.
Multivariate piecewise polynomials
de Boor, C.
This article was supposed to be on "multivariate splines". An informal survey, taken recently by asking various people in Approximation Theory what they consider to be a "multivariate spline", resulted in the answer that a multivariate spline is a possibly smooth piecewise polynomial function of several arguments. In particular, the potentially very useful thin-plate spline was thought to belong more to the subject of radial basis functions than in the present article. This is all the more surprising to me since I am convinced that the variational approach to splines will play a much greater role in multivariate spline theory than it did or should have in the univariate theory. Still, as there is more than enough material for a survey of multivariate piecewise polynomials, this article is restricted to this topic, as is indicated by the (changed) title.
Switching Between Multivariable Controllers
DEFF Research Database (Denmark)
Niemann, H.; Stoustrup, Jakob; Abrahamsen, R.B.
2004-01-01
A concept for implementation of multivariable controllers is presented in this paper. The concept is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization of all stabilizing controllers. By using this architecture for implementation of multivariable controllers, it is shown how it is possible to smoothly switch between multivariable controllers with guaranteed closed-loop stability. This includes also the case where one or more controllers are unstable. Publication date: MAR-APR
Directory of Open Access Journals (Sweden)
Ana Paula Almeida Bertossi
2013-10-01
Multivariate statistical techniques (Principal Component Analysis and Hierarchical Cluster Analysis) were employed to select the most important parameters that explain water quality variability in a rural watershed in the south of the state of Espírito Santo (Brazil). In addition, the waters studied were grouped by the similarity of the selected characteristics to verify the effect of the type of soil cover (agriculture, livestock, forest and urban), water resource (surface and underground) and sampling period (rainy and dry seasons). Nineteen physico-chemical parameters of water quality were analyzed: pH, electrical conductivity, total solids, total dissolved solids, total suspended solids, turbidity, biochemical oxygen demand (BOD), ammoniacal nitrogen, nitrate, nitrite, total phosphorous, Ca, Mg, Fe, Na, K, Zn, Cu and total coliform. Application of Principal Component Analysis reduced the 19 parameters to three components that explained 87.53% of the total variance of the data set. The water quality parameters that best explained the variability of the data were: electrical conductivity, total solids, total dissolved solids, turbidity, BOD, nitrate, Ca, Mg, and Na. Application of Cluster Analysis showed four different groups of water quality that differed in the concentration of physicochemical characteristics and in the type of water resource studied, whereas the sampling periods and the type of soil cover did not influence the segregation of the groups formed.
Multivariate approaches in plant science
DEFF Research Database (Denmark)
Gottlieb, D.M.; Schultz, J.; Bruun, Susanne Wrang
2004-01-01
The objective of proteomics is to get an overview of the proteins expressed at a given point in time in a given tissue and to identify the connection to the biochemical status of that tissue. Therefore sample throughput and analysis time are important issues in proteomics. The concept of proteomics... Traditional statistical methods are not suitable for analysis of the huge amounts of data, where the number of variables exceeds the number of objects. Multivariate data analysis, on the other hand, may uncover the hidden structures present in these data. This study takes its starting point in the field... generating protein analysis methods like mass spectrometry and near infrared spectroscopy and examples of application to these techniques are also presented. Multivariate data analysis can unravel complicated data structures and may thereby relieve the characterization phase in classical proteomics.
Co-integration Rank Testing under Conditional Heteroskedasticity
DEFF Research Database (Denmark)
Cavaliere, Guiseppe; Rahbæk, Anders; Taylor, A.M. Robert
null distributions of the rank statistics coincide with those derived by previous authors who assume either i.i.d. or (strict and covariance) stationary martingale difference innovations. We then propose wild bootstrap implementations of the co-integrating rank tests and demonstrate that the associated...
Can College Rankings Be Believed?
Directory of Open Access Journals (Sweden)
Meredith Davis
The article summarizes literature on college and university rankings worldwide and the strategies used by various ranking organizations, including those of government and popular media. It traces the history of national and global rankings, indicators used by ranking systems, and the effect of rankings on academic programs and their institutions. Although ranking systems employ diverse criteria and most weight certain indicators over others, there is considerable skepticism that most actually measure educational quality. At the same time, students and their families increasingly consult these evaluations when making college decisions, and sponsors of faculty research consider reputation when forming academic partnerships. While there are serious concerns regarding the validity of ranking institutions when so little data can support differences between one institution and another, college rankings appear to be here to stay.
Methods of Multivariate Analysis
Rencher, Alvin C
2012-01-01
Praise for the Second Edition "This book is a systematic, well-written, well-organized text on multivariate analysis packed with intuition and insight . . . There is much practical wisdom in this book that is hard to find elsewhere."-IIE Transactions Filled with new and timely content, Methods of Multivariate Analysis, Third Edition provides examples and exercises based on more than sixty real data sets from a wide variety of scientific fields. It takes a "methods" approach to the subject, placing an emphasis on how students and practitioners can employ multivariate analysis in real-life sit
Adaptive distributional extensions to DFR ranking
DEFF Research Database (Denmark)
Petersen, Casper; Simonsen, Jakob Grue; Järvelin, Kalervo
2016-01-01
Divergence From Randomness (DFR) ranking models assume that informative terms are distributed in a corpus differently than non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An informative term is then detected by measuring the divergence of its distribution from the distribution of non-informative terms. However, there is little empirical evidence that the distributions of non-informative terms used in DFR actually fit current datasets. Practically this risks providing a poor separation between informative and non-informative terms, thus compromising the discriminative power of the ranking model. We present a novel extension to DFR, which first detects the best-fitting distribution of non-informative terms in a collection, and then adapts the ranking computation to this best...
Ranking Baltic States Researchers
Directory of Open Access Journals (Sweden)
Gyula Mester
2017-10-01
In this article, using the h-index and the total number of citations, the best 10 Lithuanian, Latvian and Estonian researchers from several disciplines are ranked. The list may be formed based on the h-index and the total number of citations, given in Web of Science, Scopus, Publish or Perish Program and Google Scholar database. Data for the first 10 researchers are presented. Google Scholar is the most complete. Therefore, to define a single indicator, h-index calculated by Google Scholar may be a good and simple one. The author chooses the Google Scholar database as it is the broadest one.
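For readers unfamiliar with the indicator, the h-index is the largest h such that at least h of a researcher's papers have at least h citations each. The following is a minimal sketch of ranking by h-index with total citations as a tie-breaker; the researcher labels and citation counts are made-up illustrative data, and the tie-breaking rule is an assumption rather than the article's stated procedure.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h

# Hypothetical citation counts per paper (not data from the article).
researchers = {
    "A": [10, 8, 5, 4, 3],
    "B": [25, 8, 5, 3, 3],
    "C": [6, 6, 6, 6, 6, 6],
}

# Rank by h-index first, then by total citations (assumed tie-breaker).
ranking = sorted(
    researchers,
    key=lambda name: (h_index(researchers[name]), sum(researchers[name])),
    reverse=True,
)
print(ranking)
```

Note that researcher B has the most citations in total but the lowest h-index here, which shows why the two indicators can produce different orderings.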
2015-04-28
eigenvector of the associated Laplacian matrix (i.e., the Fiedler vector) matches that of the variables. In other words, this approach (reminiscent of...S1), i.e., Dii = ∑n j=1Gi,j is the degree of node i in the measurement graph G. 3: Compute the Fiedler vector of S (eigenvector corresponding to the...smallest nonzero eigenvalue of LS). 4: Output the ranking induced by sorting the Fiedler vector of S, with the global ordering (increasing or decreasing
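The surviving steps of this excerpt (form the degree matrix, compute the Fiedler vector of the Laplacian of S, sort by it) can be sketched as follows. This is a hedged illustration only: how the similarity matrix S is built from the pairwise comparisons is elided in the excerpt and is not reproduced here, so the toy matrix below (similarity decaying with distance in a hidden true order) and the function name `fiedler_ranking` are assumptions. The graph is assumed connected, so the second-smallest eigenvalue is the smallest nonzero one.

```python
import numpy as np

def fiedler_ranking(S):
    """Order items by the Fiedler vector of the Laplacian of similarity matrix S."""
    S = np.asarray(S, dtype=float)
    degrees = S.sum(axis=1)                 # D_ii = sum_j S_ij
    laplacian = np.diag(degrees) - S        # L_S = D - S
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]                 # eigenvector of smallest nonzero eigenvalue
    return np.argsort(fiedler)

# Toy similarity: items closer in the hidden true order are more similar.
n = 5
true_pos = np.arange(n)
S_true = n - np.abs(true_pos[:, None] - true_pos[None, :])

# Shuffle the items so the input order carries no information.
perm = [2, 0, 4, 1, 3]                      # shuffled item k has true position perm[k]
S_shuffled = S_true[np.ix_(perm, perm)]

order = fiedler_ranking(S_shuffled)
recovered = [perm[int(i)] for i in order]   # true positions in recovered order
print(recovered)
```

The sign of an eigenvector is arbitrary, so the recovered order may come out increasing or decreasing; as the excerpt notes, the global direction has to be fixed separately.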
Rankings from Fuzzy Pairwise Comparisons
van den Broek, P.M.; Noppen, J.A.R.; Mohammadian, M.
2006-01-01
We propose a new method for deriving rankings from fuzzy pairwise comparisons. It is based on the observation that quantification of the uncertainty of the pairwise comparisons should be used to obtain a better crisp ranking, instead of a fuzzified version of the ranking obtained from crisp pairwise
University Rankings and Social Science
Marginson, Simon
2014-01-01
University rankings widely affect the behaviours of prospective students and their families, university executive leaders, academic faculty, governments and investors in higher education. Yet the social science foundations of global rankings receive little scrutiny. Rankings that simply recycle reputation without any necessary connection to real…
Multivariate Time Series Search
National Aeronautics and Space Administration — Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical...
Robert M. Kunst
1986-01-01
Abstract: This paper presents an extension of an idea by Kleiner, Martin & Thomson (1979) to multivariate autoregressive processes. The properties of the procedures are reported by some examples with economic data. Starting from the assumption that observations obey an autoregressive vector process but are contaminated by additive disturbances, it is endeavoured to eliminate the disturbances to regain the "true" process and its law of generation. This is done iteratively by the multivariate r...
Global network centrality of university rankings
Guo, Weisi; Del Vecchio, Marco; Pogrebna, Ganna
2017-10-01
Universities and higher education institutions form an integral part of the national infrastructure and prestige. As academic research benefits increasingly from international exchange and cooperation, many universities have increased investment in improving and enabling their global connectivity. Yet, the relationship of university performance and its global physical connectedness has not been explored in detail. We conduct, to our knowledge, the first large-scale data-driven analysis into whether there is a correlation between university relative ranking performance and its global connectivity via the air transport network. The results show that local access to global hubs (as measured by air transport network betweenness) strongly and positively correlates with the ranking growth (statistical significance in different models ranges between 5% and 1% level). We also found that the local airport's aggregate flight paths (degree) and capacity (weighted degree) has no effect on university ranking, further showing that global connectivity distance is more important than the capacity of flight connections. We also examined the effect of local city economic development as a confounding variable and no effect was observed suggesting that access to global transportation hubs outweighs economic performance as a determinant of university ranking. The impact of this research is that we have determined the importance of the centrality of global connectivity and, hence, established initial evidence for further exploring potential connections between university ranking and regional investment policies on improving global connectivity.
A Cognitive Model for Aggregating People's Rankings
Lee, Michael D.; Steyvers, Mark; Miller, Brent
2014-01-01
We develop a cognitive modeling approach, motivated by classic theories of knowledge representation and judgment from psychology, for combining people's rankings of items. The model makes simple assumptions about how individual differences in knowledge lead to observed ranking data in behavioral tasks. We implement the cognitive model as a Bayesian graphical model, and use computational sampling to infer an aggregate ranking and measures of the individual expertise. Applications of the model to 23 data sets, dealing with general knowledge and prediction tasks, show that the model performs well in producing an aggregate ranking that is often close to the ground truth and, as in the “wisdom of the crowd” effect, usually performs better than most of individuals. We also present some evidence that the model outperforms the traditional statistical Borda count method, and that the model is able to infer people's relative expertise surprisingly well without knowing the ground truth. We discuss the advantages of the cognitive modeling approach to combining ranking data, and in wisdom of the crowd research generally, as well as highlighting a number of potential directions for future model development. PMID:24816733
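For reference, the Borda count baseline mentioned in this abstract can be sketched in a few lines. This is the standard textbook rule, not the authors' cognitive model; the function name and the sample rankings are hypothetical, and every ranking is assumed to contain the same items with no ties.

```python
def borda_count(rankings):
    """Aggregate complete rankings: an item in position p of n earns n - 1 - p points."""
    items = rankings[0]
    n = len(items)
    scores = {item: 0 for item in items}
    for ranking in rankings:
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    # Highest total first; ties are broken alphabetically to keep the output deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Three hypothetical participants ranking four items, best first.
example_rankings = [
    ["A", "B", "C", "D"],
    ["B", "A", "C", "D"],
    ["A", "C", "B", "D"],
]
aggregate = borda_count(example_rankings)
```

Borda count weights every participant equally, which is the property the cognitive model relaxes by inferring individual expertise.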
Essentials of multivariate data analysis
Spencer, Neil H
2013-01-01
""… this text provides an overview at an introductory level of several methods in multivariate data analysis. It contains in-depth examples from one data set woven throughout the text, and a free [Excel] Add-In to perform the analyses in Excel, with step-by-step instructions provided for each technique. … could be used as a text (possibly supplemental) for courses in other fields where researchers wish to apply these methods without delving too deeply into the underlying statistics.""-The American Statistician, February 2015
Multivariate meta-analysis: Potential and promise
Jackson, Dan; Riley, Richard; White, Ian R
2011-01-01
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day ‘Multivariate meta-analysis’ event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd. PMID:21268052
Serdobolskii, Vadim Ivanovich
2007-01-01
This monograph presents a mathematical theory of statistical models described by an essentially large number of unknown parameters, comparable with the sample size but possibly much larger. In this sense, the proposed theory can be called "essentially multiparametric". It is developed on the basis of the Kolmogorov asymptotic approach, in which the sample size increases along with the number of unknown parameters. This theory opens a way to solving central problems of multivariate statistics that have not been solved until now. Traditional statistical methods based on the idea of infinite sampling often break down in the solution of real problems and, depending on the data, can be inefficient, unstable and even not applicable. In this situation, practical statisticians are forced to use various heuristic methods in the hope that they will find a satisfactory solution. The mathematical theory developed in this book presents a regular technique for implementing new, more efficient versions of statistical procedures. ...
Ranking nodes in growing networks: When PageRank fails.
Mariani, Manuel Sebastian; Medo, Matúš; Zhang, Yi-Cheng
2015-11-10
PageRank is arguably the most popular ranking algorithm and is applied in real systems ranging from information to biological and infrastructure networks. Despite its outstanding popularity and broad use in different areas of science, the relation between the algorithm's efficacy and properties of the network on which it acts has not yet been fully understood. We study here PageRank's performance on a network model supported by real data, and show that realistic temporal effects make PageRank fail in individuating the most valuable nodes for a broad range of model parameters. Results on real data are in qualitative agreement with our model-based findings. This failure of PageRank reveals that the static approach to information filtering is inappropriate for a broad class of growing systems, and suggests that time-dependent algorithms that are based on the temporal linking patterns of these systems are needed to better rank the nodes.
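To make the "static approach" concrete, here is a minimal power-iteration sketch of standard PageRank. It is not the authors' growing-network model; the damping factor 0.85, the uniform treatment of dangling nodes, the function name and the toy graph are all assumptions chosen for illustration.

```python
import numpy as np


def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=1000):
    """Static PageRank by power iteration; adjacency[i, j] > 0 means i links to j."""
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    out_degree = A.sum(axis=1)
    safe = np.where(out_degree > 0, out_degree, 1.0)
    # Row-stochastic transition matrix; nodes without out-links jump uniformly.
    P = np.where(out_degree[:, None] > 0, A / safe[:, None], 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_scores = (1 - damping) / n + damping * scores @ P
        if np.abs(new_scores - scores).sum() < tol:
            return new_scores
        scores = new_scores
    return scores


# Hypothetical 4-node graph: 0 -> 1, 1 -> 2, 2 -> 0, 3 -> 0.
toy_graph = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
toy_scores = pagerank(toy_graph)
```

Nothing in this computation depends on when a link was created, which is the limitation the abstract points to for growing networks.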
Statistical analysis of management data
Gatignon, Hubert
2013-01-01
This book offers a comprehensive approach to multivariate statistical analyses. It provides theoretical knowledge of the concepts underlying the most important multivariate techniques and an overview of actual applications.
Neophilia Ranking of Scientific Journals
Packalen, Mikko; Bhattacharya, Jay
2017-01-01
The ranking of scientific journals is important because of the signal it sends to scientists about what is considered most vital for scientific progress. Existing ranking systems focus on measuring the influence of a scientific paper (citations)—these rankings do not reward journals for publishing innovative work that builds on new ideas. We propose an alternative ranking based on the proclivity of journals to publish papers that build on new ideas, and we implement this ranking via a text-based analysis of all published biomedical papers dating back to 1946. In addition, we compare our neophilia ranking to citation-based (impact factor) rankings; this comparison shows that the two ranking approaches are distinct. Prior theoretical work suggests an active role for our neophilia index in science policy. Absent an explicit incentive to pursue novel science, scientists underinvest in innovative work because of a coordination problem: for work on a new idea to flourish, many scientists must decide to adopt it in their work. Rankings that are based purely on influence thus do not provide sufficient incentives for publishing innovative work. By contrast, adoption of the neophilia index as part of journal-ranking procedures by funding agencies and university administrators would provide an explicit incentive for journals to publish innovative work and thus help solve the coordination problem by increasing scientists' incentives to pursue innovative work. PMID:28713181
Directory of Open Access Journals (Sweden)
Laura Mercatali
2013-05-01
Patients with solid cancer frequently develop bone metastases (BM). Zoledronic acid (Zometa®, ZA), routinely used to treat patients with BM, acts on osteoclasts and also has antitumor properties. We aimed to assess the effect of ZA over time in novel bone turnover markers (RANK, receptor activator of nuclear factor-κB ligand (RANK-L) and osteoprotegerin (OPG)) and to correlate these with serum N-terminal telopeptide (NTX). The study prospectively evaluated levels of RANK, RANK-L and OPG transcripts by real-time PCR and NTX expression by ELISA in the peripheral blood of 49 consecutive patients with advanced breast, lung or prostate cancer. All patients received the standard ZA schedule and were monitored for 12 months. Median baseline values of RANK, RANK-L and OPG were 78.28 (range 7.34–620.64), 319.06 (21.42–1884.41) and 1.52 (0.10–58.02), respectively. At 12 months, the median RANK-L value had decreased by 22% with respect to the baseline, whereas median OPG levels had increased by about 96%. Consequently, the RANK-L/OPG ratio decreased by 56% from the baseline. Median serum NTX levels decreased over the 12-month period, reaching statistical significance (p < 0.0001). Our results would seem to indicate that ZA modulates RANK, RANK-L and OPG expression, thus decreasing osteoclast activity.
Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection
Chen, Lisha
2012-12-01
The reduced-rank regression is an effective method in predicting multiple response variables from the same set of predictor variables. It reduces the number of model parameters and takes advantage of interrelations between the response variables and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group and show that this penalty satisfies certain desirable invariance properties. We develop two numerical algorithms to solve the penalized regression problem and establish the asymptotic consistency of the proposed method. In particular, the manifold structure of the reduced-rank regression coefficient matrix is considered and studied in our theoretical analysis. In our simulation study and real data analysis, the new method is compared with several existing variable selection methods for multivariate regression and exhibits competitive performance in prediction and variable selection. © 2012 American Statistical Association.
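The unpenalized building block of this method, classical reduced-rank regression, can be sketched as below. This is not the sparse estimator proposed in the abstract: the group-lasso penalty is omitted, an identity weighting is assumed, and the function name and simulated data are hypothetical.

```python
import numpy as np


def reduced_rank_regression(X, Y, rank):
    """Classical reduced-rank regression (identity weighting, no sparsity penalty)."""
    # Step 1: ordinary least-squares coefficients.
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Step 2: leading right singular vectors of the fitted values.
    fitted = X @ B_ols
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    V_r = Vt[:rank].T
    # Step 3: project the OLS solution onto the rank-r response subspace.
    return B_ols @ V_r @ V_r.T


# Simulated data: 5 predictors, 4 responses, true coefficient matrix of rank 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
B_true = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))
Y = X @ B_true + 0.01 * rng.normal(size=(50, 4))

B_hat = reduced_rank_regression(X, Y, rank=2)
```

A row-wise penalty on the coefficient matrix, as described in the abstract, would additionally zero out entire rows and so remove predictors from all responses at once.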
Energy Technology Data Exchange (ETDEWEB)
Weber, G. F.; Laudal, D. L.
1989-01-01
This work is a compilation of reports on ongoing research at the University of North Dakota. Topics include: Control Technology and Coal Preparation Research (SO{sub x}/NO{sub x} control, waste management), Advanced Research and Technology Development (turbine combustion phenomena, combustion inorganic transformation, coal/char reactivity, liquefaction reactivity of low-rank coals, gasification ash and slag characterization, fine particulate emissions), Combustion Research (fluidized bed combustion, beneficiation of low-rank coals, combustion characterization of low-rank coal fuels, diesel utilization of low-rank coals), Liquefaction Research (low-rank coal direct liquefaction), and Gasification Research (hydrogen production from low-rank coals, advanced wastewater treatment, mild gasification, color and residual COD removal from Synfuel wastewaters, Great Plains Gasification Plant, gasifier optimization).
A MULTIVARIATE WEIBULL DISTRIBUTION
Directory of Open Access Journals (Sweden)
Cheng Lee
2010-07-01
A multivariate survival function of Weibull Distribution is developed by expanding the theorem by Lu and Bhattacharyya. From the survival function, the probability density function, the cumulative probability function, the determinant of the Jacobian Matrix, and the general moment are derived.
DEFF Research Database (Denmark)
Barndorff-Nielsen, Ole Eiler; Hansen, Peter Reinhard; Lunde, Asger
2011-01-01
We propose a multivariate realised kernel to estimate the ex-post covariation of log-prices. We show this new consistent estimator is guaranteed to be positive semi-definite and is robust to measurement error of certain types and can also handle non-synchronous trading. It is the first estimator...
Statistical identification of effective input variables. [SCREEN
Energy Technology Data Exchange (ETDEWEB)
Vaurio, J.K.
1982-09-01
A statistical sensitivity analysis procedure has been developed for ranking the input data of large computer codes in the order of sensitivity-importance. The method is economical for large codes with many input variables, since it uses a relatively small number of computer runs. No prior judgemental elimination of input variables is needed. The screening method is based on stagewise correlation and extensive regression analysis of output values calculated with selected input value combinations. The regression process deals with multivariate nonlinear functions, and statistical tests are also available for identifying input variables that contribute to threshold effects, i.e., discontinuities in the output variables. A computer code SCREEN has been developed for implementing the screening techniques. The efficiency has been demonstrated by several examples and applied to a fast reactor safety analysis code (Venus-II). However, the methods and the coding are general and not limited to such applications.
Rank diversity of languages: generic behavior in computational linguistics.
Cocho, Germinal; Flores, Jorge; Gershenson, Carlos; Pineda, Carlos; Sánchez, Sergio
2015-01-01
Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation associated with the lognormal distribution, we define three different word regimes of languages: "heads" consist of words which almost do not change their rank in time, "bodies" are words of general use, while "tails" are comprised by context-specific words and vary their rank considerably in time. The heads and bodies reflect the size of language cores identified by linguists for basic communication. We propose a Gaussian random walk model which reproduces the rank variation of words in time and thus the diversity. Rank diversity of words can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. We find that the core size is similar for all languages studied.
Srinivas, B; Kulick, S N; Doran, Christine; Kulick, Seth
1995-01-01
There are currently two philosophies for building grammars and parsers -- Statistically induced grammars and Wide-coverage grammars. One way to combine the strengths of both approaches is to have a wide-coverage grammar with a heuristic component which is domain independent but whose contribution is tuned to particular domains. In this paper, we discuss a three-stage approach to disambiguation in the context of a lexicalized grammar, using a variety of domain independent heuristic techniques. We present a training algorithm which uses hand-bracketed treebank parses to set the weights of these heuristics. We compare the performance of our grammar against the performance of the IBM statistical grammar, using both untrained and trained weights for the heuristics.
Skopina, Maria; Protasov, Vladimir
2016-01-01
This book presents a systematic study of multivariate wavelet frames with matrix dilation, in particular, orthogonal and bi-orthogonal bases, which are a special case of frames. Further, it provides algorithmic methods for the construction of dual and tight wavelet frames with a desirable approximation order, namely compactly supported wavelet frames, which are commonly required by engineers. It particularly focuses on methods of constructing them. Wavelet bases and frames are actively used in numerous applications such as audio and graphic signal processing, compression and transmission of information. They are especially useful in image recovery from incomplete observed data due to the redundancy of frame systems. The construction of multivariate wavelet frames, especially bases, with desirable properties remains a challenging problem as although a general scheme of construction is well known, its practical implementation in the multidimensional setting is difficult. Another important feature of wavelet is ...
University Rankings in Critical Perspective
Pusser, Brian; Marginson, Simon
2013-01-01
This article addresses global postsecondary ranking systems by using critical-theoretical perspectives on power. This research suggests rankings are at once a useful lens for studying power in higher education and an important instrument for the exercise of power in service of dominant norms in global higher education. (Contains 1 table and 1…
University Ranking as Social Exclusion
Amsler, Sarah S.; Bolsmann, Chris
2012-01-01
In this article we explore the dual role of global university rankings in the creation of a new, knowledge-identified, transnational capitalist class and in facilitating new forms of social exclusion. We examine how and why the practice of ranking universities has become widely defined by national and international organisations as an important…
Transient multivariable sensor evaluation
Energy Technology Data Exchange (ETDEWEB)
Vilim, Richard B.; Heifetz, Alexander
2017-02-21
A method and system for performing transient multivariable sensor evaluation. The method and system includes a computer system for identifying a model form, providing training measurement data, generating a basis vector, monitoring system data from sensor, loading the system data in a non-transient memory, performing an estimation to provide desired data and comparing the system data to the desired data and outputting an alarm for a defective sensor.
PageRank tracker: from ranking to tracking.
Gong, Chen; Fu, Keren; Loza, Artur; Wu, Qiang; Liu, Jia; Yang, Jie
2014-06-01
Video object tracking is widely used in many real-world applications, and it has been extensively studied for over two decades. However, tracking robustness is still an issue in most existing methods, due to the difficulties with adaptation to environmental or target changes. In order to improve adaptability, this paper formulates the tracking process as a ranking problem, and the PageRank algorithm, which is a well-known webpage ranking algorithm used by Google, is applied. Labeled and unlabeled samples in tracking application are analogous to query webpages and the webpages to be ranked, respectively. Therefore, determining the target is equivalent to finding the unlabeled sample that is the most associated with existing labeled set. We modify the conventional PageRank algorithm in three aspects for tracking application, including graph construction, PageRank vector acquisition and target filtering. Our simulations with the use of various challenging public-domain video sequences reveal that the proposed PageRank tracker outperforms mean-shift tracker, co-tracker, semiboosting and beyond semiboosting trackers in terms of accuracy, robustness and stability.
Rank Protein Immunolabeling during Bone-Implant Interface Healing Process
Directory of Open Access Journals (Sweden)
Francisley Ávila Souza
2010-01-01
The purpose of this paper was to evaluate the expression of RANK protein during bone-healing process around machined surface implants. Twenty male Wistar rats, 90 days old, after having had a 2 mm diameter and 6 mm long implant inserted in their right tibias, were evaluated at 7, 14, 21, and 42 days after healing. After obtaining the histological samples, slides were subjected to RANK immunostaining reaction. Results were quantitatively evaluated. Results. Immunolabeling analysis showed expressions of RANK in osteoclast and osteoblast lineage cells. The statistical analysis showed an increase in the expression of RANK in osteoblasts at 7 postoperative days and a gradual decrease during the chronology of the healing process demonstrated by mild cellular activity in the final stage (P < .05). Conclusion. RANK immunolabeling was observed especially in osteoclast and osteoblast cells in primary bone during the initial periods of bone-healing/implant interface.
Rank Protein Immunolabeling during Bone-Implant Interface Healing Process
Ávila Souza, Francisley; Pereira Queiroz, Thallita; Rodrigues Luvizuto, Eloá; Nishioka, Renato Sussumu; Garcia-JR, Idelmo Rangel; de Carvalho, Paulo Sérgio Perri; Okamoto, Roberta
2010-01-01
The purpose of this paper was to evaluate the expression of RANK protein during bone-healing process around machined surface implants. Twenty male Wistar rats, 90 days old, after having had a 2 mm diameter and 6 mm long implant inserted in their right tibias, were evaluated at 7, 14, 21, and 42 days after healing. After obtaining the histological samples, slides were subjected to RANK immunostaining reaction. Results were quantitatively evaluated. Results. Immunolabeling analysis showed expressions of RANK in osteoclast and osteoblast lineage cells. The statistical analysis showed an increase in the expression of RANK in osteoblasts at 7 postoperative days and a gradual decrease during the chronology of the healing process demonstrated by mild cellular activity in the final stage (P < .05). Conclusion. RANK immunolabeling was observed especially in osteoclast and osteoblast cells in primary bone during the initial periods of bone-healing/implant interface. PMID:20706673
Beyond Zipf's Law: The Lavalette Rank Function and its Properties
Fontanelli, Oscar; Yang, Yaning; Cocho, Germinal; Li, Wentian
2016-01-01
Although Zipf's law is widespread in natural and social data, one often encounters situations where one or both ends of the ranked data deviate from the power-law function. Previously we proposed the Beta rank function to improve the fitting of data which does not follow a perfect Zipf's law. Here we show that when the two parameters in the Beta rank function have the same value, the Lavalette rank function, the probability density function can be derived analytically. We also show both computationally and analytically that Lavalette distribution is approximately equal, though not identical, to the lognormal distribution. We illustrate the utility of Lavalette rank function in several datasets. We also address three analysis issues on the statistical testing of Lavalette fitting function, comparison between Zipf's law and lognormal distribution through Lavalette function, and comparison between lognormal distribution and Lavalette distribution.
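The relation between the rank functions named in this abstract can be sketched numerically. The abstract does not spell out the formulas, so the two-parameter form used here, f(r) = A (N + 1 - r)^b / r^a, is an assumption based on the form in which the Beta rank function is commonly quoted; the function names and parameter values are hypothetical.

```python
import numpy as np


def beta_rank(r, N, A, a, b):
    """Assumed Beta rank function: f(r) = A * (N + 1 - r)**b / r**a for ranks 1..N."""
    r = np.asarray(r, dtype=float)
    return A * (N + 1 - r) ** b / r ** a


def lavalette_rank(r, N, A, b):
    """Lavalette special case: both exponents of the Beta rank function equal b."""
    return beta_rank(r, N, A, a=b, b=b)


N = 100
ranks = np.arange(1, N + 1)

zipf_like = beta_rank(ranks, N, A=1.0, a=1.0, b=0.0)   # b = 0 reduces to a pure power law
lavalette = lavalette_rank(ranks, N, A=1.0, b=0.5)
```

With b = 0 the tail term drops out and a Zipf-type power law remains; a positive b bends the curve downward at the high-rank end, which is the kind of deviation the abstract describes.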
Simonoska Crcarevska, Maja; Dimitrovska, Aneta; Sibinovska, Nadica; Mladenovska, Kristina; Slavevska Raicki, Renata; Glavas Dodov, Marija
2015-07-15
Microsponges drug delivery system (MDDC) was prepared by double emulsion-solvent-diffusion technique using rotor-stator homogenization. Quality by design (QbD) concept was implemented for the development of MDDC with potential to be incorporated into semisolid dosage form (gel). Quality target product profile (QTPP) and critical quality attributes (CQA) were defined and identified, accordingly. Critical material attributes (CMA) and Critical process parameters (CPP) were identified using quality risk management (QRM) tool, failure mode, effects and criticality analysis (FMECA). CMA and CPP were identified based on results obtained from principal component analysis (PCA-X&Y) and partial least squares (PLS) statistical analysis along with literature data, product and process knowledge and understanding. FMECA identified amount of ethylcellulose, chitosan, acetone, dichloromethane, span 80, tween 80 and water ratio in primary/multiple emulsions as CMA and rotation speed and stirrer type used for organic solvent removal as CPP. The relationship between identified CPP and particle size as CQA was described in the design space using design of experiments - one-factor response surface method. Obtained results from statistically designed experiments enabled establishment of mathematical models and equations that were used for detailed characterization of influence of identified CPP upon MDDC particle size and particle size distribution and their subsequent optimization. Copyright © 2015 Elsevier B.V. All rights reserved.
Universal scaling in sports ranking
Deng, Weibing; Li, Wei; Cai, Xu; Bulou, Alain; Wang, Qiuping A.
2012-09-01
Ranking is a ubiquitous phenomenon in human society. On the web pages of Forbes, one may find all kinds of rankings, such as the world's most powerful people, the world's richest people, the highest-earning tennis players, and so on and so forth. Herewith, we study a specific kind—sports ranking systems in which players' scores and/or prize money are accrued based on their performances in different matches. By investigating 40 data samples which span 12 different sports, we find that the distributions of scores and/or prize money follow universal power laws, with exponents nearly identical for most sports. In order to understand the origin of this universal scaling we focus on the tennis ranking systems. By checking the data we find that, for any pair of players, the probability that the higher-ranked player tops the lower-ranked opponent is proportional to the rank difference between the pair. Such a dependence can be well fitted to a sigmoidal function. By using this feature, we propose a simple toy model which can simulate the competition of players in different matches. The simulations yield results consistent with the empirical findings. Extensive simulation studies indicate that the model is quite robust with respect to the modifications of some parameters.
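The sigmoidal dependence of the win probability on the rank difference can be illustrated with a small simulation. This is a sketch only: the logistic form, the steepness value 0.05, the function names and the chosen ranks are assumptions made for illustration and are not the fitted model from the paper.

```python
import math
import random


def win_probability(rank_gap, steepness=0.05):
    """Assumed logistic form: probability that the better-ranked player wins."""
    return 1.0 / (1.0 + math.exp(-steepness * rank_gap))


def simulate_match(rank_a, rank_b, rng, steepness=0.05):
    """Return the rank of the winner; a smaller rank number means a better player."""
    better, worse = min(rank_a, rank_b), max(rank_a, rank_b)
    p = win_probability(worse - better, steepness)
    return better if rng.random() < p else worse


# Hypothetical pairing: rank 5 against rank 60, repeated 10,000 times.
rng = random.Random(0)
n_matches = 10_000
wins = sum(simulate_match(5, 60, rng) == 5 for _ in range(n_matches))
favourite_share = wins / n_matches
```

Equal ranks give a probability of one half, and the favourite's share of wins approaches one as the rank gap grows.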
Universal scaling in sports ranking
Deng, Weibing; Cai, Xu; Bulou, Alain; Wang, Qiuping A
2011-01-01
Ranking is a ubiquitous phenomenon in the human society. By clicking the web pages of Forbes, you may find all kinds of rankings, such as world's most powerful people, world's richest people, top-paid tennis stars, and so on and so forth. Herewith, we study a specific kind, sports ranking systems in which players' scores and prize money are calculated based on their performances in attending various tournaments. A typical example is tennis. It is found that the distributions of both scores and prize money follow universal power laws, with exponents nearly identical for most sports fields. In order to understand the origin of this universal scaling we focus on the tennis ranking systems. By checking the data we find that, for any pair of players, the probability that the higher-ranked player will top the lower-ranked opponent is proportional to the rank difference between the pair. Such a dependence can be well fitted to a sigmoidal function. By using this feature, we propose a simple toy model which can simul...
Control Multivariable por Desacoplo
Directory of Open Access Journals (Sweden)
Fernando Morilla
2013-01-01
Abstract: Interaction between variables is an inherent feature of multivariable processes, and it complicates both their operation and the design of their control systems. The decoupling control paradigm groups together a set of methodologies that have traditionally aimed to eliminate or reduce this interaction, and that some researchers have recently reoriented toward solving a problem as complex as multivariable control. Part of the material described in this article is well known in the field of process control, but most of it results from several years of research by the authors, in which they prioritized generalizing the problem, finding solutions that are easy to implement, and combining elementary PID control blocks. This combination of goals means that perfect decoupling cannot always be achieved, but a considerable reduction of the interaction can be achieved at the basic level of the control pyramid, to the benefit of other control systems at higher hierarchical levels. The article summarizes all the basic aspects of decoupling control and its application to two representative processes: an experimental plant of four coupled tanks and a 4×4 model of an experimental heating, ventilation and air conditioning system.
Effect of Doximity Residency Rankings on Residency Applicants’ Program Choices
Directory of Open Access Journals (Sweden)
Aimee M. Rolston
2015-11-01
Full Text Available Introduction: Choosing a residency program is a stressful and important decision. Doximity released residency program rankings by specialty in September 2014. This study sought to investigate the impact of those rankings on residency application choices made by fourth year medical students. Methods: A 12-item survey was administered in October 2014 to fourth year medical students at three schools. Students indicated their specialty, awareness of and perceived accuracy of the rankings, and the rankings’ impact on the programs to which they chose to apply. Descriptive statistics were reported for all students and those applying to Emergency Medicine (EM). Results: A total of 461 (75.8%) students responded, with 425 applying in one of the 20 Doximity ranked specialties. Of the 425, 247 (58%) were aware of the rankings and 177 looked at them. On a 1-100 scale (100=very accurate), students reported a mean ranking accuracy rating of 56.7 (SD 20.3). Forty-five percent of students who looked at the rankings modified the number of programs to which they applied. The majority added programs. Of the 47 students applying to EM, 18 looked at the rankings and 33% changed their application list with most adding programs. Conclusion: The Doximity rankings had real effects on students applying to residencies as almost half of students who looked at the rankings modified their program list. Additionally, students found the rankings to be moderately accurate. Graduating students might benefit from emphasis on more objective characterization of programs to assess in light of their own interests and personal/career goals.
Energy Technology Data Exchange (ETDEWEB)
Crawfis, R.A.
1996-03-01
This paper presents a new technique for representing multivalued data sets defined on an integer lattice. It extends the state-of-the-art in volume rendering to include nonhomogeneous volume representations. That is, volume rendering of materials with very fine detail (e.g. translucent granite) within a voxel. Multivariate volume rendering is achieved by introducing controlled amounts of noise within the volume representation. Varying the local amount of noise within the volume is used to represent a separate scalar variable. The technique can also be used in image synthesis to create more realistic clouds and fog.
Energy Technology Data Exchange (ETDEWEB)
Almazan T, M. G.; Jimenez R, M.; Monroy G, F.; Tenorio, D. [ININ, Carretera Mexico-Toluca s/n, 52750 Ocoyoacac, Estado de Mexico (Mexico); Rodriguez G, N. L. [Instituto Mexiquense de Cultura, Subdireccion de Restauracion y Conservacion, Hidalgo poniente No. 1013, 50080 Toluca, Estado de Mexico (Mexico)
2009-07-01
The elemental composition of archaeological ceramic fragments obtained during the explorations in San Miguel Ixtapan, Mexico State, was determined by neutron activation analysis. The samples were irradiated in the TRIGA Mark III research reactor with a neutron flux of 1·10^13 n·cm^-2·s^-1. The irradiation time was 2 hours. Before the gamma-ray spectra were acquired, the samples were allowed to decay for 12 to 14 days. The analyzed elements were: Nd, Ce, Lu, Eu, Yb, Pa(Th), Tb, La, Cr, Hf, Sc, Co, Fe, Cs, Rb. The statistical treatment of the data, consisting of cluster analysis and principal component analysis, made it possible to identify three different origins of the archaeological ceramics, designated as local, foreign and regional. (Author)
A Statistical Analysis of Cointegration for I(2) Variables
DEFF Research Database (Denmark)
Johansen, Søren
1995-01-01
This paper discusses inference for I(2) variables in a VAR model. The estimation procedure suggested consists of two reduced rank regressions. The asymptotic distribution of the proposed estimators of the cointegrating coefficients is mixed Gaussian, which implies that asymptotic inference can be conducted using the χ² distribution. It is shown to what extent inference on the cointegration ranks can be conducted using the tables already prepared for the analysis of cointegration of I(1) variables. New tables are needed for the test statistics to control the size of the tests. This paper contains a multivariate test for the existence of I(2) variables. This test is illustrated using a data set consisting of U.K. and foreign prices and interest rates as well as the exchange rate.
Frahm, K. M.; Chepelianskii, A. D.; Shepelyansky, D. L.
2012-10-01
We build up a directed network tracing links from a given integer to its divisors and analyze the properties of the Google matrix of this network. The PageRank vector of this matrix is computed numerically and it is shown that its probability is approximately inversely proportional to the PageRank index, thus being similar to the Zipf law and the dependence established for the World Wide Web. The spectrum of the Google matrix of integers is characterized by a large gap and a relatively small number of nonzero eigenvalues. A simple semi-analytical expression for the PageRank of integers is derived that allows us to find this vector for matrices of billion size. This network provides a new PageRank order of integers.
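A minimal Python sketch of the construction described in this abstract, for readers who want to experiment with it. Several choices here are assumptions rather than details taken from the paper: links go from each integer m to its divisors d with 1 < d < m (no multiplicity), integers without such divisors are treated as dangling nodes, the damping factor is 0.85, and the names `divisor_google_matrix` and `pagerank` are hypothetical.

```python
import numpy as np

def divisor_google_matrix(n_max, alpha=0.85):
    """Google matrix for integers 1..n_max (assumed link rule: m -> d, 1 < d < m, d | m)."""
    adj = np.zeros((n_max, n_max))
    for m in range(2, n_max + 1):
        for d in range(2, m // 2 + 1):
            if m % d == 0:
                adj[d - 1, m - 1] = 1.0  # column = source m, row = target d
    col_sums = adj.sum(axis=0)
    has_links = col_sums > 0
    # Normalize each column; dangling columns (1 and the primes) become uniform.
    s = np.where(has_links, adj / np.where(has_links, col_sums, 1.0), 1.0 / n_max)
    return alpha * s + (1.0 - alpha) / n_max

def pagerank(google, tol=1e-12, max_iter=1000):
    """Power iteration for the stationary vector of a column-stochastic matrix."""
    p = np.full(google.shape[0], 1.0 / google.shape[0])
    for _ in range(max_iter):
        nxt = google @ p
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p

ranks = pagerank(divisor_google_matrix(100))
order = np.argsort(-ranks) + 1  # integers sorted by decreasing PageRank
```

The dense matrix is only practical for small `n_max`; the billion-size matrices mentioned in the abstract rely on the semi-analytical expression derived by the authors, which this sketch does not attempt to reproduce.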
Ranking in evolving complex networks
Liao, Hao; Mariani, Manuel Sebastian; Medo, Matúš; Zhang, Yi-Cheng; Zhou, Ming-Yang
2017-05-01
Complex networks have emerged as a simple yet powerful framework to represent and analyze a wide range of complex systems. The problem of ranking the nodes and the edges in complex networks is critical for a broad range of real-world problems because it affects how we access online information and products, how success and talent are evaluated in human activities, and how scarce resources are allocated by companies and policymakers, among others. This calls for a deep understanding of how existing ranking algorithms perform, and which are their possible biases that may impair their effectiveness. Many popular ranking algorithms (such as Google's PageRank) are static in nature and, as a consequence, they exhibit important shortcomings when applied to real networks that rapidly evolve in time. At the same time, recent advances in the understanding and modeling of evolving networks have enabled the development of a wide and diverse range of ranking algorithms that take the temporal dimension into account. The aim of this review is to survey the existing ranking algorithms, both static and time-aware, and their applications to evolving networks. We emphasize both the impact of network evolution on well-established static algorithms and the benefits from including the temporal dimension for tasks such as prediction of network traffic, prediction of future links, and identification of significant nodes.
Sang, Huiyan
2011-12-01
This paper investigates the cross-correlations across multiple climate model errors. We build a Bayesian hierarchical model that accounts for the spatial dependence of individual models as well as cross-covariances across different climate models. Our method allows for a nonseparable and nonstationary cross-covariance structure. We also present a covariance approximation approach to facilitate the computation in the modeling and analysis of very large multivariate spatial data sets. The covariance approximation consists of two parts: a reduced-rank part to capture the large-scale spatial dependence, and a sparse covariance matrix to correct the small-scale dependence error induced by the reduced rank approximation. We pay special attention to the case that the second part of the approximation has a block-diagonal structure. Simulation results of model fitting and prediction show substantial improvement of the proposed approximation over the predictive process approximation and the independent blocks analysis. We then apply our computational approach to the joint statistical modeling of multiple climate model errors. © 2012 Institute of Mathematical Statistics.
RANK and RANK ligand expression in primary human osteosarcoma
Directory of Open Access Journals (Sweden)
Daniel Branstetter
2015-09-01
Our results demonstrate RANKL expression was observed in the tumor element in 68% of human OS using IHC. However, the staining intensity was relatively low and only 37% (29/79) of samples exhibited ≥10% RANKL positive tumor cells. RANK expression was not observed in OS tumor cells. In contrast, RANK expression was clearly observed in other cells within OS samples, including the myeloid osteoclast precursor compartment, osteoclasts and in giant osteoclast cells. The intensity and frequency of RANKL and RANK staining in OS samples were substantially less than that observed in GCTB samples. The observation that RANKL is expressed in OS cells themselves suggests that these tumors may mediate an osteoclastic response, and anti-RANKL therapy may potentially be protective against bone pathologies in OS. However, the absence of RANK expression in primary human OS cells suggests that any autocrine RANKL/RANK signaling in human OS tumor cells is not operative, and anti-RANKL therapy would not directly affect the tumor.
International Conference on Robust Rank-Based and Nonparametric Methods
McKean, Joseph
2016-01-01
The contributors to this volume include many of the distinguished researchers in this area. Many of these scholars have collaborated with Joseph McKean to develop underlying theory for these methods, obtain small sample corrections, and develop efficient algorithms for their computation. The papers cover the scope of the area, including robust nonparametric rank-based procedures through Bayesian and big data rank-based analyses. Areas of application include biostatistics and spatial areas. Over the last 30 years, robust rank-based and nonparametric methods have developed considerably. These procedures generalize traditional Wilcoxon-type methods for one- and two-sample location problems. Research into these procedures has culminated in complete analyses for many of the models used in practice including linear, generalized linear, mixed, and nonlinear models. Settings are both multivariate and univariate. With the development of R packages in these areas, computation of these procedures is easily shared with r...
Ranking structures and Rank-Rank Correlations of Countries. The FIFA and UEFA cases
Ausloos, Marcel; Gadomski, Adam; Vitanov, Nikolay K
2014-01-01
Ranking of agents competing with each other in complex systems may lead to paradoxes according to the pre-chosen different measures. A discussion is presented on such rank-rank, similar or not, correlations based on the case of European countries ranked by UEFA and FIFA from different soccer competitions. The first question to be answered is whether an empirical and simple law is obtained for such (self-) organizations of complex sociological systems with such different measuring schemes. It is found that the power law form is not the best description contrary to many modern expectations. The stretched exponential is much more adequate. Moreover, it is found that the measuring rules lead to some inner structures, in both cases.
Ranking structures and rank-rank correlations of countries: The FIFA and UEFA cases
Ausloos, Marcel; Cloots, Rudi; Gadomski, Adam; Vitanov, Nikolay K.
2014-04-01
Ranking of agents competing with each other in complex systems may lead to paradoxes according to the pre-chosen different measures. A discussion is presented on such rank-rank, similar or not, correlations based on the case of European countries ranked by UEFA and FIFA from different soccer competitions. The first question to be answered is whether an empirical and simple law is obtained for such (self-) organizations of complex sociological systems with such different measuring schemes. It is found that the power law form is not the best description contrary to many modern expectations. The stretched exponential is much more adequate. Moreover, it is found that the measuring rules lead to some inner structures in both cases.
Simplicial band depth for multivariate functional data
López-Pintado, Sara
2014-03-05
We propose notions of simplicial band depth for multivariate functional data that extend the univariate functional band depth. The proposed simplicial band depths provide simple and natural criteria to measure the centrality of a trajectory within a sample of curves. Based on these depths, a sample of multivariate curves can be ordered from the center outward and order statistics can be defined. Properties of the proposed depths, such as invariance and consistency, can be established. A simulation study shows the robustness of this new definition of depth and the advantages of using a multivariate depth versus the marginal depths for detecting outliers. Real data examples from growth curves and signature data are used to illustrate the performance and usefulness of the proposed depths. © 2014 Springer-Verlag Berlin Heidelberg.
A MULTIVARIATE ANALYSIS OF CROATIAN COUNTIES ENTREPRENEURSHIP
Directory of Open Access Journals (Sweden)
Elza Jurun
2012-12-01
Full Text Available This paper focuses on a multivariate analysis of entrepreneurship in Croatian counties. The complete database made available by official statistical institutions at the national and regional level is used. Modern econometric methodology, ranging from comparative analysis via multiple regression to multivariate cluster analysis, is applied, together with an analysis of successful or unsuccessful entrepreneurship measured by indicators of efficiency, profitability and productivity. The comparative analysis covers the years 2004 and 2010. Accelerators of socio-economic development (the number of entrepreneur investors, investment in fixed assets and the current assets ratio) were analytically filtered in a multiple regression model from twenty-six independent variables as those with the dominant influence on GDP per capita in 2010 as the dependent variable. Results of the multivariate cluster analysis of the twenty-one Croatian counties are also interpreted in terms of the three Croatian NUTS 2 regions under the European nomenclature of territorial units.
Comparison of multivariate post-processing approaches
Lerch, Sebastian; Graeter, Maximiliane
2017-04-01
Over the past decade, statistical post-processing of ensemble forecasts has become routine in numerical weather prediction. However, critically important spatial, temporal and inter-variable dependencies are lost when univariate post-processing techniques are applied separately to multiple locations, forecast horizons or variables. Therefore, several approaches for restoring multivariate dependencies have been proposed in the literature. These techniques rely on parametric and empirical copulas to incorporate multivariate dependence structures estimated from past forecasts or observations. Examples include ensemble copula coupling, the Gaussian copula approach and the Schaake shuffle. We compare these state of the art approaches in a simulation setting that mimics how post-processing is done in practice and that allows for investigating the effect of various types of misspecification of the ensemble prediction system on the forecast performance of the multivariate post-processing methods.
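As an illustration of one of the empirical-copula techniques named above, the following Python sketch shows the reordering step of the Schaake shuffle as it is commonly described: univariately post-processed samples are rearranged, margin by margin, so that their rank order matches that of a set of past observations. The function name `schaake_shuffle` and the toy numbers are assumptions for illustration, not taken from the study.

```python
import numpy as np

def schaake_shuffle(samples, history):
    """Reorder `samples` (m x d) so each column follows the rank order of `history` (m x d).

    Each column of `samples` holds m draws from one post-processed margin
    (a location, lead time or variable); `history` holds m past multivariate
    observations that supply the dependence template.
    """
    samples = np.asarray(samples, dtype=float)
    history = np.asarray(history, dtype=float)
    if samples.shape != history.shape:
        raise ValueError("samples and history must have the same shape")
    shuffled = np.empty_like(samples)
    for j in range(samples.shape[1]):
        ranks = history[:, j].argsort().argsort()  # 0-based rank of each past observation
        shuffled[:, j] = np.sort(samples[:, j])[ranks]
    return shuffled

# Toy example: the history is perfectly rank-correlated across the two margins.
samples = np.array([[3.0, 10.0], [1.0, 30.0], [2.0, 20.0]])
history = np.array([[0.1, 5.0], [0.2, 6.0], [0.3, 7.0]])
result = schaake_shuffle(samples, history)
```

The marginal values are untouched (each column of `result` is a permutation of the corresponding input column); only their pairing across margins changes. Ensemble copula coupling follows the same pattern with the raw ensemble forecast in place of `history`.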
University Ranking Systems; Criteria and Critiques
Saka, Yavuz; YAMAN, Süleyman
2011-01-01
The purpose of this paper is to explore international university ranking systems. As a compilation study this paper provides specific criteria that each ranking system uses and main critiques regarding these ranking systems. Since there are many ranking systems in this area of research, this study focused on only most cited and referred ranking systems. As there is no consensus in terms of the criteria that these systems use, this paper has no intention of identifying the best ranking system ...
Rankings matter: nurse graduates from higher-ranked institutions have higher productivity.
Yakusheva, Olga; Weiss, Marianne
2017-02-13
Increasing demand for baccalaureate-prepared nurses has led to rapid growth in the number of baccalaureate-granting programs, and to concerns about educational quality and potential effects on productivity of the graduating nursing workforce. We examined the association of individual productivity of a baccalaureate-prepared nurse with the ranking of the degree-granting institution. For a sample of 691 nurses from general medical-surgical units at a large magnet urban hospital between 6/1/2011 and 12/31/2011, we conducted multivariate regression analysis of nurse productivity on the ranking of the degree-granting institution, adjusted for age, hospital tenure, gender, and unit-specific effects. Nurse productivity was coded as "top"/"average"/"bottom" based on a computation of individual nurse value-added to patient outcomes. Ranking of the baccalaureate-granting institution was derived from the US News and World Report Best Colleges Rankings' categorization of the nurse's institution as the "first tier" or the "second tier", with diploma or associate degree as the reference category. Relative to diploma or associate degree nurses, nurses who had attended first-tier universities had three times the odds of being in the top productivity category (OR = 3.18, p productivity (OR = 1.73, p = 0.11). Being in the bottom productivity category was not associated with having a baccalaureate degree or the quality tier. The productivity boost from a nursing baccalaureate degree depends on the quality of the educational institution. Recognizing differences in educational outcomes, initiatives to build a baccalaureate-educated nursing workforce should be accompanied by improved access to high-quality educational institutions.
A new test of multivariate nonlinear causality.
Bai, Zhidong; Hui, Yongchang; Jiang, Dandan; Lv, Zhihui; Wong, Wing-Keung; Zheng, Shurong
2018-01-01
The multivariate nonlinear Granger causality developed by Bai et al. (2010) (Mathematics and Computers in Simulation. 2010; 81: 5-17) plays an important role in detecting the dynamic interrelationships between two groups of variables. Following the idea of the Hiemstra-Jones (HJ) test proposed by Hiemstra and Jones (1994) (Journal of Finance. 1994; 49(5): 1639-1664), they attempt to establish a central limit theorem (CLT) for their test statistic by applying the asymptotic properties of multivariate U-statistics. However, Bai et al. (2016) (2016; arXiv: 1701.03992) revisit the HJ test and find that the test statistic given by HJ is NOT a function of U-statistics, which implies that neither the CLT proposed by Hiemstra and Jones (1994) nor the one extended by Bai et al. (2010) is valid for statistical inference. In this paper, we re-estimate the probabilities and reestablish the CLT of the new test statistic. Numerical simulation shows that our new estimates are consistent and that our new test has decent size and power.
Toward optimal feature selection using ranking methods and classification algorithms
Directory of Open Access Journals (Sweden)
Novaković Jasmina
2011-01-01
Full Text Available We presented a comparison between several feature ranking methods used on two real datasets. We considered six ranking methods that can be divided into two broad categories: statistical and entropy-based. Four supervised learning algorithms are adopted to build models, namely, IB1, Naive Bayes, C4.5 decision tree and the RBF network. We showed that the selection of ranking methods could be important for classification accuracy. In our experiments, ranking methods with different supervised learning algorithms give quite different results for balanced accuracy. Our cases confirm that, in order to be sure that a subset of features giving the highest accuracy has been selected, the use of many different indices is recommended.
Ranking species in mutualistic networks.
Domínguez-García, Virginia; Muñoz, Miguel A
2015-02-02
Understanding the architectural subtleties of ecological networks, believed to confer on them enhanced stability and robustness, is a subject of utmost relevance. Mutualistic interactions have been profusely studied and their corresponding bipartite networks, such as plant-pollinator networks, have been reported to exhibit a characteristic "nested" structure. Assessing the importance of any given species in mutualistic networks is a key task when evaluating extinction risks and possible cascade effects. Inspired by a recently introduced algorithm (similar in spirit to Google's PageRank but with a built-in non-linearity), here we propose a method which, by exploiting their nested architecture, allows us to derive a sound ranking of species importance in mutualistic networks. This method clearly outperforms other existing ranking schemes and can become very useful for ecosystem management and biodiversity preservation, where decisions on what aspects of ecosystems to explicitly protect need to be made.
University rankings in computer science
DEFF Research Database (Denmark)
Ehret, Philip; Zuccala, Alesia Ann; Gipp, Bela
2017-01-01
This is a research-in-progress paper concerning two types of institutional rankings, the Leiden and QS World ranking, and their relationship to a list of universities’ ‘geo-based’ impact scores, and Computing Research and Education Conference (CORE) participation scores in the field of computer science. A ‘geo-based’ impact measure examines the geographical distribution of incoming citations to a particular university’s journal articles for a specific period of time. It takes into account both the number of citations and the geographical variability in these citations. The CORE participation score is calculated on the basis of the number of weighted proceedings papers that a university has contributed to either an A*, A, B, or C conference as ranked by the Computing Research and Education Association of Australasia. In addition to calculating the correlations between the distinct university...
Tucker Tensor analysis of Matern functions in spatial statistics
Litvinenko, Alexander
2017-11-18
In this work, we describe advanced numerical tools for working with multivariate functions and for the analysis of large data sets. These tools will drastically reduce the required computing time and the storage cost, and, therefore, will allow us to consider much larger data sets or finer meshes. Covariance matrices are crucial in spatio-temporal statistical tasks, but are often very expensive to compute and store, especially in 3D. Therefore, we approximate covariance functions by cheap surrogates in a low-rank tensor format. We apply the Tucker and canonical tensor decompositions to a family of Matern- and Slater-type functions with varying parameters and demonstrate numerically that their approximations exhibit exponentially fast convergence. We prove the exponential convergence of the Tucker and canonical approximations in tensor rank parameters. Several statistical operations are performed in this low-rank tensor format, including evaluating the conditional covariance matrix, spatially averaged estimation variance, computing a quadratic form, determinant, trace, loglikelihood, inverse, and Cholesky decomposition of a large covariance matrix. Low-rank tensor approximations substantially reduce the computing and storage costs. For example, the storage cost is reduced from an exponential O(n^d) to a linear scaling O(drn), where d is the spatial dimension, n is the number of mesh points in one direction, and r is the tensor rank. Prerequisites for applicability of the proposed techniques are the assumptions that the data, locations, and measurements lie on a tensor (axes-parallel) grid and that the covariance function depends on a distance, ||x-y||.
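The storage argument can be made concrete with a small Python sketch. It samples a Slater-type function exp(-||x||) on a tensor grid and compresses it with a truncated higher-order SVD, which is one standard way to obtain a Tucker approximation and is used here as a stand-in; it is not necessarily the algorithm of the paper. The grid size, the ranks and all function names are assumptions. Note that a Tucker format stores an r^d core in addition to the d factor matrices, so its count is r^d + drn, while the O(drn) figure quoted above corresponds to the canonical format.

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def truncated_hosvd(tensor, rank):
    """Tucker approximation via truncated higher-order SVD (same rank in every mode)."""
    factors = []
    for mode in range(tensor.ndim):
        unfolding = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto the leading singular vectors
    return core, factors

def reconstruct(core, factors):
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

def relative_error(tensor, rank):
    core, factors = truncated_hosvd(tensor, rank)
    return np.linalg.norm(reconstruct(core, factors) - tensor) / np.linalg.norm(tensor)

n, d = 16, 3
x = np.linspace(0.0, 1.0, n)
g = np.meshgrid(x, x, x, indexing="ij")
slater = np.exp(-np.sqrt(g[0] ** 2 + g[1] ** 2 + g[2] ** 2))  # exp(-||x||) on the grid

def tucker_storage(rank):
    return rank ** d + d * rank * n  # core entries plus d factor matrices

full_storage = n ** d
errors = {r: relative_error(slater, r) for r in (2, 4, 8)}
```

Printing `errors` alongside `tucker_storage(r)` and `full_storage` shows how the approximation error falls as the rank grows while the stored entries stay well below n^d.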
Power Estimation in Multivariate Analysis of Variance
Directory of Open Access Journals (Sweden)
Jean François Allaire
2007-09-01
Full Text Available Power is often overlooked in designing multivariate studies for the simple reason that it is believed to be too complicated. In this paper, it is shown that power estimation in multivariate analysis of variance (MANOVA) can be approximated using an F distribution for the three popular statistics (Hotelling-Lawley trace, Pillai-Bartlett trace, Wilks' likelihood ratio). Consequently, the same procedure as in any statistical test can be used: computation of the critical F value, computation of the noncentrality parameter (as a function of the effect size) and finally estimation of power using a noncentral F distribution. Various numerical examples are provided which help to understand and to apply the method. Problems related to post hoc power estimation are discussed.
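The three-step procedure listed in this abstract can be sketched in Python with SciPy. The degrees of freedom and the noncentrality parameter depend on which MANOVA statistic is used and on the effect size; the paper's formulas for them are not reproduced here, so they enter as plain inputs, and the numbers below are placeholders. The function name `f_test_power` is hypothetical.

```python
from scipy import stats

def f_test_power(dfn, dfd, noncentrality, alpha=0.05):
    """Power of an F test: P(F' > F_crit), with F' following a noncentral F distribution."""
    # Step 1: critical value of the central F distribution at level alpha.
    f_crit = stats.f.ppf(1.0 - alpha, dfn, dfd)
    # Steps 2 and 3: the noncentrality parameter is supplied by the caller;
    # power is the upper tail of the noncentral F beyond the critical value.
    return stats.ncf.sf(f_crit, dfn, dfd, noncentrality)

# Placeholder inputs, for illustration only.
power_small_effect = f_test_power(dfn=4, dfd=50, noncentrality=5.0)
power_large_effect = f_test_power(dfn=4, dfd=50, noncentrality=20.0)
```

With the degrees of freedom held fixed, a larger noncentrality parameter (larger effect size or sample size) gives higher power, which is the relationship a design-stage power analysis exploits.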
Practical multivariate analysis
Afifi, Abdelmonem; Clark, Virginia A
2011-01-01
""First of all, it is very easy to read. … The authors manage to introduce and (at least partially) explain even quite complex concepts, e.g. eigenvalues, in an easy and pedagogical way that I suppose is attractive to readers without deeper statistical knowledge. The text is also sprinkled with references for those who want to probe deeper into a certain topic. Secondly, I personally find the book's emphasis on practical data handling very appealing. … Thirdly, the book gives very nice coverage of regression analysis. … this is a nicely written book that gives a good overview of a large number
Podium: Ranking Data Using Mixed-Initiative Visual Analytics.
Wall, Emily; Das, Subhajit; Chawla, Ravish; Kalidindi, Bharath; Brown, Eli T; Endert, Alex
2017-08-29
People often rank and order data points as a vital part of making decisions. Multi-attribute ranking systems are a common tool used to make these data-driven decisions. Such systems often take the form of a table-based visualization in which users assign weights to the attributes representing the quantifiable importance of each attribute to a decision, which the system then uses to compute a ranking of the data. However, these systems assume that users are able to quantify their conceptual understanding of how important particular attributes are to a decision. This is not always easy or even possible for users to do. Rather, people often have a more holistic understanding of the data. They form opinions that data point A is better than data point B but do not necessarily know which attributes are important. To address these challenges, we present a visual analytic application to help people rank multi-variate data points. We developed a prototype system, Podium, that allows users to drag rows in the table to rank order data points based on their perception of the relative value of the data. Podium then infers a weighting model using Ranking SVM that satisfies the user's data preferences as closely as possible. Whereas past systems help users understand the relationships between data points based on changes to attribute weights, our approach helps users to understand the attributes that might inform their understanding of the data. We present two usage scenarios to describe some of the potential uses of our proposed technique: (1) understanding which attributes contribute to a user's subjective preferences for data, and (2) deconstructing attributes of importance for existing rankings. Our proposed approach makes powerful machine learning techniques more usable to those who may not have expertise in these areas.
THE USE OF RANKING SAMPLING METHOD WITHIN MARKETING RESEARCH
Directory of Open Access Journals (Sweden)
CODRUŢA DURA
2011-01-01
Full Text Available The marketing and statistical literature available to practitioners provides a wide range of sampling methods that can be implemented in the context of marketing research. The ranking sampling method is based on dividing the general population into several strata, namely into several subdivisions which are relatively homogeneous with regard to a certain characteristic. The sample is then composed by selecting, from each stratum, a certain number of components (which can be proportional or non-proportional to the size of the stratum) until the pre-established volume of the sample is reached. Using ranking sampling within marketing research requires the determination of some relevant statistical indicators: average, dispersion, sampling error, etc. To that end, the paper contains a case study which illustrates the approach used to apply the ranking sampling method in marketing research carried out by a company which provides Internet connection services, on a particular category of customers: small and medium enterprises.
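The selection step described in this abstract (a fixed total sample drawn stratum by stratum) can be sketched in Python for the proportional case. The largest-remainder rounding rule, the function names and the stratum sizes are assumptions made for illustration; the paper's case-study figures are not reproduced.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Split `sample_size` across strata in proportion to their sizes (largest-remainder rounding)."""
    total = sum(strata_sizes.values())
    if not 0 < sample_size <= total:
        raise ValueError("sample_size must be between 1 and the population size")
    quotas = {name: sample_size * size / total for name, size in strata_sizes.items()}
    allocation = {name: int(quota) for name, quota in quotas.items()}
    leftover = sample_size - sum(allocation.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda name: quotas[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

def stratified_sample(units_by_stratum, sample_size, seed=0):
    """Draw a simple random sample inside each stratum according to the allocation."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in units_by_stratum.items()}
    allocation = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(units_by_stratum[name], allocation[name]) for name in units_by_stratum}

# Hypothetical customer strata by company size.
sizes = {"micro": 600, "small": 300, "medium": 100}
allocation = proportional_allocation(sizes, 50)
```

A non-proportional design would replace `proportional_allocation` with another rule, for example one that oversamples small but highly variable strata.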
Let Us Rank Journalism Programs
Weber, Joseph
2014-01-01
Unlike law, business, and medical schools, as well as universities in general, journalism schools and journalism programs have rarely been ranked. Publishers such as "U.S. News & World Report," "Forbes," "Bloomberg Businessweek," and "Washington Monthly" do not pay them much mind. What is the best…
Some applications of multivariate statistics to physical anthropology
van Vark, GN
This paper presents some of the results of the cooperation between the author, a physical anthropologist, and Willem Schaafsma. The subjects of study to be discussed in this paper all refer to human evolution, in particular to the process of hominisation. It is described how the interest of the
Multivariate Statistical Process Optimization in the Industrial Production of Enzymes
DEFF Research Database (Denmark)
Klimkiewicz, Anna
ultrafiltration operation is limited by the membrane fouling phenomena where the production capacity - monitored as flow through the membrane or flux - decreases over time. The flux varies considerably from run to run within the same product and likewise between different products. This variability clearly affects ... that the process can be modeled sufficiently well when the datasets are concatenated variable-wise. The later studies used this type of data arrangement and focused only on the products with higher concentration degree as in those cases the flux decline problem has been the most pronounced. Blocking in the row ... of product yield. The potential of NIR technology to monitor the activity of the enzyme has been the subject of a feasibility study presented in PAPER I. It included (a) evaluation on which of the two real-time NIR flow cell configurations is the preferred arrangement for monitoring of the retentate stream downstream...
Selection for Yield Improvement Using of Multivariate Statistical Methods
Directory of Open Access Journals (Sweden)
H Sabouri
2012-06-01
Full Text Available To construct selection indices based on heritability, the correlations of traits affecting grain yield, and multiple regression, an experiment was conducted in 2009 in the fields of Gonbad Kavous University with 265 F3 families, together with the parents and F1, of the Gharib × Khazar population. Days to ripening (0.97) had the highest heritability, and panicle number and flag leaf length (0.66) the lowest. Positive and significant correlations were detected between plant yield and flag leaf width (0.265**), plant height (0.193**), panicle number (0.734**) and biomass (0.828**). Biomass, days to heading and plant height entered the model in that order and explained about 98% of the total variation in yield. Different combinations of phenotypic and genotypic correlations, genetic and phenotypic direct effects from path analysis, and heritability, with and without yield, were used to construct selection vectors. According to this study, increasing the number of traits did not result in higher relative efficiency or improvement in the other comparison parameters. The selection indices showed that yield, a significant genetic correlation with yield, and high heritability are three important components of a selection index.
Selection for Yield Improvement Using of Multivariate Statistical Methods
Directory of Open Access Journals (Sweden)
H Sabouri
2012-07-01
Full Text Available To construct selection indices using heritability, the correlations of traits affecting yield, and multiple regression, an experiment was conducted with 265 F3 families, together with the parents and F1, of the Gharib × Khazar population in 2009 in the fields of the Gonbad High Education Center. Days to ripening (0.97) had the maximum heritability, and panicle number and flag leaf length (0.66) the minimum. Positive and significant correlations were detected between plant yield and flag leaf width (0.265**), plant height (0.193**), panicle number (0.734**) and biomass (0.828**). Biomass, days to heading and plant height explained about 98% of the total variation in yield and entered the model in that order. Phenotypic and genotypic correlations, genetic and phenotypic direct effects from path analysis, and heritability were used to construct the selection vectors. According to this study, an increase in traits is not a result of relative efficiency or the comparison parameters. The selection indices showed that yield, a significant genetic correlation with yield, and high heritability are three important components of a selection index. The fifth, sixth and fourteenth indices were the most important among those discussed.
Using multivariate statistical analysis to assess changes in water ...
African Journals Online (AJOL)
Canonical correspondence analysis (CCA) showed that the environmental variables used in the analysis, discharge and month of sampling, explained a small proportion of the total variance in the data set – less than 10% at each site. However, the total data set variance, explained by the 4 hypothetical axes generated by ...
International Conference on Measurement and Multivariate Analysis
Baba, Yasumasa; Bozdogan, Hamparsum; Kanefuji, Koji; Measurement and Multivariate Analysis
2002-01-01
Diversity is characteristic of the information age and also of statistics. To date, the social sciences have contributed greatly to the development of handling data under the rubric of measurement, while the statistical sciences have made phenomenal advances in theory and algorithms. Measurement and Multivariate Analysis promotes an effective interplay between those two realms of research-diversity with unity. The union and the intersection of those two areas of interest are reflected in the papers in this book, drawn from an international conference in Banff, Canada, with participants from 15 countries. In five major categories - scaling, structural analysis, statistical inference, algorithms, and data analysis - readers will find a rich variety of topics of current interest in the extended statistical community.
Multiple graph regularized protein domain ranking
Directory of Open Access Journals (Sweden)
Wang Jim
2012-11-01
Full Text Available Background: Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. Results: To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. Conclusion: The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.
Multiple graph regularized protein domain ranking
Wang, Jim Jing-Yan
2012-11-19
Background: Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. Results: To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. Conclusion: The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications. 2012 Wang et al; licensee BioMed Central Ltd.
The Globalization of College and University Rankings
Altbach, Philip G.
2012-01-01
In the era of globalization, accountability, and benchmarking, university rankings have achieved a kind of iconic status. The major ones--the Academic Ranking of World Universities (ARWU, or the "Shanghai rankings"), the QS (Quacquarelli Symonds Limited) World University Rankings, and the "Times Higher Education" World…
Multivariate analysis techniques
Energy Technology Data Exchange (ETDEWEB)
Bendavid, Josh [European Organization for Nuclear Research (CERN), Geneva (Switzerland); Fisher, Wade C. [Michigan State Univ., East Lansing, MI (United States); Junk, Thomas R. [Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
2016-01-01
The end products of experimental data analysis are designed to be simple and easy to understand: hypothesis tests and measurements of parameters. But, the experimental data themselves are voluminous and complex. Furthermore, in modern collider experiments, many petabytes of data must be processed in search of rare new processes which occur together with much more copious background processes that are of less interest to the task at hand. The systematic uncertainties on the background may be larger than the expected signal in many cases. The statistical power of an analysis and its sensitivity to systematic uncertainty can therefore usually both be improved by separating signal events from background events with higher efficiency and purity.
Data depth and rank-based tests for covariance and spectral density matrices
Chau, Joris
2017-06-26
In multivariate time series analysis, objects of primary interest to study cross-dependences in the time series are the autocovariance or spectral density matrices. Non-degenerate covariance and spectral density matrices are necessarily Hermitian and positive definite, and our primary goal is to develop new methods to analyze samples of such matrices. The main contribution of this paper is the generalization of the concept of statistical data depth for collections of covariance or spectral density matrices by exploiting the geometric properties of the space of Hermitian positive definite matrices as a Riemannian manifold. This allows one to naturally characterize most central or outlying matrices, but also provides a practical framework for rank-based hypothesis testing in the context of samples of covariance or spectral density matrices. First, the desired properties of a data depth function acting on the space of Hermitian positive definite matrices are presented. Second, we propose two computationally efficient pointwise and integrated data depth functions that satisfy each of these requirements. Several applications of the developed methodology are illustrated by the analysis of collections of spectral matrices in multivariate brain signal time series datasets.
Exact rational expectations, cointegration, and reduced rank regression
DEFF Research Database (Denmark)
Johansen, Søren; Swensen, Anders Rygh
We interpret the linear relations from exact rational expectations models as restrictions on the parameters of the statistical model called the cointegrated vector autoregressive model for non-stationary variables. We then show how reduced rank regression, Anderson (1951), plays an important role...
Exact rational expectations, cointegration, and reduced rank regression
DEFF Research Database (Denmark)
Johansen, Søren; Swensen, Anders Rygh
2008-01-01
We interpret the linear relations from exact rational expectations models as restrictions on the parameters of the statistical model called the cointegrated vector autoregressive model for non-stationary variables. We then show how reduced rank regression, Anderson (1951), plays an important role...
Diagrammatic perturbation methods in networks and sports ranking combinatorics
Park, Juyong
2010-04-01
Analytic and computational tools developed in statistical physics are being increasingly applied to the study of complex networks. Here we present recent developments in the diagrammatic perturbation methods for the exponential random graph models, and apply them to the combinatoric problem of determining the ranking of nodes in directed networks that represent pairwise competitions.
The diagnostic status of first-rank symptoms
DEFF Research Database (Denmark)
Nordgaard, Julie; Arnfred, Sidse Marie; Handest, P.
2008-01-01
In the International Statistical Classification of Diseases, Tenth Revision (ICD-10) and the Diagnostic and Statistical Manual of Mental Disorders, Third and Fourth Editions (DSM-III-IV), the presence of one of Schneider's "first-rank symptoms" (FRS) is symptomatically sufficient for the schizophrenia diagnosis. Yet, it has been claimed that FRS may also be found in nonschizophrenic conditions and are therefore not specific or diagnostic for schizophrenia. This review was made to clarify the issue of diagnostic specificity...
The value of multivariate model sophistication
DEFF Research Database (Denmark)
Rombouts, Jeroen; Stentoft, Lars; Violante, Francesco
2014-01-01
We assess the predictive accuracies of a large number of multivariate volatility models in terms of pricing options on the Dow Jones Industrial Average. We measure the value of model sophistication in terms of dollar losses by considering a set of 444 multivariate models that differ in their specification of the conditional variance, conditional correlation, innovation distribution, and estimation approach. All of the models belong to the dynamic conditional correlation class, which is particularly suitable because it allows consistent estimations of the risk neutral dynamics with a manageable... In addition to investigating the value of model sophistication in terms of dollar losses directly, we also use the model confidence set approach to statistically infer the set of models that delivers the best pricing performances...
Model Checking Multivariate State Rewards
DEFF Research Database (Denmark)
Nielsen, Bo Friis; Nielson, Flemming; Nielson, Hanne Riis
2010-01-01
We consider continuous stochastic logics with state rewards that are interpreted over continuous time Markov chains. We show how results from multivariate phase type distributions can be used to obtain higher-order moments for multivariate state rewards (including covariance). We also generalise ...
Validating rankings in soccer championships
Directory of Open Access Journals (Sweden)
Annibal Parracho Sant'Anna
2012-08-01
Full Text Available The final ranking of a championship is determined by quality attributes combined with other factors which should be filtered out of any decision on relegation or draft for upper level tournaments. Factors like referees' mistakes and difficulty of certain matches due to its accidental importance to the opponents should have their influence reduced. This work tests approaches to combine classification rules considering the imprecision of the number of points as a measure of quality and of the variables that provide reliable explanation for it. Two home-advantage variables are tested and shown to be apt to enter as explanatory variables. Independence between the criteria is checked against the hypothesis of maximal correlation. The importance of factors and of composition rules is evaluated on the basis of correlation between rank vectors, number of classes and number of clubs in tail classes. Data from five years of the Brazilian Soccer Championship are analyzed.
Minkowski metrics in creating universal ranking algorithms
Directory of Open Access Journals (Sweden)
Andrzej Ameljańczyk
2014-06-01
Full Text Available The paper presents a general procedure for creating rankings of a set of objects, in which the preference relation may be based on any ranking function. The analysis of the possible ranking functions begins by showing the fundamental drawbacks of the commonly used functions in the form of a weighted sum. As a special case of the ranking procedure in the space of a relation, a procedure based on the notion of an ideal element and the generalized Minkowski distance from that element is proposed. This procedure, presented as a universal ranking algorithm, eliminates most of the disadvantages of ranking functions in the form of a weighted sum. Keywords: ranking functions, preference relation, ranking clusters, categories, ideal point, universal ranking algorithm
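The ideal-point idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes every criterion is scaled so that larger is better, takes the ideal element to be the component-wise maximum over the objects, and ranks by plain (unweighted) Minkowski distance to that point. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def minkowski_distance(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """Minkowski distance of order p; p = inf gives the Chebyshev distance."""
    if p == float("inf"):
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def rank_by_ideal_point(scores: Dict[str, Sequence[float]], p: float = 2.0) -> List[str]:
    """Order objects by increasing distance to the ideal element.

    Assumption: the ideal element is the best observed value on each
    criterion (larger is better), which is only one possible choice.
    """
    n_criteria = len(next(iter(scores.values())))
    ideal = [max(v[j] for v in scores.values()) for j in range(n_criteria)]
    return sorted(scores, key=lambda name: minkowski_distance(scores[name], ideal, p))


# Toy data: two criteria, three objects (hypothetical values).
objects = {"A": (0.9, 0.2), "B": (0.6, 0.6), "C": (0.2, 0.9)}
ranking = rank_by_ideal_point(objects, p=2.0)
print(ranking)
```

In this toy example an equal-weight sum scores the three objects almost identically, while the distance to the ideal point clearly favours the balanced object B; the order p controls how strongly a large shortfall on a single criterion is penalised.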
Combined Reduced-Rank Transform
Directory of Open Access Journals (Sweden)
Anatoli Torokhti
2006-04-01
Full Text Available We propose and justify a new approach to constructing optimal nonlinear transforms of random vectors. We show that the proposed transform improves such characteristics of rank-reduced transforms as compression ratio, accuracy of decompression and reduces required computational work. The proposed transform ${\mathcal T}_p$ is presented in the form of a sum with $p$ terms where each term is interpreted as a particular rank-reduced transform. Moreover, terms in ${\mathcal T}_p$ are represented as a combination of three operations ${\mathcal F}_k$, ${\mathcal Q}_k$ and ${\boldsymbol{\varphi}}_k$ with $k=1,\ldots,p$. The prime idea is to determine ${\mathcal F}_k$ separately, for each $k=1,\ldots,p$, from an associated rank-constrained minimization problem similar to that used in the Karhunen--Lo\`{e}ve transform. The operations ${\mathcal Q}_k$ and ${\boldsymbol{\varphi}}_k$ are auxiliary for finding ${\mathcal F}_k$. The contribution of each term in ${\mathcal T}_p$ improves the entire transform performance. A corresponding unconstrained nonlinear optimal transform is also considered. Such a transform is important in its own right because it is treated as an optimal filter without signal compression. A rigorous analysis of errors associated with the proposed transforms is given.
Iacovacci, Jacopo; Rahmede, Christoph; Arenas, Alex; Bianconi, Ginestra
2016-10-01
Recently it has been recognized that many complex social, technological and biological networks have a multilayer nature and can be described by multiplex networks. Multiplex networks are formed by a set of nodes connected by links having different connotations forming the different layers of the multiplex. Characterizing the centrality of the nodes in a multiplex network is a challenging task since the centrality of the node naturally depends on the importance associated to links of a certain type. Here we propose to assign to each node of a multiplex network a centrality called Functional Multiplex PageRank that is a function of the weights given to every different pattern of connections (multilinks) existent in the multiplex network between any two nodes. Since multilinks distinguish all the possible ways in which the links in different layers can overlap, the Functional Multiplex PageRank can describe important non-linear effects when large relevance or small relevance is assigned to multilinks with overlap. Here we apply the Functional Page Rank to the multiplex airport networks, to the neuronal network of the nematode C. elegans, and to social collaboration and citation networks between scientists. This analysis reveals important differences existing between the most central nodes of these networks, and the correlations between their so-called pattern to success.
Ranking Support Vector Machine with Kernel Approximation.
Chen, Kai; Li, Rongchun; Dou, Yong; Liang, Zhengfa; Lv, Qi
2017-01-01
Learning-to-rank algorithms have become important in recent years due to their successful application in information retrieval, recommender systems, computational biology, and so forth. Ranking support vector machine (RankSVM) is one of the state-of-the-art ranking models and has been favorably used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problems. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of the kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. The primal truncated Newton method is used to optimize the pairwise L2-loss (squared hinge-loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method achieves a much faster training speed than kernel RankSVM and comparable or better performance than state-of-the-art ranking algorithms.
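To make the kernel-approximation step concrete, here is a minimal pure-Python sketch of random Fourier features for the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2). It covers only the feature map, not the RankSVM training described in the abstract; the function names, the value of gamma and the number of features are assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Sequence


def make_rff(dim: int, n_features: int, gamma: float, seed: int = 0) -> Callable[[Sequence[float]], List[float]]:
    """Build a random Fourier feature map z with z(x).z(y) ~ exp(-gamma*||x-y||^2)."""
    rng = random.Random(seed)
    # For this kernel the frequencies are drawn from N(0, 2*gamma*I).
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def transform(x: Sequence[float]) -> List[float]:
        return [
            norm * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return transform


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Exact RBF kernel value, used here only to check the approximation."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


gamma = 0.5
z = make_rff(dim=2, n_features=5000, gamma=gamma, seed=0)
x, y = (1.0, 0.0), (0.5, 0.5)
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = rbf_kernel(x, y, gamma)
print(exact, approx)
```

Once the data are mapped through z, a linear ranking model trained on the mapped features behaves approximately like a kernel model, which is why the kernel matrix never has to be formed.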
Multivariate Process Control with Autocorrelated Data
DEFF Research Database (Denmark)
Kulahci, Murat
2011-01-01
As sensor and computer technology continues to improve, it becomes a normal occurrence that we are confronted with high dimensional data sets. As in many areas of industrial statistics, this brings forth various challenges in statistical process control and monitoring. This new high dimensional data often exhibit not only cross-correlation among the quality characteristics of interest but also serial dependence as a consequence of high sampling frequency and system dynamics. In practice, the most common method of monitoring multivariate data is through what is called the Hotelling’s T² statistic. For high dimensional data with excessive amount of cross correlation, practitioners are often recommended to use latent structures methods such as Principal Component Analysis to summarize the data in only a few linear combinations of the original variables that capture most of the variation in the data...
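As a reminder of what the Hotelling's T² statistic computes, the sketch below evaluates T² = (x - m)' S^{-1} (x - m) for a new observation x against the mean m and sample covariance S of an in-control reference sample. It is restricted to two variables so that the matrix inverse can be written out by hand; the reference data are hypothetical, and the control limit (which would come from an F distribution) is deliberately left out.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mean_and_cov(data: Sequence[Point]) -> Tuple[Point, Tuple[float, float, float]]:
    """Sample mean and unbiased covariance (sxx, sxy, syy) of bivariate data."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def hotelling_t2(point: Point, mean: Point, cov: Tuple[float, float, float]) -> float:
    """T^2 of one observation, using the closed-form inverse of a 2x2 matrix."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical in-control reference sample.
reference: List[Point] = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, cov = mean_and_cov(reference)
t2_value = hotelling_t2((3.0, 1.0), mean, cov)
print(t2_value)
```

Note that this treats observations as independent; the serial dependence that the abstract highlights is exactly what such a plain T² calculation ignores.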
Multivariate Approaches to Classification in Extragalactic Astronomy
Directory of Open Access Journals (Sweden)
Didier Fraix-Burnet
2015-08-01
Full Text Available Clustering objects into synthetic groups is a natural activity of any science. Astrophysics is not an exception and is now facing a deluge of data. For galaxies, the one-century old Hubble classification and the Hubble tuning fork are still largely in use, together with numerous mono- or bivariate classifications most often made by eye. However, a classification must be driven by the data, and sophisticated multivariate statistical tools are used more and more often. In this paper we review these different approaches in order to situate them in the general context of unsupervised and supervised learning. We insist on the astrophysical outcomes of these studies to show that multivariate analyses provide an obvious path toward a renewal of our classification of galaxies and are invaluable tools to investigate the physics and evolution of galaxies.
Multivariate Approaches to Classification in Extragalactic Astronomy
Fraix-Burnet, Didier; Thuillard, Marc; Chattopadhyay, Asis Kumar
2015-08-01
Clustering objects into synthetic groups is a natural activity of any science. Astrophysics is not an exception and is now facing a deluge of data. For galaxies, the one-century old Hubble classification and the Hubble tuning fork are still largely in use, together with numerous mono- or bivariate classifications most often made by eye. However, a classification must be driven by the data, and sophisticated multivariate statistical tools are used more and more often. In this paper we review these different approaches in order to situate them in the general context of unsupervised and supervised learning. We insist on the astrophysical outcomes of these studies to show that multivariate analyses provide an obvious path toward a renewal of our classification of galaxies and are invaluable tools to investigate the physics and evolution of galaxies.
Nonparametric statistical methods using R
Kloke, John
2014-01-01
A Practical Guide to Implementing Nonparametric and Rank-Based Procedures. Nonparametric Statistical Methods Using R covers traditional nonparametric methods and rank-based analyses, including estimation and inference for models ranging from simple location models to general linear and nonlinear models for uncorrelated and correlated responses. The authors emphasize applications and statistical computation. They illustrate the methods with many real and simulated data examples using R, including the packages Rfit and npsm. The book first gives an overview of the R language and basic statistical c
Sparse reduced-rank regression with covariance estimation
Chen, Lisha
2014-12-08
Improving the predicting performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.
Correlation Test Application of Supplier’s Ranking Using TOPSIS and AHP-TOPSIS Method
Directory of Open Access Journals (Sweden)
Ika Yuniwati
2016-05-01
Full Text Available The supplier selection process in firms can be carried out using multi-criteria decision making (MCDM) methods. There are many MCDM methods, but a firm must choose the method suited to its conditions. Company A has analyzed its suppliers' ranking using the TOPSIS method. The TOPSIS method has a major weakness in its subjective weighting. This flaw is overcome by using AHP weighting that has undergone a consistency test. In this study, the suppliers' rankings obtained with the TOPSIS and AHP-TOPSIS methods are compared using a correlation test. The aim of this paper is to determine whether the two methods give different results. The data in the suppliers' ranking are ordinal, so Spearman's rank and Kendall's tau-b correlations were considered. If most of the ranked scores are the same, Kendall's tau-b correlation should be used; otherwise, Spearman's rank correlation should be used. In this study most of the ranked scores are different, so Spearman's rank correlation was used, with a p-value of 0.505. Since this is greater than 0.05, there is no statistically significant correlation between the two methods. Consequently, an increase or decrease in a supplier's rank under one method is not significantly related to an increase or decrease in its rank under the other method.
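The two correlation coefficients named in this abstract can be sketched in a few lines of pure Python. The rankings below are hypothetical, not the study's data; the Spearman shortcut formula assumes there are no tied ranks, the tau-b version handles ties, and no p-value is computed here.

```python
import math
from itertools import combinations
from typing import Sequence


def spearman_rho(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid without ties."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b, which corrects the denominator for tied pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from both factors
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denominator


# Hypothetical supplier ranks from two methods (1 = best).
ranks_method_1 = [1, 2, 3, 4, 5]
ranks_method_2 = [2, 1, 4, 3, 5]
rho = spearman_rho(ranks_method_1, ranks_method_2)
tau = kendall_tau_b(ranks_method_1, ranks_method_2)
print(rho, tau)
```

For these five suppliers the sum of squared rank differences is 4, so rho is 1 - 24/120 = 0.8, and 8 concordant against 2 discordant pairs give tau-b = 0.6.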
Comparison of a Class of Rank-Score Tests in Two-Factor Designs ...
African Journals Online (AJOL)
The empirical Type I error rate and power of these test statistics on the rank scores were determined using Monte Carlo simulation to investigate the robustness of the tests. The results show that there are problems of inflation in the Type I error rate using asymptotic χ² test for all the rank score functions, especially for small ...
Soh, Kaycheng
2014-01-01
World university rankings (WUR) use the weight-and-sum approach to arrive at an overall measure which is then used to rank the participating universities of the world. Although the weight-and-sum procedure seems straightforward and accords with common sense, it has hidden methodological or statistical problems which render the meaning of the…
A method for generating permutation distribution of ranks in a k ...
African Journals Online (AJOL)
sample experiment is presented. This provides a methodology for constructing exact test of significance of a rank statistic. The proposed method is linked to the partition of integers and in a combinatorial sense the distribution of the ranks is ...
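The idea of building an exact test from the permutation distribution of ranks can be sketched for the simplest case. The code below is not the paper's partition-based method: it assumes two samples with no ties and simply enumerates every way the ranks 1..(m+n) can be assigned to the first sample, which is feasible only for small samples. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from typing import Dict


def rank_sum_distribution(m: int, n: int) -> Dict[int, Fraction]:
    """Exact null distribution of the rank sum of the first sample (size m)."""
    total = m + n
    counts = Counter(sum(subset) for subset in combinations(range(1, total + 1), m))
    n_arrangements = sum(counts.values())
    return {s: Fraction(c, n_arrangements) for s, c in sorted(counts.items())}


def exact_upper_p_value(observed_rank_sum: int, m: int, n: int) -> Fraction:
    """One-sided exact p-value: P(rank sum >= observed) under the null."""
    distribution = rank_sum_distribution(m, n)
    return sum((p for s, p in distribution.items() if s >= observed_rank_sum), Fraction(0))


distribution = rank_sum_distribution(2, 2)
p_value = exact_upper_p_value(7, 2, 2)
print(distribution, p_value)
```

With two observations per sample there are six equally likely rank assignments, so the largest possible rank sum, 7, has an exact one-sided p-value of 1/6.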
How to improve a team's position in the FIFA ranking? - A simulation study
Lasek, J.; Szlavik, Z.; Gagolewski, M.; Bhulai, S.
2016-01-01
In this paper, we study the efficacy of the official ranking for international football teams compiled by FIFA, the body governing football competition around the globe. We present strategies for improving a team's position in the ranking. By combining several statistical techniques, we derive an
Multivariate covariance generalized linear models
DEFF Research Database (Denmark)
Bonat, W. H.; Jørgensen, Bent
2016-01-01
We propose a general framework for non-normal multivariate data analysis called multivariate covariance generalized linear models, designed to handle multivariate response variables, along with a wide range of temporal and spatial correlation structures defined in terms of a covariance link function combined with a matrix linear predictor involving known matrices. The method is motivated by three data examples that are not easily handled by existing methods. The first example concerns multivariate count data, the second involves response variables of mixed types, combined with repeated measures and longitudinal structures, and the third involves a spatiotemporal analysis of rainfall data. The models take non-normality into account in the conventional way by means of a variance function, and the mean structure is modelled by means of a link function and a linear predictor. The models...
The Privilege of Ranking: Google Plays Ball.
Wiggins, Richard
2003-01-01
Discussion of ranking systems used in various settings, including college football and academic admissions, focuses on the Google search engine. Explains the PageRank mathematical formula that scores Web pages by connecting the number of links; limitations, including authenticity and accuracy of ranked Web pages; relevancy; adjusting algorithms;…
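For readers unfamiliar with the PageRank formula this record refers to, the following is a minimal power-iteration sketch of the textbook version. It is not Google's production algorithm; the damping factor of 0.85, the treatment of pages without outgoing links, and the toy link graph are assumptions, and every link target is assumed to appear as a key of the dictionary.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-12, max_iter: int = 500) -> Dict[str, float]:
    """Textbook PageRank by power iteration on a small link graph."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(max_iter):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        change = sum(abs(new_rank[page] - rank[page]) for page in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Toy web: pages "a" and "b" both link to "c", and "c" links back to "a".
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(scores)
```

In the toy graph, page "c" collects links from both other pages and ends up ranked highest, while "b", which nobody links to, keeps only the baseline share.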
Methodology, Meaning and Usefulness of Rankings
Williams, Ross
2008-01-01
University rankings are having a profound effect on both higher education systems and individual universities. In this paper we outline these effects, discuss the desirable characteristics of a good ranking methodology and document existing practice, with an emphasis on the two main international rankings (Shanghai Jiao Tong and THES-QS). We take…
Multivariate Methods for Muon Identification at LHCb
Assis-Jesus, A C S; Polycarpo, E; Landim, F
2001-01-01
The best possible identification of a muon by LHCb will be obtained by combining the available information from all the relevant subdetectors. We present a comparison among three multivariate methods, applying them to the muon identification. A neural network method and two parametric statistical approaches (one Bayesian and one classical) were studied in the context of separating muons from other particles using a simulation of events with the maximum background hit rate in the muon chambers. For a muon efficiency of 90% the pion misidentification is ~1%. The Bayesian and the neural network methods gave the best performance.
Tool for Ranking Research Options
Ortiz, James N.; Scott, Kelly; Smith, Harold
2005-01-01
Tool for Research Enhancement Decision Support (TREDS) is a computer program developed to assist managers in ranking options for research aboard the International Space Station (ISS). It could likely also be adapted to perform similar decision-support functions in industrial and academic settings. TREDS provides a ranking of the options, based on a quantifiable assessment of all the relevant programmatic decision factors of benefit, cost, and risk. The computation of the benefit for each option is based on a figure of merit (FOM) for ISS research capacity that incorporates both quantitative and qualitative inputs. Qualitative inputs are gathered and partly quantified by use of the time-tested analytical hierarchical process and used to set weighting factors in the FOM corresponding to priorities determined by the cognizant decision maker(s). Then by use of algorithms developed specifically for this application, TREDS adjusts the projected benefit for each option on the basis of levels of technical implementation, cost, and schedule risk. Based partly on Excel spreadsheets, TREDS provides screens for entering cost, benefit, and risk information. Drop-down boxes are provided for entry of qualitative information. TREDS produces graphical output in multiple formats that can be tailored by users.
Issue Management Risk Ranking Systems
Energy Technology Data Exchange (ETDEWEB)
Novack, Steven David; Marshall, Frances Mc Clellan; Stromberg, Howard Merion; Grant, Gary Michael
1999-06-01
Thousands of safety issues have been collected on-line at the Idaho National Engineering and Environmental Laboratory (INEEL) as part of the Issue Management Plan. However, there has been no established approach to prioritize collected and future issues. The authors developed a methodology, based on hazards assessment, to identify and risk rank over 5000 safety issues collected at INEEL. The approach had to be easily applied, understandable for site adaptation, and commensurate with the Integrated Safety Plan. High-risk issues were investigated and mitigative/preventive measures were suggested and ranked based on a cost-benefit scheme to provide risk-informed safety measures. This methodology was consistent with other integrated safety management goals and tasks, providing a site-wide risk-informed decision tool to reduce hazardous conditions and focus resources on high-risk safety issues. As part of the issue management plan, this methodology was incorporated at the issue collection level and training was provided to management to better familiarize decision-makers with concepts of safety and risk. This prioritization methodology and issue dissemination procedure will be discussed. Results of issue prioritization and training efforts will be summarized. Difficulties and advantages of the process will be reported. Development and incorporation of this process into INEEL's lessons learned reporting and the site-wide integrated safety management program will be shown with an emphasis on establishing self reliance and ownership of safety issues.
Fractional and multivariable calculus model building and optimization problems
Mathai, A M
2017-01-01
This textbook presents a rigorous approach to multivariable calculus in the context of model building and optimization problems. This comprehensive overview is based on lectures given at five SERC Schools from 2008 to 2012 and covers a broad range of topics that will enable readers to understand and create deterministic and nondeterministic models. Researchers, advanced undergraduate, and graduate students in mathematics, statistics, physics, engineering, and biological sciences will find this book to be a valuable resource for finding appropriate models to describe real-life situations. The first chapter begins with an introduction to fractional calculus moving on to discuss fractional integrals, fractional derivatives, fractional differential equations and their solutions. Multivariable calculus is covered in the second chapter and introduces the fundamentals of multivariable calculus (multivariable functions, limits and continuity, differentiability, directional derivatives and expansions of multivariable ...
Modeling Multivariate Distributions with Continuous Margins Using the copula R Package
Directory of Open Access Journals (Sweden)
Ivan Kojadinovic
2010-10-01
The copula-based modeling of multivariate distributions with continuous margins is presented as a succession of rank-based tests: a multivariate test of randomness followed by a test of mutual independence and a series of goodness-of-fit tests. All the tests under consideration are based on the empirical copula, which is a nonparametric rank-based estimator of the true unknown copula. The principles of the tests are recalled and their implementation in the copula R package is briefly described. Their use in the construction of a copula model from data is thoroughly illustrated on real insurance and financial data.
Multivariate Generalized Multiscale Entropy Analysis
Directory of Open Access Journals (Sweden)
Anne Humeau-Heurtier
2016-11-01
Multiscale entropy (MSE) was introduced in the 2000s to quantify systems’ complexity. MSE relies on (i) a coarse-graining procedure to derive a set of time series representing the system dynamics on different time scales; (ii) the computation of the sample entropy for each coarse-grained time series. A refined composite MSE (rcMSE), based on the same steps as MSE, also exists. Compared to MSE, rcMSE increases the accuracy of entropy estimation and reduces the probability of inducing undefined entropy for short time series. The multivariate versions of MSE (MMSE) and rcMSE (MrcMSE) have also been introduced. In the coarse-graining step used in MSE, rcMSE, MMSE, and MrcMSE, the mean value is used to derive representations of the original data at different resolutions. A generalization of MSE was recently published, using the computation of different moments in the coarse-graining procedure. However, so far, this generalization only exists for univariate signals. We therefore herein propose an extension of this generalized MSE to multivariate data. The multivariate generalized algorithms of MMSE and MrcMSE presented herein (MGMSE and MGrcMSE, respectively) are first analyzed through the processing of synthetic signals. We reveal that MGrcMSE shows better performance than MGMSE for short multivariate data. We then study the performance of MGrcMSE on two sets of short multivariate electroencephalograms (EEG) available in the public domain. We report that MGrcMSE may show better performance than MrcMSE in distinguishing different types of multivariate EEG data. MGrcMSE could therefore supplement MMSE or MrcMSE in the processing of multivariate datasets.
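The coarse-graining step can be sketched as follows. This is a minimal illustration, assuming non-overlapping windows applied channel by channel and using the population variance as the example of a "different moment"; the function names and the toy signal are hypothetical, and the sample entropy step is omitted.

```python
from statistics import mean, pvariance


def coarse_grain(channel, scale, moment="mean"):
    """Summarize non-overlapping windows of length `scale` by their mean
    (classical MSE) or by their population variance (one possible
    generalized moment; this choice is an assumption)."""
    summarize = mean if moment == "mean" else pvariance
    n_windows = len(channel) // scale
    return [summarize(channel[k * scale:(k + 1) * scale]) for k in range(n_windows)]


def coarse_grain_multivariate(signal, scale, moment="mean"):
    """Apply the same coarse-graining to every channel of a multivariate signal."""
    return [coarse_grain(channel, scale, moment) for channel in signal]


# Toy two-channel signal with six samples per channel (made-up values).
signal = [
    [1.0, 3.0, 2.0, 6.0, 5.0, 5.0],
    [0.0, 0.0, 4.0, 8.0, 1.0, 3.0],
]

print(coarse_grain_multivariate(signal, 2, "mean"))
print(coarse_grain_multivariate(signal, 2, "variance"))
```

At scale 1 the mean-based series equals the original signal, whereas a variance-based series is only informative for scales of 2 or more.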
The Effect of the Multivariate Box-Cox Transformation on the Power of MANOVA.
Kirisci, Levent; Hsu, Tse-Chi
Most of the multivariate statistical techniques rely on the assumption of multivariate normality. The effects of non-normality on multivariate tests are assumed to be negligible when variance-covariance matrices and sample sizes are equal. Therefore, in practice, investigators do not usually attempt to remove non-normality. In this simulation…
Two-dimensional ranking of Wikipedia articles
Zhirov, A. O.; Zhirov, O. V.; Shepelyansky, D. L.
2010-10-01
The Library of Babel, described by Jorge Luis Borges, stores an enormous amount of information. The Library exists ab aeterno. Wikipedia, a free online encyclopaedia, becomes a modern analogue of such a Library. Information retrieval and ranking of Wikipedia articles become the challenge of modern society. While PageRank highlights very well known nodes with many ingoing links, CheiRank highlights very communicative nodes with many outgoing links. In this way the ranking becomes two-dimensional. Using CheiRank and PageRank we analyze the properties of two-dimensional ranking of all Wikipedia English articles and show that it gives their reliable classification with rich and nontrivial features. Detailed studies are done for countries, universities, personalities, physicists, chess players, Dow-Jones companies and other categories.
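The two rankings can be illustrated on a toy graph. The sketch below assumes the usual power-iteration form of PageRank with a damping factor of 0.85 and computes CheiRank as PageRank on the graph with all links reversed; the graph, node names, and function names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {node: [nodes it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for t in nodes:
                    nxt[t] += damping * rank[v] / n
        rank = nxt
    return rank


def cheirank(links, **kwargs):
    """CheiRank: PageRank of the graph with every link direction inverted."""
    reversed_links = {}
    for source, targets in links.items():
        reversed_links.setdefault(source, [])
        for t in targets:
            reversed_links.setdefault(t, []).append(source)
    return pagerank(reversed_links, **kwargs)


# Hypothetical toy graph: "hub" links out to everyone, "authority" is linked by everyone.
links = {
    "hub": ["a", "b", "authority"],
    "a": ["authority"],
    "b": ["authority"],
    "authority": ["hub"],
}

pr = pagerank(links)
cr = cheirank(links)
print(max(pr, key=pr.get), max(cr, key=cr.get))
```

Each node then has a pair of coordinates (PageRank, CheiRank), which is what makes the ranking two-dimensional.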
Balakrishnan, N; Nagaraja, HN
2007-01-01
S. Panchapakesan has made significant contributions to ranking and selection and has published in many other areas of statistics, including order statistics, reliability theory, stochastic inequalities, and inference. Written in his honor, the twenty invited articles in this volume reflect recent advances in these areas and form a tribute to Panchapakesan's influence and impact on these areas. Thematically organized, the chapters cover a broad range of topics from: Inference; Ranking and Selection; Multiple Comparisons and Tests; Agreement Assessment; Reliability; and Biostatistics. Featuring
Interpreting the Phase Spectrum in Fourier Analysis of Partial Ranking Data
Directory of Open Access Journals (Sweden)
Ramakrishna Kakarala
2012-01-01
Whenever ranking data are collected, such as in elections, surveys, and database searches, it is frequently the case that partial rankings are available instead of, or sometimes in addition to, full rankings. Statistical methods for partial rankings have been discussed in the literature. However, there has been relatively little published on their Fourier analysis, perhaps because the abstract nature of the transforms involved impedes insight. This paper provides as its novel contributions an analysis of the Fourier transform for partial rankings, with particular attention to the first three ranks, while emphasizing basic signal processing properties of transform magnitude and phase. It shows that the transform and its magnitude satisfy a projection invariance and analyzes the reconstruction of data from either magnitude or phase alone. The analysis is motivated by appealing to corresponding properties of the familiar DFT and by application to two real-world data sets.
Multi-instance dictionary learning via multivariate performance measure optimization
Wang, Jim Jing-Yan
2016-12-29
The multi-instance dictionary plays a critical role in multi-instance data representation. Meanwhile, different multi-instance learning applications are evaluated by specific multivariate performance measures. For example, multi-instance ranking reports the precision and recall. It is not difficult to see that to obtain different optimal performance measures, different dictionaries are needed. This observation motivates us to learn performance-optimal dictionaries for this problem. In this paper, we propose a novel joint framework for learning the multi-instance dictionary and the classifier to optimize a given multivariate performance measure, such as the F1 score and precision at rank k. We propose to represent the bags as bag-level features via the bag-instance similarity, and learn a classifier in the bag-level feature space to optimize the given performance measure. We propose to minimize the upper bound of a multivariate loss corresponding to the performance measure, the complexity of the classifier, and the complexity of the dictionary, simultaneously, with regard to both the dictionary and the classifier parameters. In this way, the dictionary learning is regularized by the performance optimization, and a performance-optimal dictionary is obtained. We develop an iterative algorithm to solve this minimization problem efficiently using a cutting-plane algorithm and a coordinate descent method. Experiments on multi-instance benchmark data sets show its advantage over both traditional multi-instance learning and performance optimization methods.
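The two performance measures named above can be computed as in the following minimal sketch. Both depend on the whole set of predictions rather than on each example separately, which is why they are called multivariate measures. The labels are made-up and the function names are illustrative.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of relevant items (label 1) among the top-k of a ranked list."""
    return sum(ranked_labels[:k]) / k


def f1_score(true_labels, predicted_labels):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Made-up bag labels: ground truth and a classifier's predictions.
true_labels = [1, 0, 1, 1, 0]
predicted_labels = [1, 1, 1, 0, 0]

print(f1_score(true_labels, predicted_labels))
# Ground-truth labels listed in the order a ranker returned the bags.
print(precision_at_k([1, 0, 1, 1, 0], 3))
```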
An Introduction to Multivariable Mathematics
Simon, Leon
2008-01-01
The text is designed for use in a forty-lecture introductory course covering linear algebra, multivariable differential calculus, and an introduction to real analysis. The core material of the book is arranged to allow for the main introductory material on linear algebra, including basic vector space theory in Euclidean space and the initial theory of matrices and linear systems, to be covered in the first ten or eleven lectures, followed by a similar number of lectures on basic multivariable analysis, including first theorems on differentiable functions on domains in Euclidean space and a bri
Intuitive introductory statistics
Wolfe, Douglas A
2017-01-01
This textbook is designed to give an engaging introduction to statistics and the art of data analysis. The unique scope includes, but also goes beyond, classical methodology associated with the normal distribution. What if the normal model is not valid for a particular data set? This cutting-edge approach provides the alternatives. It is an introduction to the world and possibilities of statistics that uses exercises, computer analyses, and simulations throughout the core lessons. These elementary statistical methods are intuitive. Counting and ranking feature prominently in the text. Nonparametric methods, for instance, are often based on counts and ranks and are very easy to integrate into an introductory course. The ease of computation with advanced calculators and statistical software, both of which factor into this text, allows important techniques to be introduced earlier in the study of statistics. This book's novel scope also includes measuring symmetry with Walsh averages, finding a nonp...
[Regional life expectancy rankings : Methodological artefacts in population updates].
Poppe, Franziska; Annuß, Rolf; Kuhn, Joseph
2017-12-01
For the calculation of life expectancy on a regional level, data from the mortality statistics and population numbers are needed. The latter are derived from population censuses, which have to be undertaken every 10 years according to the EU regulation No. 763/2008. In Germany, the last census took place in 2011 (Census 2011). The current population numbers are calculated on the basis of the most recent population census (population update). Births, deaths, immigration and migration, in addition to other data, are taken into account in this calculation. However, with passing time since the last census, inaccuracies in population updates may increase, which can affect the value of life expectancy calculations. Based on the comparison of life expectancy rankings, the impact and extent of changing over from the 1987 to the more recent 2011 census for regional comparisons were examined in two parts of Germany, Bavaria and North Rhine-Westphalia. As expected, the results show that larger changes in the calculated life expectancy result from larger changes in population statistics. However, noteworthy changes in life expectancy rankings do not necessarily follow larger changes in the population numbers. Regional life expectancy rankings are potentially always influenced by inaccuracies in the underlying population statistics. This should be taken into account when interpreting such small-scale differences.
The Multivariate Gaussian Probability Distribution
DEFF Research Database (Denmark)
Ahrendt, Peter
2005-01-01
This technical report gathers information about the multivariate Gaussian distribution that was previously not (at least to my knowledge) to be found in one place and written as a reference manual. Additionally, some tips and tricks are collected that may be useful in practical ...
DEFF Research Database (Denmark)
Barndorff-Nielsen, Ole Eiler; Stelzer, Robert
2011-01-01
Univariate superpositions of Ornstein–Uhlenbeck-type processes (OU), called supOU processes, provide a class of continuous time processes capable of exhibiting long memory behavior. This paper introduces multivariate supOU processes and gives conditions for their existence and finiteness of momen...
Rank Modulation for Translocation Error Correction
Farnoud, Farzad; Milenkovic, Olgica
2012-01-01
We consider rank modulation codes for flash memories that allow for handling arbitrary charge drop errors. Unlike classical rank modulation codes used for correcting errors that manifest themselves as swaps of two adjacently ranked elements, the proposed translocation rank codes account for more general forms of errors that arise in storage systems. Translocations represent a natural extension of the notion of adjacent transpositions and as such may be analyzed using related concepts in combinatorics and rank modulation coding. Our results include tight bounds on the capacity of translocation rank codes, construction techniques for asymptotically good codes, as well as simple decoding methods for one class of structured codes. As part of our exposition, we also highlight the close connections between the new code family and permutations with short common subsequences, deletion and insertion error-correcting codes for permutations and permutation arrays.
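To make the error model concrete, the sketch below assumes that a translocation moves one element of a permutation from position i to position j and shifts the elements in between by one place, so that an adjacent transposition is the special case where i and j differ by one. The function name and 0-based indexing are illustrative choices.

```python
def translocate(permutation, i, j):
    """Move the element at index i to index j (0-based), shifting the
    elements in between by one position. Returns a new list."""
    result = list(permutation)
    element = result.pop(i)
    result.insert(j, element)
    return result


ranking = [1, 2, 3, 4, 5]

# A long-range translocation: the top-ranked element drops three places.
print(translocate(ranking, 0, 3))

# An adjacent transposition is the special case |i - j| = 1.
print(translocate(ranking, 1, 2))
```

In the flash memory setting, the first call could model a single cell whose charge drops far enough to be overtaken by several other cells at once.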
Dynamics of Ranking Processes in Complex Systems
Blumm, Nicholas; Ghoshal, Gourab; Forró, Zalán; Schich, Maximilian; Bianconi, Ginestra; Bouchaud, Jean-Philippe; Barabási, Albert-László
2012-09-01
The world is addicted to ranking: everything, from the reputation of scientists, journals, and universities to purchasing decisions is driven by measured or perceived differences between them. Here, we analyze empirical data capturing real time ranking in a number of systems, helping to identify the universal characteristics of ranking dynamics. We develop a continuum theory that not only predicts the stability of the ranking process, but shows that a noise-induced phase transition is at the heart of the observed differences in ranking regimes. The key parameters of the continuum theory can be explicitly measured from data, allowing us to predict and experimentally document the existence of three phases that govern ranking stability.
Error analysis of stochastic gradient descent ranking.
Chen, Hong; Tang, Yi; Li, Luoqing; Yuan, Yuan; Li, Xuelong; Tang, Yuanyan
2013-06-01
Ranking is always an important task in machine learning and information retrieval, e.g., collaborative filtering, recommender systems, drug discovery, etc. A kernel-based stochastic gradient descent algorithm with the least squares loss is proposed for ranking in this paper. The implementation of this algorithm is simple, and an expression of the solution is derived via a sampling operator and an integral operator. An explicit convergence rate for learning a ranking function is given in terms of the suitable choices of the step size and the regularization parameter. The analysis technique used here is capacity independent and is novel in error analysis of ranking learning. Experimental results on real-world data have shown the effectiveness of the proposed algorithm in ranking tasks, which verifies the theoretical analysis in ranking error.
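A minimal sketch of this kind of algorithm follows. It is not the paper's exact procedure: it assumes a pairwise least squares loss on score differences, a Gaussian kernel, a constant step size, and a small regularization parameter, and all names and data are hypothetical.

```python
import math
import random


def gaussian_kernel(x, z, width=0.5):
    return math.exp(-((x - z) ** 2) / (2.0 * width ** 2))


def train_ranker(xs, ys, steps=3000, step_size=0.2, reg=1e-4, seed=0):
    """Kernel SGD for the pairwise loss 0.5 * ((f(x_i) - f(x_j)) - (y_i - y_j))**2,
    with f(x) = sum_k alpha[k] * K(xs[k], x)."""
    rng = random.Random(seed)
    n = len(xs)
    gram = [[gaussian_kernel(xs[a], xs[b]) for b in range(n)] for a in range(n)]
    alpha = [0.0] * n
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        f_i = sum(alpha[k] * gram[k][i] for k in range(n))
        f_j = sum(alpha[k] * gram[k][j] for k in range(n))
        residual = (f_i - f_j) - (ys[i] - ys[j])
        # Regularization shrinks f; the loss gradient is residual * (K_i - K_j).
        alpha = [(1.0 - step_size * reg) * a for a in alpha]
        alpha[i] -= step_size * residual
        alpha[j] += step_size * residual
    return alpha


def score(alpha, xs, x):
    return sum(a * gaussian_kernel(xk, x) for a, xk in zip(alpha, xs))


# Toy data: relevance grows with the single feature.
xs = [k / 19.0 for k in range(20)]
ys = list(xs)

alpha = train_ranker(xs, ys)
scores = [score(alpha, xs, x) for x in xs]
pairs = [(a, b) for a in range(len(xs)) for b in range(a + 1, len(xs))]
concordant = sum(1 for a, b in pairs if scores[a] < scores[b])
print(concordant / len(pairs))
```

The printed value is the fraction of item pairs that the learned scores order correctly; the step size and regularization parameter are the two quantities whose choice governs the convergence rate discussed in the abstract.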
Analysis of Preference Data Using Intermediate Test Statistic ...
African Journals Online (AJOL)
The intermediate statistic is a link between the Friedman test statistic and the multinomial statistic. The statistic is based on ranking in a selected number of treatments, not necessarily all alternatives. We show that this statistic is transitive to well-known test statistics used for the analysis of preference data. Specifically, it is shown ...
Ranking in Swiss system chess team tournaments
Csató, László
2015-01-01
The paper uses paired comparison-based scoring procedures for ranking the participants of a Swiss system chess team tournament. We present the main challenges of ranking in Swiss system, the features of individual and team competitions as well as the failures of official lexicographical orders. The tournament is represented as a ranking problem, our model is discussed with respect to the properties of the score, generalized row sum and least squares methods. The proposed procedure is illustra...
Ausloos, Marcel
2016-01-01
A mere hyperbolic law, such as the Zipf's law power function, is often inadequate to describe rank-size relationships. An alternative theoretical distribution is proposed based on theoretical physics arguments starting from the Yule-Simon distribution. A model is proposed that leads to a universal form. A theoretical suggestion for the "best (or optimal) distribution" is provided through an entropy argument. The ranking of areas through the number of cities in various countries and some sport competition rankings serve as the present illustrations.
Methodology for ranking restoration options
Energy Technology Data Exchange (ETDEWEB)
Hedemann Jensen, Per
1999-04-01
The work described in this report has been performed as a part of the RESTRAT Project FI4P-CT95-0021a (PL 950128) co-funded by the Nuclear Fission Safety Programme of the European Commission. The RESTRAT project has the overall objective of developing generic methodologies for ranking restoration techniques as a function of contamination and site characteristics. The project includes analyses of existing remediation methodologies and contaminated sites, and is structured in the following steps: characterisation of relevant contaminated sites; identification and characterisation of relevant restoration techniques; assessment of the radiological impact; development and application of a selection methodology for restoration options; formulation of generic conclusions and development of a manual. The project is intended to apply to situations in which sites with nuclear installations have been contaminated with radioactive materials as a result of the operation of these installations. The areas considered for remedial measures include contaminated land areas, rivers and sediments in rivers, lakes, and sea areas. Five contaminated European sites have been studied. Various remedial measures have been envisaged with respect to the optimisation of the protection of the populations being exposed to the radionuclides at the sites. Cost-benefit analysis and multi-attribute utility analysis have been applied for optimisation. Health, economic and social attributes have been included and weighting factors for the different attributes have been determined by the use of scaling constants. (au)
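The multi-attribute utility step can be sketched with an additive model in which each option's utility is a weighted sum of its attribute scores. The additive form, the option names, the scores, and the weights below are all hypothetical illustrations and are not RESTRAT results.

```python
def additive_utility(scores, weights):
    """Weighted sum of attribute utilities (each in [0, 1]), normalized by total weight."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


# Hypothetical scaling constants for the three attribute groups.
weights = {"health": 0.5, "economic": 0.3, "social": 0.2}

# Hypothetical utilities per option (1.0 = best outcome on that attribute).
options = {
    "no action": {"health": 0.2, "economic": 1.0, "social": 0.3},
    "soil removal": {"health": 0.9, "economic": 0.2, "social": 0.6},
    "capping": {"health": 0.7, "economic": 0.6, "social": 0.7},
}

ranking = sorted(
    options,
    key=lambda name: additive_utility(options[name], weights),
    reverse=True,
)
print(ranking)
```

Changing the weights changes the ranking, which is why the choice of scaling constants deserves a sensitivity check in any real application.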
Ranking documents with a thesaurus.
Rada, R; Bicknell, E
1989-09-01
This article reports on exploratory experiments in evaluating and improving a thesaurus through studying its effect on retrieval. A formula called DISTANCE was developed to measure the conceptual distance between queries and documents encoded as sets of thesaurus terms. DISTANCE references MeSH (Medical Subject Headings) and assesses the degree of match between a MeSH-encoded query and document. The performance of DISTANCE on MeSH is compared to the performance of people in the assessment of conceptual distance between queries and documents, and is found to simulate with surprising accuracy the human performance. The power of the computer simulation stems both from the tendency of people to rely heavily on broader-than (BT) relations in making decisions about conceptual distance and from the thousands of accurate BT relations in MeSH. One source of discrepancy between the algorithm's measurement of closeness between query and document and people's measurement of closeness between query and document is occasional inconsistency in the BT relations. Our experiments with adding non-BT relations to MeSH showed how these non-BT relations could improve document ranking, if DISTANCE were also appropriately revised to treat these relations differently from BT relations.
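One way to picture a distance of this kind is the sketch below. It assumes, without confirmation from the abstract, that the distance between two term sets is the average shortest-path length over broader-than links between every query term and every document term. The small hierarchy is made up for illustration and is not the actual MeSH tree.

```python
from collections import deque


def build_graph(bt_pairs):
    """Undirected graph from (narrower term, broader term) pairs."""
    graph = {}
    for narrower, broader in bt_pairs:
        graph.setdefault(narrower, set()).add(broader)
        graph.setdefault(broader, set()).add(narrower)
    return graph


def shortest_path_length(graph, start, goal):
    """Number of BT links on the shortest path between two terms (breadth-first search)."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    raise ValueError("terms are not connected")


def distance(graph, query_terms, document_terms):
    """Average pairwise path length between two sets of thesaurus terms."""
    total = sum(
        shortest_path_length(graph, q, d) for q in query_terms for d in document_terms
    )
    return total / (len(query_terms) * len(document_terms))


# Made-up hierarchy of (narrower, broader) pairs.
graph = build_graph([
    ("Lung Neoplasms", "Neoplasms"),
    ("Breast Neoplasms", "Neoplasms"),
    ("Neoplasms", "Diseases"),
    ("Pneumonia", "Lung Diseases"),
    ("Lung Diseases", "Diseases"),
])

print(distance(graph, ["Lung Neoplasms"], ["Breast Neoplasms", "Pneumonia"]))
```

Documents would then be ranked by increasing distance from the query.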
Communities in Large Networks: Identification and Ranking
DEFF Research Database (Denmark)
Olsen, Martin
2008-01-01
We study the problem of identifying and ranking the members of a community in a very large network with link analysis only, given a set of representatives of the community. We define the concept of a community justified by a formal analysis of a simple model of the evolution of a directed graph. ...... and its immediate surroundings. The members are ranked with a “local” variant of the PageRank algorithm. Results are reported from successful experiments on identifying and ranking Danish Computer Science sites and Danish Chess pages using only a few representatives....
Citation graph based ranking in Invenio
Marian, Ludmila; Rajman, Martin; Vesely, Martin
2010-01-01
Invenio is the web-based integrated digital library system developed at CERN. Within this framework, we present four types of ranking models based on the citation graph that complement the simple approach based on citation counts: time-dependent citation counts, a relevancy ranking which extends the PageRank model, a time-dependent ranking which combines the freshness of citations with PageRank and a ranking that takes into consideration the external citations. We present our analysis and results obtained on two main data sets: Inspire and CERN Document Server. Our main contributions are: (i) a study of the currently available ranking methods based on the citation graph; (ii) the development of new ranking methods that correct some of the identified limitations of the current methods such as treating all citations of equal importance, not taking time into account or considering the citation graph complete; (iii) a detailed study of the key parameters for these ranking methods. (The original publication is ava...
Robust ranks of true associations in genome-wide case-control association studies.
Zheng, Gang; Joo, Jungnam; Lin, Jing-Ping; Stylianou, Mario; Waclawiw, Myron A; Geller, Nancy L
2007-01-01
In whole-genome association studies, at the first stage, all markers are tested for association and their test statistics or p-values are ranked. At the second stage, some most significant markers are further analyzed by more powerful statistical methods. This helps reduce the number of hypotheses to be corrected for in multiple testing. Ranks of true associations in genome-wide scans using a single test statistic have been studied. In a case-control design for association, the trend test has been proposed. However, three different trend tests, optimal for the recessive, additive, and dominant models, respectively, are available for each marker. Because the true genetic model is unknown, we rank markers based on multiple test statistics or test statistics robust to model mis-specification. We studied this problem with application to Problem 3 of Genetic Analysis Workshop 15. An independent simulation study was also conducted to further evaluate the proposed procedure.
DEFF Research Database (Denmark)
Bladt, Mogens; Nielsen, Bo Friis
2012-01-01
of these distributions further belongs to an important subclass of MVME distributions [5, 1] where the multivariate random vector can be interpreted as a number of simultaneously collected rewards during sojourns in the states of a Markov chain with one absorbing state, the rest of the states being transient. We...... present the corresponding representations for all such distributions. In this way we obtain a unification of the variety of existing distributions as well as a deeper understanding of their probabilistic nature and a clarification of their similarities and differences. In particular one may easily...... Laplace transform. In a longer perspective stochastic and statistical analysis for MVME will in particular apply to any of the previously defined distributions. Multivariate gamma distributions have been used in a variety of fields like hydrology, [11], [10], [6], space (wind modeling) [9] reliability [3...
Directory of Open Access Journals (Sweden)
G.T. Aydinov
2017-03-01
The article gives the results of a multivariate analysis of the structure and relative contribution of potential risk factors for malignant neoplasms of the trachea, bronchi and lung. The authors used specialized databases comprising individual-level records on oncologic diseases in Taganrog, Rostov region, over 1986-2015 (30,684 registered cases of malignant neoplasms, including 3,480 cases of trachea, bronchial and lung cancer). In the analytical research we applied both multivariate statistical techniques (factor analysis and hierarchical cluster correlation analysis) and conventional techniques of epidemiologic analysis, including etiologic fraction (EF) calculation, as well as an original technique for assessing actual (epidemiologic) risk. Average long-term morbidity from trachea, bronchial and lung cancer over 2011-2015 amounts to 46.64 per 100,000. Over the last 15 years a stable decreasing trend has formed, with an average annual change of -1.22%. This localization ranks 3rd in the structure of oncologic morbidity, with a share of 10.02%. We determined the etiological fraction (EF) for smoking as a priority risk factor causing trachea, bronchial and lung cancer; this fraction amounts to 76.19% for people aged 40 and older, and to 81.99% for those aged 60 and older. Applying multivariate statistical techniques (factor analysis and cluster correlation analysis) in this research enabled us to simplify the factor structure; namely, to identify, interpret, quantify the informativeness of, and rank four group (latent) potential risk factors causing lung cancer.
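The abstract does not state how the etiologic fraction was computed. A minimal sketch, assuming the common definition as the attributable fraction among the exposed, EF = (RR - 1) / RR, where RR is the relative risk, is given below; the function names are illustrative.

```python
def etiologic_fraction(relative_risk):
    """Attributable fraction among the exposed, in percent: (RR - 1) / RR * 100.
    Using this formula is an assumption; the source does not give its formula."""
    return (relative_risk - 1.0) / relative_risk * 100.0


def relative_risk_from_ef(ef_percent):
    """Invert the formula above: RR = 1 / (1 - EF / 100)."""
    return 1.0 / (1.0 - ef_percent / 100.0)


print(round(etiologic_fraction(4.2), 2))
print(round(relative_risk_from_ef(76.19), 2))
print(round(relative_risk_from_ef(81.99), 2))
```

Under that assumption, the reported fractions of 76.19% and 81.99% would correspond to relative risks of roughly 4.2 and 5.6 for smokers in the two age groups.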
Introduction to Bayesian statistics
Bolstad, William M
2017-01-01
There is a strong upsurge in the use of Bayesian methods in applied statistical analysis, yet most introductory statistics texts only present frequentist methods. Bayesian statistics has many important advantages that students should learn about if they are going into fields where statistics will be used. In this Third Edition, four newly-added chapters address topics that reflect the rapid advances in the field of Bayesian statistics. The author continues to provide a Bayesian treatment of introductory statistical topics, such as scientific data gathering, discrete random variables, robust Bayesian methods, and Bayesian approaches to inference for discrete random variables, binomial proportion, Poisson, normal mean, and simple linear regression. In addition, newly-developing topics in the field are presented in four new chapters: Bayesian inference with unknown mean and variance; Bayesian inference for Multivariate Normal mean vector; Bayesian inference for Multiple Linear Regression Model; and Computati...
Methods for Analyzing Multivariate Phenotypes in Genetic Association Studies
Directory of Open Access Journals (Sweden)
Qiong Yang
2012-01-01
Multivariate phenotypes are frequently encountered in genetic association studies. The purpose of analyzing multivariate phenotypes usually includes discovery of novel genetic variants with pleiotropic effects, that is, affecting multiple phenotypes, and the ultimate goal of uncovering the underlying genetic mechanism. In recent years, new methods have been developed and existing statistical methods have been applied to such phenotypes. In this paper, we provide a review of the available methods for analyzing association between a single marker and a multivariate phenotype consisting of the same type of components (e.g., all continuous or all categorical) or different types of components (e.g., some are continuous and others are categorical). We also review causal inference methods designed to test whether the detected association with the multivariate phenotype is truly pleiotropy or the genetic marker exerts its effects on some phenotypes through affecting the others.
Multivariate phase type distributions - Applications and parameter estimation
DEFF Research Database (Denmark)
Meisch, David
, allowing for different estimation methods for the whole class or subclasses of phase type distributions. These attributes make this class of distributions an interesting alternative to the normal distribution. When facing multivariate problems, the only general distribution that allows for estimation...... and statistical inference, is the multivariate normal distribution. Unfortunately only little is known about the general class of multivariate phase type distribution. Considering the results concerning parameter estimation and inference theory of univariate phase type distributions, the class of multivariate...... and reducing model uncertainties. Research has shown that the errors on cost estimates for infrastructure projects clearly do not follow a normal distribution but are skewed towards cost overruns. This skewness can be described using phase type distributions. Cost benefit analysis assesses potential future...
Matrix-based introduction to multivariate data analysis
Adachi, Kohei
2016-01-01
This book enables readers who may not be familiar with matrices to understand a variety of multivariate analysis procedures in matrix forms. Another feature of the book is that it emphasizes what model underlies a procedure and what objective function is optimized for fitting the model to data. The author believes that the matrix-based learning of such models and objective functions is the fastest way to comprehend multivariate data analysis. The text is arranged so that readers can intuitively capture the purposes for which multivariate analysis procedures are utilized: plain explanations of the purposes with numerical examples precede mathematical descriptions in almost every chapter. This volume is appropriate for undergraduate students who already have studied introductory statistics. Graduate students and researchers who are not familiar with matrix-intensive formulations of multivariate data analysis will also find the book useful, as it is based on modern matrix formulations with a special emphasis on ...
Models and Inference for Multivariate Spatial Extremes
Vettori, Sabrina
2017-12-07
The development of flexible and interpretable statistical methods is necessary in order to provide appropriate risk assessment measures for extreme events and natural disasters. In this thesis, we address this challenge by contributing to the developing research field of Extreme-Value Theory. We initially study the performance of existing parametric and non-parametric estimators of extremal dependence for multivariate maxima. As the dimensionality increases, non-parametric estimators are more flexible than parametric methods but present some loss in efficiency that we quantify under various scenarios. We introduce a statistical tool which imposes the required shape constraints on non-parametric estimators in high dimensions, significantly improving their performance. Furthermore, by embedding the tree-based max-stable nested logistic distribution in the Bayesian framework, we develop a statistical algorithm that identifies the most likely tree structures representing the data's extremal dependence using the reversible jump Markov chain Monte Carlo method. A mixture of these trees is then used for uncertainty assessment in prediction through Bayesian model averaging. The computational complexity of full likelihood inference is significantly decreased by deriving a recursive formula for the nested logistic model likelihood. The algorithm performance is verified through simulation experiments which also compare different likelihood procedures. Finally, we extend the nested logistic representation to the spatial framework in order to jointly model multivariate variables collected across a spatial region. This situation emerges often in environmental applications but is not often considered in the current literature. Simulation experiments show that the new class of multivariate max-stable processes is able to detect both the cross and inner spatial dependence of a number of extreme variables at a relatively low computational cost, thanks to its Bayesian hierarchical
Multivariable PID control by decoupling
Garrido, Juan; Vázquez, Francisco; Morilla, Fernando
2016-04-01
This paper presents a new methodology to design multivariable proportional-integral-derivative (PID) controllers based on decoupling control. The method is presented for general n × n processes. In the design procedure, an ideal decoupling control with integral action is designed to minimise interactions. It depends on the desired open-loop processes that are specified according to realisability conditions and desired closed-loop performance specifications. These realisability conditions are stated and three common cases to define the open-loop processes are studied and proposed. Then, controller elements are approximated to PID structure. From a practical point of view, the wind-up problem is also considered and a new anti-wind-up scheme for multivariable PID controller is proposed. Comparisons with other works demonstrate the effectiveness of the methodology through the use of several simulation examples and an experimental lab process.
Refining Multivariate Value Set Bounds
Smith, Luke Alexander
Over finite fields, if the image of a polynomial map is not the entire field, then its cardinality can be bounded above by a significantly smaller value. Earlier results bound the cardinality of the value set using the degree of the polynomial, but more recent results make use of the powers of all monomials. In this paper, we explore the geometric properties of the Newton polytope and show how they allow for tighter upper bounds on the cardinality of the multivariate value set. We then explore a method which allows for even stronger upper bounds, regardless of whether one uses the multivariate degree or the Newton polytope to bound the value set. Effectively, this provides an alternate proof of Kosters' degree bound, an improved Newton polytope-based bound, and an improvement of a degree matrix-based result given by Zan and Cao.
A frequency-based technique to improve the spelling suggestion rank in medical queries.
Crowell, Jonathan; Zeng, Qing; Ngo, Long; Lacroix, Eve-Marie
2004-01-01
There is an abundance of health-related information online, and millions of consumers search for such information. Spell checking is of crucial importance in returning pertinent results, so the authors propose a technique for increasing the effectiveness of spell-checking tools used for health-related information retrieval. A sample of incorrectly spelled medical terms was submitted to two different spell-checking tools, and the resulting suggestions, derived under two different dictionary configurations, were re-sorted according to how frequently each term appeared in log data from a medical search engine. Univariable analysis was carried out to assess the effect of each factor (spell-checking tool, dictionary type, re-sort, or no re-sort) on the probability of success. The factors that were statistically significant in the univariable analysis were then used in multivariable analysis to evaluate the independent effect of each of the factors. The re-sorted suggestions proved to be significantly more accurate than the original list returned by the spell-checking tool. The odds of finding the correct suggestion in the number one rank were increased by 63% after re-sorting using the authors' method. This effect was independent of both the dictionary and the spell-checking tools that were used. Using knowledge about the frequency of a given word's occurrence in the medical domain can significantly improve spelling correction for medical queries.
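The reported 63% increase is an increase in odds, not in probability. A minimal sketch of the conversion follows, assuming an odds ratio of 1.63 and a purely hypothetical baseline success probability (the abstract does not give one); the function name is illustrative.

```python
def apply_odds_ratio(p, odds_ratio):
    """Return the probability obtained by multiplying the odds of p by odds_ratio."""
    odds = p / (1.0 - p)            # probability -> odds
    new_odds = odds * odds_ratio    # a 63% increase in odds means odds_ratio = 1.63
    return new_odds / (1.0 + new_odds)  # odds -> probability


if __name__ == "__main__":
    # Hypothetical baselines, chosen only to show that the gain in
    # probability depends on where the baseline sits.
    for baseline in (0.2, 0.5, 0.8):
        print(baseline, round(apply_odds_ratio(baseline, 1.63), 3))
```

The same odds ratio therefore translates into a different absolute gain in the probability of a rank-one hit depending on the baseline.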
Multivariate Matrix-Exponential Distributions
DEFF Research Database (Denmark)
Bladt, Mogens; Nielsen, Bo Friis
2010-01-01
In this article we consider the distributions of non-negative random vectors with a joint rational Laplace transform, i.e., a fraction between two multi-dimensional polynomials. These distributions are in the univariate case known as matrix-exponential distributions, since their densities can be ...... for the multivariate normal distribution. However, the proof is different and involves theory for rational function based on continued fractions and Hankel determinants....
APPLICATION OF MULTIVARIATE CONTROL CHARTS FOR MONITORING AN INDUSTRIAL PROCESS
Directory of Open Access Journals (Sweden)
Custodio da Cunha Alves
2013-12-01
The effective simultaneous monitoring of the many quality characteristics of a production process often depends on statistical tools that are becoming more and more specific. The goal of this paper is to investigate, via an industrial application, whether there are significant differences in sensitivity between Multivariate Cumulative Sum (MCUSUM) control charts, Multivariate Exponentially Weighted Moving Average (MEWMA) control charts and Hotelling T2 charts in the detection of small changes in the mean vector of the process. For this study, we used real data from a machining process. An MCUSUM control chart was applied to monitor the two quality characteristics of this process simultaneously. A MEWMA chart was also applied. The results were compared with those of the Hotelling T2 chart, which showed that the MCUSUM and MEWMA control charts detected the change sooner. This study was fundamental in defining the best choice among the three charts for the multivariate statistical analysis of this industrial process.
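To make the comparison concrete, the following is a minimal sketch of the Hotelling T2 and MEWMA statistics for two quality characteristics. It assumes a known in-control mean and covariance, uses the standard textbook MEWMA recursion rather than anything specific to the paper, and runs on made-up data; the function names and the smoothing constant `lam` are illustrative choices. Control limits are omitted because they depend on the desired in-control run length.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quad_form(x, m):
    """Quadratic form x' m x for a 2-vector x and a 2x2 matrix m."""
    return sum(x[i] * m[i][j] * x[j] for i in range(2) for j in range(2))


def hotelling_t2(obs, mean, cov):
    """T2 statistic of each individual observation (no memory of the past)."""
    cinv = inv2(cov)
    return [quad_form([x[0] - mean[0], x[1] - mean[1]], cinv) for x in obs]


def mewma(obs, mean, cov, lam=0.2):
    """MEWMA statistic: T2 of an exponentially weighted average of deviations."""
    z = [0.0, 0.0]
    stats = []
    for t, x in enumerate(obs, start=1):
        z = [lam * (x[k] - mean[k]) + (1.0 - lam) * z[k] for k in range(2)]
        # Exact covariance of z at time t is a scalar multiple of cov.
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        zcov = [[scale * cov[i][j] for j in range(2)] for i in range(2)]
        stats.append(quad_form(z, inv2(zcov)))
    return stats


if __name__ == "__main__":
    mean = [10.0, 5.0]                    # hypothetical in-control mean
    cov = [[1.0, 0.5], [0.5, 1.0]]        # hypothetical in-control covariance
    # Hypothetical observations with a small sustained upward shift.
    data = [[10.5, 5.4], [10.6, 5.5], [10.4, 5.6], [10.7, 5.5], [10.5, 5.6]]
    print(hotelling_t2(data, mean, cov))
    print(mewma(data, mean, cov))
```

Because the MEWMA statistic accumulates information across samples, a small sustained shift keeps pushing it in the same direction, whereas each T2 value looks at one sample in isolation. Setting `lam=1` removes the memory and reproduces the T2 values.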
Vasconcelos, A G; Almeida, R M; Nobre, F F
2001-08-01
This paper introduces an approach that includes non-quantitative factors for the selection and assessment of multivariate complex models in health. A goodness-of-fit based methodology combined with a fuzzy multi-criteria decision-making approach is proposed for model selection. Models were obtained using the Path Analysis (PA) methodology in order to explain the interrelationship between health determinants and the post-neonatal component of infant mortality in 59 municipalities of Brazil in the year 1991. Socioeconomic and demographic factors were used as exogenous variables, and environmental, health service and agglomeration factors as endogenous variables. Five PA models were developed and accepted by statistical goodness-of-fit criteria. These models were then submitted to a group of experts, seeking to characterize their preferences according to predefined criteria that tried to evaluate model relevance and plausibility. Fuzzy set techniques were used to rank the alternative models according to the number of times a model was superior to ("dominated") the others. The best-ranked model explained more than 90% of the variation in the endogenous variables, and showed the favorable influences of income and education levels on post-neonatal mortality. It also showed the unfavorable effect on mortality of fast population growth, through precarious dwelling conditions and decreased access to sanitation. It was possible to aggregate expert opinions in model evaluation. The proposed procedure for model selection allowed the inclusion of subjective information in a clear and systematic manner.
Ranked Conservation Opportunity Areas for Region 7 (ECO_RES.RANKED_OAS)
U.S. Environmental Protection Agency — The RANKED_OAS are all the Conservation Opportunity Areas identified by MoRAP that have subsequently been ranked by patch size, landform representation, and the...
Ranking scientific publications: the effect of nonlinearity.
Yao, Liyang; Wei, Tian; Zeng, An; Fan, Ying; Di, Zengru
2014-10-17
Ranking the significance of scientific publications is a long-standing challenge. The network-based analysis is a natural and common approach for evaluating the scientific credit of papers. Although the number of citations has been widely used as a metric to rank papers, recently some iterative processes such as the well-known PageRank algorithm have been applied to the citation networks to address this problem. In this paper, we introduce nonlinearity to the PageRank algorithm when aggregating resources from different nodes to further enhance the effect of important papers. The validation of our method is performed on the data of American Physical Society (APS) journals. The results indicate that the nonlinearity improves the performance of the PageRank algorithm in terms of ranking effectiveness, as well as robustness against malicious manipulations. Although the nonlinearity analysis is based on the PageRank algorithm, it can be easily extended to other iterative ranking algorithms and similar improvements are expected.
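The idea of a nonlinear aggregation step can be sketched as follows. This is an illustration under stated assumptions, not the authors' exact update rule: each incoming contribution is raised to the power 1 + theta before summing, and the sum is then raised to 1 / (1 + theta), so that theta = 0 reduces to ordinary PageRank and theta > 0 gives more weight to large contributions. The function name, the parameter `theta`, the handling of papers with no references and the toy citation graph are all hypothetical.

```python
def nonlinear_pagerank(cites, theta=0.0, damping=0.85, iters=100):
    """Score papers; `cites` maps each paper to the list of papers it cites."""
    nodes = sorted(set(cites) | {q for refs in cites.values() for q in refs})
    n = len(nodes)
    score = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        incoming = {p: 0.0 for p in nodes}
        dangling = 0.0
        for p in nodes:
            refs = cites.get(p, [])
            if not refs:
                dangling += score[p]      # papers citing nothing spread evenly
                continue
            share = score[p] / len(refs)
            for q in refs:
                # Assumed nonlinearity: emphasise large contributions.
                incoming[q] += share ** (1.0 + theta)
        new = {}
        for p in nodes:
            aggregated = incoming[p] ** (1.0 / (1.0 + theta))
            new[p] = damping * (aggregated + dangling / n) + (1.0 - damping) / n
        total = sum(new.values())         # renormalise so scores sum to 1
        score = {p: v / total for p, v in new.items()}
    return score


if __name__ == "__main__":
    # Hypothetical toy citation graph: A and B cite C, C cites D.
    graph = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
    print(nonlinear_pagerank(graph, theta=0.0))
    print(nonlinear_pagerank(graph, theta=0.5))
```

Renormalising after every iteration keeps the scores comparable across values of theta, since the nonlinear step does not conserve the total score on its own.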
Entity Ranking using Wikipedia as a Pivot
R. Kaptein; P. Serdyukov; A.P. de Vries (Arjen); J. Kamps
2010-01-01
In this paper we investigate the task of Entity Ranking on the Web. Searchers looking for entities are arguably better served by presenting a ranked list of entities directly, rather than a list of web pages with relevant but also potentially redundant information about
Entity ranking using Wikipedia as a pivot
Kaptein, R.; Serdyukov, P.; de Vries, A.; Kamps, J.; Huang, X.J.; Jones, G.; Koudas, N.; Wu, X.; Collins-Thompson, K.
2010-01-01
In this paper we investigate the task of Entity Ranking on the Web. Searchers looking for entities are arguably better served by presenting a ranked list of entities directly, rather than a list of web pages with relevant but also potentially redundant information about these entities. Since
Biplots in Reduced-Rank Regression
Braak, ter C.J.F.; Looman, C.W.N.
1994-01-01
Regression problems with a number of related response variables are typically analyzed by separate multiple regressions. This paper shows how these regressions can be visualized jointly in a biplot based on reduced-rank regression. Reduced-rank regression combines multiple regression and principal
Mining Feedback in Ranking and Recommendation Systems
Zhuang, Ziming
2009-01-01
The amount of online information has grown exponentially over the past few decades, and users become more and more dependent on ranking and recommendation systems to address their information seeking needs. The advance in information technologies has enabled users to provide feedback on the utilities of the underlying ranking and recommendation…
Using centrality to rank web snippets
Jijkoun, V.; de Rijke, M.; Peters, C.; Jijkoun, V.; Mandl, T.; Müller, H.; Oard, D.W.; Peñas, A.; Petras, V.; Santos, D.
2008-01-01
We describe our participation in the WebCLEF 2007 task, targeted at snippet retrieval from web data. Our system ranks snippets based on a simple similarity-based centrality, inspired by the web page ranking algorithms. We experimented with retrieval units (sentences and paragraphs) and with the
Generating and ranking of Dyck words
Kasa, Zoltan
2010-01-01
A new algorithm to generate all Dyck words is presented, which is used in ranking and unranking Dyck words. We emphasize the importance of using Dyck words in encoding objects related to Catalan numbers. As a consequence of formulas used in the ranking algorithm we can obtain a recursive formula for the nth Catalan number.
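Generation, ranking and the Catalan connection can be sketched as follows. This is a generic lexicographic approach written for illustration and is not the algorithm of the paper; Dyck words are represented as strings of parentheses, the Catalan numbers are computed with the classical convolution recurrence rather than the formula the abstract refers to, and all function names are assumptions.

```python
from functools import lru_cache


def catalan(n):
    """n-th Catalan number via C_0 = 1, C_{k+1} = sum_i C_i * C_{k-i}."""
    c = [1]
    for k in range(n):
        c.append(sum(c[i] * c[k - i] for i in range(k + 1)))
    return c[n]


@lru_cache(maxsize=None)
def completions(remaining, height):
    """Ways to finish a Dyck prefix with `height` unmatched '(' in `remaining` symbols."""
    if height < 0 or height > remaining:
        return 0
    if remaining == 0:
        return 1
    return completions(remaining - 1, height + 1) + completions(remaining - 1, height - 1)


def dyck_words(n):
    """Yield all Dyck words with n pairs in lexicographic order ('(' before ')')."""
    def extend(prefix, opens, closes):
        if opens == n and closes == n:
            yield prefix
            return
        if opens < n:
            yield from extend(prefix + "(", opens + 1, closes)
        if closes < opens:
            yield from extend(prefix + ")", opens, closes + 1)
    yield from extend("", 0, 0)


def rank(word):
    """0-based lexicographic position of `word` among Dyck words of its length."""
    r, height = 0, 0
    for i, ch in enumerate(word):
        remaining = len(word) - i - 1
        if ch == ")":
            # Every word sharing this prefix but placing '(' here comes earlier.
            r += completions(remaining, height + 1)
            height -= 1
        else:
            height += 1
    return r


if __name__ == "__main__":
    words = list(dyck_words(3))
    print(words, [rank(w) for w in words], catalan(3))
```

The number of Dyck words with n pairs equals the nth Catalan number, which is why counting completions of a prefix is enough to rank a word without enumerating its predecessors.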