Modern Multivariate Statistical Techniques Regression, Classification, and Manifold Learning
Izenman, Alan Julian
2006-01-01
Describes the advances in computation and data storage that led to the introduction of many statistical tools for high-dimensional data analysis. Focusing on multivariate analysis, this book discusses nonlinear methods as well as linear methods. It presents an integrated mixture of classical and modern multivariate statistical techniques.
Application of multivariate statistical techniques in microbial ecology.
Paliy, O; Shankar, V
2016-03-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, noticeable progress has been made in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces a large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure.
Stalked protozoa identification by image analysis and multivariable statistical techniques.
Amaral, A L; Ginoris, Y P; Nicolau, A; Coelho, M A Z; Ferreira, E C
2008-06-01
Protozoa are considered good indicators of the treatment quality in activated sludge systems as they are sensitive to physical, chemical and operational processes. Therefore, it is possible to correlate the predominance of certain species or groups and several operational parameters of the plant. This work presents a semiautomatic image analysis procedure for the recognition of the stalked protozoa species most frequently found in wastewater treatment plants by determining the geometrical, morphological and signature data and subsequent processing by discriminant analysis and neural network techniques. Geometrical descriptors were found to be responsible for the best identification ability and the identification of the crucial Opercularia and Vorticella microstoma microorganisms provided some degree of confidence to establish their presence in wastewater treatment plants.
Application of Multivariable Statistical Techniques in Plant-wide WWTP Control Strategies Analysis
DEFF Research Database (Denmark)
Flores Alsina, Xavier; Comas, J.; Rodríguez-Roda, I.
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant...
Alvarez, Odalys Quevedo; Tagle, Margarita Edelia Villanueva; Pascual, Jorge L Gómez; Marín, Ma Teresa Larrea; Clemente, Ana Catalina Nuñez; Medina, Miriam Odette Cora; Palau, Raiza Rey; Alfonso, Mario Simeón Pomares
2014-10-01
Spatial and temporal variations of sediment quality in Matanzas Bay (Cuba) were studied by determining a total of 12 variables (Zn, Cu, Pb, As, Ni, Co, Al, Fe, Mn, V, CO₃²⁻, and total hydrocarbons (THC)). Surface sediments were collected annually at eight stations during 2005-2008. Multivariate statistical techniques, such as principal component analysis (PCA), cluster analysis (CA), and linear discriminant analysis (LDA), were applied to identify the most significant variables influencing the environmental quality of sediments. Heavy metals (Zn, Cu, Pb, V, and As) and THC were the most significant species contributing to sediment quality variations during the sampling period. Concentrations of V and As were determined in sediments of this ecosystem for the first time. The analysis revealed the variation of sediment environmental quality over the sampling period and differentiated the samples into three groups along the bay. The usefulness of the multivariate statistical techniques employed for the environmental interpretation of a limited dataset was confirmed.
Ajorlo, Majid; Abdullah, Ramdzani B; Yusoff, Mohd Kamil; Halim, Ridzwan Abd; Hanif, Ahmad Husni Mohd; Willms, Walter D; Ebrahimian, Mahboubeh
2013-10-01
This study investigates the applicability of multivariate statistical techniques including cluster analysis (CA), discriminant analysis (DA), and factor analysis (FA) for the assessment of seasonal variations in the surface water quality of tropical pastures. The study was carried out in the TPU catchment, Kuala Lumpur, Malaysia. The dataset consisted of 1-year monitoring of 14 parameters at six sampling sites. The CA yielded two groups of similarity between the sampling sites, i.e., less polluted (LP) and moderately polluted (MP) at temporal scale. Fecal coliform (FC), NO3, DO, and pH were significantly related to the stream grouping in the dry season, whereas NH3, BOD, Escherichia coli, and FC were significantly related to the stream grouping in the rainy season. The best predictors for distinguishing clusters in temporal scale were FC, NH3, and E. coli, respectively. FC, E. coli, and BOD with strong positive loadings were introduced as the first varifactors in the dry season which indicates the biological source of variability. EC with a strong positive loading and DO with a strong negative loading were introduced as the first varifactors in the rainy season, which represents the physiochemical source of variability. Multivariate statistical techniques were effective analytical techniques for classification and processing of large datasets of water quality and the identification of major sources of water pollution in tropical pastures.
A primer of multivariate statistics
Harris, Richard J
2014-01-01
Drawing upon more than 30 years of experience in working with statistics, Dr. Richard J. Harris has updated A Primer of Multivariate Statistics to provide a model of balance between how-to and why. This classic text covers multivariate techniques with a taste of latent variable approaches. Throughout the book there is a focus on the importance of describing and testing one's interpretations of the emergent variables that are produced by multivariate analysis. This edition retains its conversational writing style while focusing on classical techniques. The book gives the reader a feel for why
Multivariate Statistical Process Control
DEFF Research Database (Denmark)
Kulahci, Murat
2013-01-01
As sensor and computer technology continues to improve, it becomes a normal occurrence that we are confronted with high-dimensional data sets. As in many areas of industrial statistics, this brings forth various challenges in statistical process control (SPC) and monitoring for which the aim...... is to identify the “out-of-control” state of a process using control charts in order to reduce the excessive variation caused by so-called assignable causes. In practice, the most common method of monitoring multivariate data is through a statistic akin to Hotelling’s T². For high-dimensional data with excessive...
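The T²-type monitoring statistic mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the textbook form T² = (x - mean)ᵀ S⁻¹ (x - mean), estimated from in-control reference data; the function name `hotelling_t2`, the synthetic reference data and the two test observations are illustrative assumptions, not material from the cited work. No control limit is computed here.

```python
import numpy as np

def hotelling_t2(reference, observation):
    """T^2 distance of one observation from the mean of in-control reference data."""
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)      # sample covariance of the reference data
    diff = observation - mean
    # Solve cov @ z = diff instead of forming the explicit inverse.
    return float(diff @ np.linalg.solve(cov, diff))

# Synthetic in-control data (assumption): 200 observations of 3 variables.
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 3))
center = reference.mean(axis=0)

t2_near = hotelling_t2(reference, center + 0.1)   # close to the process mean
t2_far = hotelling_t2(reference, center + 5.0)    # strongly shifted observation
print(f"T2 near mean: {t2_near:.3f}, T2 shifted: {t2_far:.3f}")
```

In a control chart, each new observation's T² would be compared against a limit derived from an assumed reference distribution; that step is left out of this sketch.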
Applied multivariate statistical analysis
Härdle, Wolfgang Karl
2015-01-01
Focusing on high-dimensional applications, this 4th edition presents the tools and concepts used in multivariate data analysis in a style that is also accessible for non-mathematicians and practitioners. It surveys the basic principles and emphasizes both exploratory and inferential statistics; a new chapter on Variable Selection (Lasso, SCAD and Elastic Net) has also been added. All chapters include practical exercises that highlight applications in different multivariate data analysis fields: in quantitative financial studies, where the joint dynamics of assets are observed; in medicine, where recorded observations of subjects in different locations form the basis for reliable diagnoses and medication; and in quantitative marketing, where consumers’ preferences are collected in order to construct models of consumer behavior. All of these examples involve high to ultra-high dimensions and represent a number of major fields in big data analysis. The fourth edition of this book on Applied Multivariate ...
Abbas Alkarkhi, F M; Ismail, Norli; Easa, Azhar Mat
2008-02-11
Cockle (Anadara granosa) samples obtained from two rivers in the Penang State of Malaysia were analyzed for the content of arsenic (As) and heavy metals (Cr, Cd, Zn, Cu, Pb, and Hg) using a graphite furnace atomic absorption spectrometer (GF-AAS) for Cr, Cd, Zn, Cu, Pb and As, and a cold vapor atomic absorption spectrometer (CV-AAS) for Hg. The two locations of interest, with 20 sampling points at each location, were Kuala Juru (Juru River) and Bukit Tambun (Jejawi River). Multivariate statistical techniques such as multivariate analysis of variance (MANOVA) and discriminant analysis (DA) were applied to analyze the data. MANOVA showed a strongly significant difference between the two rivers in terms of As and heavy metal contents in cockles. DA gave the best result for identifying the relative contribution of all parameters in discriminating (distinguishing) the two rivers. It provided an important data reduction, as it used only two parameters (Zn and Cd), affording more than 72% correct assignations. Results indicated that the two rivers were different in terms of As and heavy metal contents in cockles, and the major difference was due to the contribution of Zn and Cd. A positive correlation was found between the discriminant functions (DF) and Zn, Cd and Cr, whereas a negative correlation was exhibited with the other heavy metals. Therefore, DA allowed a reduction in the dimensionality of the data set, delineating a few indicator parameters responsible for large variations in heavy metal and arsenic content. Taking these results into account, it can be suggested that continuous monitoring of As and heavy metals in cockles be performed in these two rivers.
Energy Technology Data Exchange (ETDEWEB)
Park, Jinyong [Univ. of Arizona, Tucson, AZ (United States); Balasingham, P [Univ. of Arizona, Tucson, AZ (United States); McKenna, Sean Andrew [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Kulatilake, Pinnaduwa H.S.W. [Univ. of Arizona, Tucson, AZ (United States)
2004-09-01
Sandia National Laboratories, under contract to Nuclear Waste Management Organization of Japan (NUMO), is performing research on regional classification of given sites in Japan with respect to potential volcanic disruption using multivariate statistics and geo-statistical interpolation techniques. This report provides results obtained for hierarchical probabilistic regionalization of volcanism for the Sengan region in Japan by applying multivariate statistical techniques and geostatistical interpolation techniques on the geologic data provided by NUMO. A workshop report produced in September 2003 by Sandia National Laboratories (Arnold et al., 2003) on volcanism lists a set of most important geologic variables as well as some secondary information related to volcanism. Geologic data extracted for the Sengan region in Japan from the data provided by NUMO revealed that data are not available at the same locations for all the important geologic variables. In other words, the geologic variable vectors were found to be incomplete spatially. However, it is necessary to have complete geologic variable vectors to perform multivariate statistical analyses. As a first step towards constructing complete geologic variable vectors, the Universal Transverse Mercator (UTM) zone 54 projected coordinate system and a 1 km square regular grid system were selected. The data available for each geologic variable on a geographic coordinate system were transferred to the aforementioned grid system. Also the recorded data on volcanic activity for Sengan region were produced on the same grid system. Each geologic variable map was compared with the recorded volcanic activity map to determine the geologic variables that are most important for volcanism. In the regionalized classification procedure, this step is known as the variable selection step. The following variables were determined as most important for volcanism: geothermal gradient, groundwater temperature, heat discharge, groundwater
Directory of Open Access Journals (Sweden)
Vujović Svetlana R.
2013-01-01
This paper illustrates the utility of multivariate statistical techniques for the analysis and interpretation of water quality data sets and the identification of pollution sources/factors, with a view to obtaining better information about water quality and the design of a monitoring network for effective management of water resources. Multivariate statistical techniques, such as factor analysis (FA)/principal component analysis (PCA) and cluster analysis (CA), were applied for the evaluation of variations and for the interpretation of a water quality data set of natural water bodies obtained during the 2010 monitoring year, covering 13 parameters at 33 different sites. FA/PCA attempts to explain the correlations between the observations in terms of underlying factors, which are not directly observable. Factor analysis is applied to physico-chemical parameters of natural water bodies with the aim of classification and data summation as well as segmentation of heterogeneous data sets into smaller homogeneous subsets. Factor loadings were categorized as strong and moderate, corresponding to absolute loading values of >0.75 and 0.75-0.50, respectively. Four principal factors were obtained with eigenvalues >1, together accounting for more than 78 % of the total variance in the water data sets, which is adequate to give good prior information regarding data structure. Each factor that is significantly related to specific variables represents a different dimension of water quality. The first factor, F1, accounts for 28 % of the total variance and represents the hydrochemical dimension of water quality. The second factor, F2, accounts for 18 % of the total variance and may be taken as a factor of water eutrophication. The third factor, F3, accounts for 17 % of the total variance and represents the influence of point sources of pollution on water quality. The fourth factor, F4, accounts for 13 % of the total variance and may be taken as an ecological dimension of water quality. Cluster analysis (CA) is an
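The two rules quoted in this abstract (retain factors with eigenvalues greater than 1, and label loadings as strong or moderate by absolute value) can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are synthetic (33 rows by 13 columns, mirroring the site and parameter counts), eigenvalues are taken from the correlation matrix, and the function names `kaiser_components` and `loading_category` as well as the "weak" label are hypothetical, not taken from the paper.

```python
import numpy as np

def kaiser_components(X):
    """Eigenvalues of the correlation matrix, the number exceeding 1, and % variance."""
    R = np.corrcoef(X, rowvar=False)               # variables are in columns
    eigenvalues = np.linalg.eigvalsh(R)[::-1]      # sorted in descending order
    n_keep = int(np.sum(eigenvalues > 1.0))        # eigenvalue > 1 retention rule
    explained_pct = 100.0 * eigenvalues / eigenvalues.sum()
    return eigenvalues, n_keep, explained_pct

def loading_category(loading):
    """Label a factor loading using the thresholds quoted in the abstract."""
    a = abs(loading)
    if a > 0.75:
        return "strong"
    if a >= 0.50:
        return "moderate"
    return "weak"   # label assumed here; the abstract names only strong and moderate

# Synthetic data (assumption): 33 sites, 13 parameters driven by 4 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(33, 4))
weights = rng.normal(size=(4, 13))
X = latent @ weights + 0.5 * rng.normal(size=(33, 13))

eigenvalues, n_keep, explained_pct = kaiser_components(X)
print("factors retained:", n_keep)
print("variance explained by retained factors (%):", explained_pct[:n_keep].sum())
print(loading_category(0.81), loading_category(-0.62), loading_category(0.30))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 marks a factor that carries more variance than a single standardized variable.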
Tay, C. K.; Hayford, E. K.; Hodgson, I. O. A.
2017-02-01
A multivariate statistical technique and a hydrogeochemical approach were employed for groundwater assessment within the Lower Pra Basin. The main objective was to delineate the main processes that are responsible for the water chemistry and pollution of groundwater within the basin. Fifty-four (54) boreholes were sampled in January 2012 for quality assessment. PCA using Varimax with the Kaiser Normalization method of extraction for both rotated space and component matrix has been applied to the data. Results show that Spearman's correlation matrix of major ions revealed expected process-based relationships derived mainly from geochemical processes, such as ion-exchange and silicate/aluminosilicate weathering within the aquifer. Three main principal components influence the water chemistry and pollution of groundwater within the basin. The three principal components account for approximately 79% of the total variance in the hydrochemical data. Component 1 delineates the main natural processes (water-soil-rock interactions) through which groundwater within the basin acquires its chemical characteristics, Component 2 delineates the incongruent dissolution of silicates/aluminosilicates, while Component 3 delineates the prevalence of pollution, principally from agricultural input, as well as trace metal mobilization in groundwater within the basin. The loadings and score plots of the first two PCs show a grouping pattern which indicates the strength of the mutual relation among the hydrochemical variables. In terms of proper management and development of groundwater within the basin, communities where intense agriculture is taking place should be monitored and protected from agricultural activities, especially where inorganic fertilizers are used, by creating buffer zones. Monitoring of the water quality, especially the water pH, is recommended to ensure the acid neutralizing potential of groundwater within the basin, thereby curtailing further trace metal
Marinović Ruždjak, Andrea; Ruždjak, Domagoj
2015-04-01
For the evaluation of seasonal and spatial variations and the interpretation of a large and complex water quality dataset obtained during a 7-year monitoring program of the Sava River in Croatia, different multivariate statistical techniques were applied in this study. Basic statistical properties and correlations of 18 water quality parameters (variables) measured at 18 sampling sites (a total of 56,952 values) were examined. Correlations between air temperature and some water quality parameters were found in agreement with the previous studies of relationship between climatic and hydrological parameters. Principal component analysis (PCA) was used to explore the most important factors determining the spatiotemporal dynamics of the Sava River. PCA has determined a reduced number of seven principal components that explain over 75 % of the data set variance. The results revealed that parameters related to temperature and organic pollutants (CODMn and TSS) were the most important parameters contributing to water quality variation. PCA analysis of seasonal subsets confirmed this result and showed that the importance of parameters is changing from season to season. PCA of the four seasonal data subsets yielded six PCs with eigenvalues greater than one explaining 73.6 % (spring), 71.4 % (summer), 70.3 % (autumn), and 71.3 % (winter) of the total variance. To check the influence of the outliers in the data set whose distribution strongly deviates from the normal one, in addition to standard principal component analysis algorithm, two robust estimates of covariance matrix were calculated and subjected to PCA. PCA in both cases yielded seven principal components explaining 75 % of the total variance, and the results do not differ significantly from the results obtained by the standard PCA algorithm. With the implementation of robust PCA algorithm, it is demonstrated that the usage of standard algorithm is justified for data sets with small numbers of missing data
Directory of Open Access Journals (Sweden)
M.A. Delavar
2016-02-01
Introduction: The accumulation of heavy metals (HMs) in the soil is of increasing concern due to food safety issues, potential health risks, and the detrimental effects on soil ecosystems. HMs may be considered the most important soil pollutants, because they are not biodegradable and their physical movement through the soil profile is relatively limited. Therefore, the root uptake process may provide a significant pathway for these pollutants to transfer from the surface soil to natural and cultivated plants, which may eventually carry them to human bodies. The general behavior of HMs in the environment, especially their bioavailability in the soil, is influenced by their origin. Hence, source apportionment of HMs may provide essential information for better management of polluted soils to restrict the entrance of HMs into the human food chain. This paper explores the applicability of multivariate statistical techniques in the identification of probable sources that can control the concentration and distribution of selected HMs in the soils surrounding the Zanjan Zinc Specialized Industrial Town (briefly, Zinc Town). Materials and Methods: The area under investigation has a size of approximately 4000 ha. It is located around the Zinc Town, Zanjan province. A regular grid sampling pattern with an interval of 500 meters was applied to identify the sample locations, and 184 topsoil samples (0-10 cm) were collected. The soil samples were air-dried and sieved through a 2 mm polyethylene sieve and then digested using HNO3. The total concentrations of zinc (Zn), lead (Pb), cadmium (Cd), nickel (Ni) and copper (Cu) in the soil solutions were determined via Atomic Absorption Spectroscopy (AAS). Data were statistically analyzed using the SPSS software version 17.0 for Windows. Correlation Matrix (CM), Principal Component Analysis (PCA) and Factor Analysis (FA) techniques were performed in order to identify the probable sources of HMs in the studied soils. Results and
Benson, Nsikak U; Asuquo, Francis E; Williams, Akan B; Essien, Joseph P; Ekong, Cyril I; Akpabio, Otobong; Olajire, Abaas A
2016-01-01
Trace metals (Cd, Cr, Cu, Ni and Pb) concentrations in benthic sediments were analyzed through multi-step fractionation scheme to assess the levels and sources of contamination in estuarine, riverine and freshwater ecosystems in Niger Delta (Nigeria). The degree of contamination was assessed using the individual contamination factors (ICF) and global contamination factor (GCF). Multivariate statistical approaches including principal component analysis (PCA), cluster analysis and correlation test were employed to evaluate the interrelationships and associated sources of contamination. The spatial distribution of metal concentrations followed the pattern Pb>Cu>Cr>Cd>Ni. Ecological risk index by ICF showed significant potential mobility and bioavailability for Cu, Cu and Ni. The ICF contamination trend in the benthic sediments at all studied sites was Cu>Cr>Ni>Cd>Pb. The principal component and agglomerative clustering analyses indicate that trace metals contamination in the ecosystems was influenced by multiple pollution sources. PMID:27257934
Enlow, Elizabeth M; Kennedy, Jennifer L; Nieuwland, Alexander A; Hendrix, James E; Morgan, Stephen L
2005-08-01
Nylons are an important class of synthetic polymers, from an industrial, as well as forensic, perspective. A spectroscopic method, such as Fourier transform infrared (FT-IR) spectroscopy, is necessary to determine the nylon subclasses (e.g., nylon 6 or nylon 6,6). Library searching using absolute difference and absolute derivative difference algorithms gives inconsistent results for identifying nylon subclasses. The objective of this study was to evaluate the usefulness of peak ratio analysis and multivariate statistics for the identification of nylon subclasses using attenuated total reflection (ATR) spectral data. Many nylon subclasses could not be distinguished by the peak ratio of the N-H vibrational stretch to the sp³ C-H₂ vibrational stretch intensities. Linear discriminant analysis, however, provided a graphical visualization of differences between nylon subclasses and was able to correctly classify a set of 270 spectra from eight different subclasses with 98.5% cross-validated accuracy.
Directory of Open Access Journals (Sweden)
Qing Gu
2016-03-01
Qiandao Lake (Xin’an Jiang reservoir) plays a significant role in drinking water supply for eastern China, and it is an attractive tourist destination. Three multivariate statistical methods were comprehensively applied to assess the spatial and temporal variations in water quality as well as potential pollution sources in Qiandao Lake. Data sets of nine parameters from 12 monitoring sites during 2010–2013 were obtained for analysis. Cluster analysis (CA) was applied to classify the 12 sampling sites into three groups (Groups A, B and C) and the 12 monitoring months into two clusters (April-July, and the remaining months). Discriminant analysis (DA) identified Secchi disc depth, dissolved oxygen, permanganate index and total phosphorus as the significant variables for distinguishing variations of different years, with 79.9% correct assignments. Dissolved oxygen, pH and chlorophyll-a were determined to discriminate between the two sampling periods classified by CA, with 87.8% correct assignments. For spatial variation, DA identified Secchi disc depth and ammonia nitrogen as the significant discriminating parameters, with 81.6% correct assignments. Principal component analysis (PCA) identified organic pollution, nutrient pollution, domestic sewage, and agricultural and surface runoff as the primary pollution sources, explaining 84.58%, 81.61% and 78.68% of the total variance in Groups A, B and C, respectively. These results demonstrate the effectiveness of integrated use of CA, DA and PCA for reservoir water quality evaluation and could assist managers in improving water resources management.
Bonetti, Jennifer; Quarino, Lawrence
2014-05-01
This study has shown that the combination of simple techniques with the use of multivariate statistics offers the potential for the comparative analysis of soil samples. Five samples were obtained from each of twelve state parks across New Jersey in both the summer and fall seasons. Each sample was examined using particle-size distribution, pH analysis in both water and 1 M CaCl₂, and a loss on ignition technique. Data from each of the techniques were combined, and principal component analysis (PCA) and canonical discriminant analysis (CDA) were used for multivariate data transformation. Samples from different locations could be visually differentiated from one another using these multivariate plots. Hold-one-out cross-validation analysis showed error rates as low as 3.33%. Ten blind study samples were analyzed resulting in no misclassifications using Mahalanobis distance calculations and visual examinations of multivariate plots. Seasonal variation was minimal between corresponding samples, suggesting potential success in forensic applications. © 2014 American Academy of Forensic Sciences.
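The combination of Mahalanobis-distance assignment and hold-one-out cross-validation described in this abstract can be sketched in Python. This is a minimal illustration, not the authors' procedure: it assumes a pooled within-group covariance matrix, works on synthetic data (three hypothetical locations with five samples each and four measured variables), and the names `fit_groups`, `classify` and `leave_one_out_error` are invented for the example.

```python
import numpy as np

def fit_groups(X, labels):
    """Group means and the inverse of the pooled within-group covariance matrix."""
    labels = np.asarray(labels)
    groups = sorted(set(labels.tolist()))
    means = {g: X[labels == g].mean(axis=0) for g in groups}
    n, p = X.shape
    pooled = np.zeros((p, p))
    for g in groups:
        centered = X[labels == g] - means[g]
        pooled += centered.T @ centered
    pooled /= (n - len(groups))
    return means, np.linalg.inv(pooled)

def classify(x, means, cov_inv):
    """Assign x to the group with the smallest squared Mahalanobis distance."""
    def squared_distance(g):
        diff = x - means[g]
        return float(diff @ cov_inv @ diff)
    return min(means, key=squared_distance)

def leave_one_out_error(X, labels):
    """Fraction of samples misclassified when each is held out in turn."""
    labels = np.asarray(labels)
    errors = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        means, cov_inv = fit_groups(X[keep], labels[keep])
        if classify(X[i], means, cov_inv) != labels[i]:
            errors += 1
    return errors / len(X)

# Synthetic, well-separated data (assumption): 3 locations x 5 samples, 4 variables.
rng = np.random.default_rng(1)
centers = {"park_A": [0, 0, 0, 0], "park_B": [5, 5, 0, 0], "park_C": [0, 5, 5, 0]}
X = np.vstack([np.array(c) + 0.5 * rng.normal(size=(5, 4)) for c in centers.values()])
labels = np.repeat(list(centers.keys()), 5)

loo_error = leave_one_out_error(X, labels)
print(f"hold-one-out error rate: {loo_error:.2%}")
```

Holding out each sample before estimating the group means and covariance keeps the error estimate from being flattered by the sample's own contribution to the model.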
Directory of Open Access Journals (Sweden)
Gledsneli Maria Lima Lins
2010-12-01
Water has a decisive influence on the quality of life of populations, specifically in areas like urban supply, drainage, and effluent treatment, due to its strong impact on public health. Rational water use constitutes the greatest challenge faced by water demand management, mainly with regard to urban household water consumption. This makes it important to carry out research that assists water managers and public policy-makers in planning and formulating water demand measures which may allow rational urban water use to be achieved. This work applied the multivariate techniques Factor Analysis and Multiple Linear Regression Analysis to two districts of Campina Grande city (State of Paraíba, Brazil) in order to determine the level of participation of socioeconomic and climatic variables in monthly changes in urban household consumption. The districts were chosen based on a socioeconomic criterion (income level) so as to evaluate the behavior of their water consumers. A 9-year monthly data series (from 2000 to 2008) was utilized, comprising family income, water tariff, and quantity of household connections (economies) as socioeconomic variables, and average temperature and precipitation as climatic variables. For both selected districts of Campina Grande city, the results point to the variables “water tariff” and “family income” as indicators of household consumption in these districts.
Directory of Open Access Journals (Sweden)
Farooq Ahmad
2011-12-01
Multivariate statistical techniques such as factor analysis (FA), cluster analysis (CA) and discriminant analysis (DA) were applied for the evaluation of spatial variations and the interpretation of a large, complex water quality data set from three cities (Lahore, Gujranwala and Sialkot) in Punjab, Pakistan. Sixteen parameters were determined for water samples collected from nine different sampling stations in each city. Factor analysis indicated five factors, which explained 74% of the total variance in the water quality data set. The five factors are salinization, alkalinity, temperature, domestic waste and chloride, which explained 31.1%, 14.3%, 10.6%, 10.0% and 8.0% of the total variance, respectively. Hierarchical cluster analysis grouped the nine sampling stations of each city into three clusters, i.e., relatively less polluted (LP), moderately polluted (MP) and highly polluted (HP) sites, based on the similarity of water quality characteristics. Discriminant analysis (DA) identified ten significant parameters (calcium (Ca), ammonia, sulphate, sodium (Na), electrical conductivity (EC), chloride, temperature (Temp), total hardness (TH), turbidity), which discriminate the groundwater quality of the three cities, with close to 100.0% correct assignment for spatial variations. This study illustrates the benefit of multivariate statistical techniques for interpreting complex data sets in the analysis of spatial variations in water quality, and for planning future studies.
Multivariate statistics exercises and solutions
Härdle, Wolfgang Karl
2015-01-01
The authors present tools and concepts of multivariate data analysis by means of exercises and their solutions. The first part is devoted to graphical techniques. The second part deals with multivariate random variables and presents the derivation of estimators and tests for various practical situations. The last part introduces a wide variety of exercises in applied multivariate data analysis. The book demonstrates the application of simple calculus and basic multivariate methods in real life situations. It contains altogether more than 250 solved exercises which can assist a university teacher in setting up a modern multivariate analysis course. All computer-based exercises are available in the R language. All R codes and data sets may be downloaded via the quantlet download center www.quantlet.org or via the Springer webpage. For interactive display of low-dimensional projections of a multivariate data set, we recommend GGobi.
Applied multivariate statistics with R
Zelterman, Daniel
2015-01-01
This book brings the power of multivariate statistics to graduate-level practitioners, making these analytical methods accessible without lengthy mathematical derivations. Using the open source, shareware program R, Professor Zelterman demonstrates the process and outcomes for a wide array of multivariate statistical applications. Chapters cover graphical displays, linear algebra, univariate, bivariate and multivariate normal distributions, factor methods, linear regression, discrimination and classification, clustering, time series models, and additional methods. Zelterman uses practical examples from diverse disciplines to welcome readers from a variety of academic specialties. Those with backgrounds in statistics will learn new methods while they review more familiar topics. Chapters include exercises, real data sets, and R implementations. The data are interesting, real-world topics, particularly from health and biology-related contexts. As an example of the approach, the text examines a sample from the B...
Institute of Scientific and Technical Information of China (English)
郑力燕; 于宏兵; 王启山
2016-01-01
Multivariate statistical techniques, such as cluster analysis (CA), discriminant analysis (DA), principal component analysis (PCA) and factor analysis (FA), were applied to evaluate and interpret the surface water quality data sets of the Second Songhua River (SSHR) basin in China, obtained during two years (2012−2013) of monitoring of 10 physicochemical parameters at 15 different sites. The results showed that most of the physicochemical parameters varied significantly among the sampling sites. Three significant groups of sampling sites, highly polluted (HP), moderately polluted (MP) and less polluted (LP), were obtained through hierarchical agglomerative CA on the basis of similarity of water quality characteristics. DA identified pH, F, DO, NH3-N, COD and VPhs as the most important parameters contributing to spatial variations of surface water quality. However, DA did not give a considerable data reduction (40% reduction). PCA/FA resulted in three, three and four latent factors explaining 70%, 62% and 71% of the total variance in the water quality data sets of the HP, MP and LP regions, respectively. FA revealed that the SSHR water chemistry was strongly affected by anthropogenic activities (point sources: industrial effluents and wastewater treatment plants; non-point sources: domestic sewage, livestock operations and agricultural activities) and natural processes (seasonal effect and natural inputs). PCA/FA in the whole basin showed the best results for data reduction because it used only two parameters (about 80% reduction) as the most important parameters to explain 72% of the data variation. Thus, this work illustrated the utility of multivariate statistical techniques for analysis and interpretation of datasets and, in water quality assessment, identification of pollution sources/factors and understanding spatial variations in water quality for effective stream water quality management.
Directory of Open Access Journals (Sweden)
Z. Pasandidehfard
2014-03-01
Nonpoint source (NPS) pollution is a major surface water contaminant commonly caused by agricultural runoff. The purpose of this study was to assess seasonal variation in water quality parameters in the Gorganrood watershed (Golestan Province, Iran). It also tried to clarify the effects of agricultural practices and NPS pollution on them. Water quality parameters including potassium, sodium, pH, water flow rate, total dissolved solids (TDS), electrical conductivity (EC), hardness, sulfate, bicarbonate, chlorine, magnesium, and calcium ions during 1966-2010 were evaluated using multivariate statistical techniques. Multivariate analysis of variance (MANOVA) was implemented to determine the significance of differences between mean seasonal values. Discriminant analysis (DA) was also carried out to identify correlations between seasons and the water quality parameters. Parameters of the water quality index were measured through principal component analysis (PCA) and factor analysis (FA). Based on the results of statistical tests, climate (freezing, weathering and rainfall) and human activities such as agriculture had crucial effects on water quality. The most important parameters in differentiation between seasons, in descending order, were potassium, pH, carbonic acid, calcium, and magnesium. According to load factor analysis, chlorine, calcium, and potassium were the most important parameters in spring and summer, indicating the application of fertilizers (especially potassium chloride fertilizer) and the existence of NPS pollution during these seasons. In the next stage, the months during which crops had excessive water requirements were detected using CROPWAT software. Almost all water requirements of the area’s major crops, i.e. cotton, rice, soya, wheat, and oat, occur from late spring until mid/late summer. According to our findings, agricultural practices had a great impact on water pollution. Results of analysis with CROPWAT software also confirmed this
Muhammad, Said; Tahir Shah, M; Khan, Sardar
2010-10-01
The present study was conducted in the Kohistan region, where mafic and ultramafic rocks (Kohistan island arc and Indus suture zone) and metasedimentary rocks (Indian plate) are exposed. Water samples were collected from the springs, streams and Indus river and analyzed for physical parameters, anions, cations and arsenic (As(3+), As(5+) and total arsenic). The water quality in the Kohistan region was evaluated by comparing the physio-chemical parameters with permissible limits set by the Pakistan Environmental Protection Agency and the World Health Organization. Most of the studied parameters were found within their respective permissible limits. However, in some samples the iron and arsenic concentrations exceeded their permissible limits. For health risk assessment of arsenic, the average daily dose, hazard quotient (HQ) and cancer risk were calculated by using statistical formulas. The values of HQ were found >1 in the samples collected from Jabba, Dubair, while HQ values were samples. This level of contamination should have low chronic risk and medium cancer risk when compared with US EPA guidelines. Furthermore, the inter-dependence of physio-chemical parameters and pollution load was also calculated by using multivariate statistical techniques like one-way ANOVA, correlation analysis, regression analysis, cluster analysis and principal component analysis.
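The abstract does not state the formulas used for the average daily dose, hazard quotient and cancer risk. The sketch below assumes the widely used US EPA-style screening forms (dose = concentration × intake / body weight, HQ = dose / reference dose, risk = dose × slope factor). The reference dose and slope factor are the US EPA IRIS oral values for inorganic arsenic; the concentration, intake rate and body weight are hypothetical illustration values, not data from the study.

```python
# Assumed screening-level formulas; not taken from the abstract itself.
ARSENIC_RFD = 3.0e-4   # oral reference dose, mg/kg/day (US EPA IRIS value)
ARSENIC_CSF = 1.5      # oral cancer slope factor, per mg/kg/day (US EPA IRIS value)

def average_daily_dose(conc_mg_per_l, intake_l_per_day, body_weight_kg):
    """Average daily dose (mg/kg/day) from drinking-water ingestion."""
    return conc_mg_per_l * intake_l_per_day / body_weight_kg

def hazard_quotient(add, rfd=ARSENIC_RFD):
    """Non-carcinogenic hazard quotient; values above 1 flag potential concern."""
    return add / rfd

def cancer_risk(add, csf=ARSENIC_CSF):
    """Lifetime excess cancer risk estimate (dimensionless)."""
    return add * csf

# Hypothetical sample: 0.010 mg/L arsenic, 2 L/day intake, 70 kg adult
add = average_daily_dose(0.010, 2.0, 70.0)
hq = hazard_quotient(add)
cr = cancer_risk(add)
```

With these illustrative inputs the hazard quotient comes out just below 1, which shows how sensitive the HQ > 1 classification is to the assumed intake rate and body weight.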
Khan, Firdos; Pilz, Jürgen
2016-04-01
South Asia is under severe impacts from a changing climate and global warming. The last two decades showed that climate change or global warming is happening, and the first decade of the 21st century is considered the warmest decade on record over Pakistan, where the temperature reached 53 °C in 2010. Consequently, the spatio-temporal distribution and intensity of precipitation are badly affected, causing floods, cyclones and hurricanes in the region, which in turn have impacts on agriculture, water, health etc. To cope with the situation, it is important to conduct impact assessment studies and take adaptation and mitigation measures. For impact assessment studies, we need climate variables at higher resolution. Downscaling techniques are used to produce climate variables at higher resolution; these techniques are broadly divided into two types, statistical downscaling and dynamical downscaling. The target location of this study is the monsoon-dominated region of Pakistan. One reason for choosing this area is that monsoon rains contribute more than 80% of the total rainfall there. This study evaluates a statistical downscaling technique which can then be used for downscaling climatic variables. Two statistical techniques, i.e. quantile regression and copula modeling, are combined in order to produce realistic results for climate variables in the area under study. To reduce the dimension of the input data and deal with multicollinearity problems, empirical orthogonal functions will be used. Advantages of this new method are: (1) it is more robust to outliers as compared to ordinary least squares estimates and other estimation methods based on central tendency and dispersion measures; (2) it preserves the dependence among variables and among sites and (3) it can be used to combine different types of distributions. This is important in our case because we are dealing with climatic variables having different distributions over different meteorological
Aspects of multivariate statistical theory
Muirhead, Robb J
2009-01-01
The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover volumes, Wiley hopes to extend the lives of these works by making them available to future generations of statisticians, mathematicians, and scientists. "...the wealth of material on statistics concerning the multivariate normal distribution is quite exceptional. As such it is a very useful source of information for the general statistician and a must for anyone wanting to pen
RF Calibration of On-Chip DfT Chain by DC Stimuli and Statistical Multivariate Regression Technique
Ramzan, Rashad; Dabrowski, Jerzy
2015-01-01
The problem of parameter variability in RF and analog circuits is escalating with CMOS scaling. Consequently, every RF chip produced in nanometer CMOS technologies needs to be tested. On-chip Design for Testability (DfT) features, which are meant to reduce test time and cost, also suffer from parameter variability. Therefore, RF calibration of all on-chip test structures is mandatory. In this paper, Artificial Neural Networks (ANN) are employed as a multivariate regression technique to archite...
Multivariate statistical methods a primer
Manly, Bryan FJ
2004-01-01
THE MATERIAL OF MULTIVARIATE ANALYSIS: Examples of Multivariate Data; Preview of Multivariate Methods; The Multivariate Normal Distribution; Computer Programs; Graphical Methods; Chapter Summary; References. MATRIX ALGEBRA: The Need for Matrix Algebra; Matrices and Vectors; Operations on Matrices; Matrix Inversion; Quadratic Forms; Eigenvalues and Eigenvectors; Vectors of Means and Covariance Matrices; Further Reading; Chapter Summary; References. DISPLAYING MULTIVARIATE DATA: The Problem of Displaying Many Variables in Two Dimensions; Plotting Index Variables; The Draftsman's Plot; The Representation of Individual Data Points; Profiles o
Methods for statistical data analysis of multivariate observations
Gnanadesikan, R
1997-01-01
A practical guide for multivariate statistical techniques, now updated and revised. In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus of its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of inte
Directory of Open Access Journals (Sweden)
Cláudio Roberto Rosário
2012-07-01
The purpose of this research is to improve the practice of customer satisfaction analysis. The article presents a model for analyzing the answers to a customer satisfaction evaluation systematically with the aid of multivariate statistical techniques, specifically exploratory analysis with PCA (Principal Component Analysis) and HCA (Hierarchical Cluster Analysis). The study evaluated whether the model could be used by the company in question as a tool to help identify the value chain perceived by the customer when the customer satisfaction questionnaire is applied. The multivariate statistical analysis showed similar behavior among customers. It also allowed the company to review the questions of the questionnaire by analyzing the degree of correlation between them, which was not the company’s practice before this research.
Herojeet, Rajkumar; Rishi, Madhuri S.; Lata, Renu; Dolma, Konchok
2017-09-01
multivariate techniques for reliable quality characterization of surface water quality to develop effective pollution reduction strategies and maintain a fine balance between the industrialization and ecological integrity.
Energy Technology Data Exchange (ETDEWEB)
Aguado Garcia, D.; Ferrer Riquelme, A. J.; Seco Torrecillas, A.; Ferrer Polo, J.
2006-07-01
Due to the increasingly stringent effluent quality requirements imposed by regulations, monitoring wastewater treatment plants (WWTP) becomes extremely important in order to achieve efficient process operation. Nowadays, at modern WWTP a large number of online process variables are collected, and these variables are usually highly correlated. Therefore, appropriate techniques are required to extract the information from the huge amount of collected data. In this work, the application of multivariate statistical projection techniques is presented as an effective strategy for monitoring a sequencing batch reactor (SBR) operated for enhanced biological phosphorus removal. (Author)
Multivariate statistical methods a first course
Marcoulides, George A
2014-01-01
Multivariate statistics refer to an assortment of statistical methods that have been developed to handle situations in which multiple variables or measures are involved. Any analysis of more than two variables or measures can loosely be considered a multivariate statistical analysis. An introductory text for students learning multivariate statistical methods for the first time, this book keeps mathematical details to a minimum while conveying the basic principles. One of the principal strategies used throughout the book--in addition to the presentation of actual data analyses--is poin
Bouderbala, Abdelkader; Remini, Boualem; Saaed Hamoudi, Abdelamir; Pulido-Bosch, Antonio
2016-06-01
The study focuses on the characterization of the groundwater salinity on the Nador coastal aquifer (Algeria). The groundwater quality has undergone serious deterioration due to overexploitation. Groundwater samplings were carried out in high and low waters in 2013, in order to study the evolution of groundwater hydrochemistry from the recharge to the coastal area. Different kinds of statistical analysis were made in order to identify the main hydrogeochemical processes occurring in the aquifer and to discriminate between different groups of groundwater. These statistical methods provide a better understanding of the aquifer hydrochemistry, and put in evidence a hydrochemical classification of wells, showing that the area with higher salinity is located close to the coast, in the first two kilometers, where the salinity gradually increases as one approaches the seaside and suggests the groundwater salinization by seawater intrusion.
El Alfy, Mohamed; Lashin, Aref; Abdalla, Fathy; Al-Bassam, Abdulaziz
2017-10-01
Rapid economic expansion poses serious problems for groundwater resources in arid areas, which typically have high rates of groundwater depletion. In this study, hydrochemical investigations integrating chemical and statistical analyses are conducted to assess the factors controlling hydrochemistry and potential pollution in an arid region. Fifty-four groundwater samples were collected from the Dhurma aquifer in Saudi Arabia, and twenty-one physicochemical variables were examined for each sample. Spatial patterns of salinity and nitrate were mapped using fitted variograms. The nitrate spatial distribution shows that nitrate pollution is a persistent problem affecting a wide area of the aquifer. The hydrochemical investigations and cluster analysis reveal four significant clusters of groundwater zones. Five main factors were extracted, which explain >77% of the total data variance. These factors indicated that the chemical characteristics of the groundwater were influenced by rock-water interactions and anthropogenic factors. The identified clusters and factors were validated with hydrochemical investigations. The geogenic factors include the dissolution of various minerals (calcite, aragonite, gypsum, anhydrite, halite and fluorite) and ion exchange processes. The anthropogenic factors include the impact of irrigation return flows and the application of potassium, nitrate, and phosphate fertilizers. Over time, these anthropogenic factors will most likely contribute to further declines in groundwater quality. Copyright © 2017 Elsevier Ltd. All rights reserved.
Schmidt decomposition and multivariate statistical analysis
Bogdanov, Yu. I.; Bogdanova, N. A.; Fastovets, D. V.; Luckichev, V. F.
2016-12-01
The new method of multivariate data analysis based on the complements of classical probability distribution to quantum state and Schmidt decomposition is presented. We considered Schmidt formalism application to problems of statistical correlation analysis. Correlation of photons in the beam splitter output channels, when input photons statistics is given by compound Poisson distribution is examined. The developed formalism allows us to analyze multidimensional systems and we have obtained analytical formulas for Schmidt decomposition of multivariate Gaussian states. It is shown that mathematical tools of quantum mechanics can significantly improve the classical statistical analysis. The presented formalism is the natural approach for the analysis of both classical and quantum multivariate systems and can be applied in various tasks associated with research of dependences.
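As a rough illustration of how a Schmidt decomposition can quantify statistical dependence, the sketch below turns a discrete joint probability table into an amplitude matrix and applies a singular value decomposition. Taking the elementwise square root with zero phases is an assumption about how a probability distribution might be complemented to a quantum state; the abstract does not specify the construction, and the function names are hypothetical.

```python
import numpy as np

def schmidt_weights(joint_prob):
    """Schmidt weights (squared singular values) of the amplitude matrix
    built from a bivariate discrete probability table."""
    P = np.asarray(joint_prob, dtype=float)
    P = P / P.sum()                      # normalise to a probability table
    amplitudes = np.sqrt(P)              # assumption: real, non-negative amplitudes
    s = np.linalg.svd(amplitudes, compute_uv=False)
    return s ** 2                        # sums to 1 because sum(P) == 1

def schmidt_number(joint_prob):
    """Effective number of Schmidt modes, K = 1 / sum(w_k^2)."""
    w = schmidt_weights(joint_prob)
    return 1.0 / np.sum(w ** 2)

# Independent variables: the table is an outer product, so a single mode suffices
independent = np.outer([0.3, 0.7], [0.6, 0.4])
# Perfectly correlated variables: two equally weighted modes
correlated = np.array([[0.5, 0.0], [0.0, 0.5]])

k_independent = schmidt_number(independent)
k_correlated = schmidt_number(correlated)
```

Under this construction a Schmidt number of 1 corresponds to independent variables, and larger values indicate stronger dependence.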
Guimarães Nobre, Gabriela; Arnbjerg-Nielsen, Karsten; Rosbjerg, Dan; Madsen, Henrik
2016-04-01
Traditionally, flood risk assessment studies have been carried out from a univariate frequency analysis perspective. However, statistical dependence between hydrological variables, such as extreme rainfall and extreme sea surge, is plausible, since both variables to some extent are driven by common meteorological conditions. Aiming to overcome this limitation, multivariate statistical techniques have the potential to combine different sources of flooding in the investigation. The aim of this study was to apply a range of statistical methodologies for analyzing combined extreme hydrological variables that can lead to coastal and urban flooding. The study area is the Elwood Catchment, which is a highly urbanized catchment located in the city of Port Phillip, Melbourne, Australia. The first part of the investigation dealt with the marginal extreme value distributions. Two approaches to extract extreme value series were applied (Annual Maximum and Partial Duration Series), and different probability distribution functions were fit to the observed sample. Results obtained by using the Generalized Pareto distribution demonstrate the ability of the Pareto family to model the extreme events. Advancing into multivariate extreme value analysis, first an investigation regarding the asymptotic properties of extremal dependence was carried out. As a weak positive asymptotic dependence between the bivariate extreme pairs was found, the Conditional method proposed by Heffernan and Tawn (2004) was chosen. This approach is suitable to model bivariate extreme values, which are relatively unlikely to occur together. The results show that the probability of an extreme sea surge occurring during a one-hour intensity extreme precipitation event (or vice versa) can be twice as great as what would occur when assuming independent events. Therefore, presuming independence between these two variables would result in severe underestimation of the flooding risk in the study area.
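The comparison between joint and independent exceedance probabilities can be illustrated with a simple empirical ratio. This is only a sketch: it is not the Heffernan and Tawn conditional model used in the study, and the function name, thresholds and toy data are hypothetical.

```python
import numpy as np

def dependence_ratio(x, y, u, v):
    """Empirical P(X > u, Y > v) divided by P(X > u) * P(Y > v).
    A value near 1 suggests independence at these thresholds; a value of 2
    means joint exceedances are twice as likely as independence implies.
    Assumes both thresholds are exceeded at least once."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    joint = np.mean((x > u) & (y > v))
    independent = np.mean(x > u) * np.mean(y > v)
    return joint / independent

# Toy data: a fully crossed design is exactly independent by construction
x_ind = np.repeat(np.arange(10), 10)
y_ind = np.tile(np.arange(10), 10)
ratio_independent = dependence_ratio(x_ind, y_ind, 7.5, 7.5)

# Toy data: identical series are perfectly dependent
x_dep = np.arange(100)
ratio_dependent = dependence_ratio(x_dep, x_dep, 89.5, 89.5)
```

In practice such raw ratios become unstable at high thresholds because few joint exceedances are observed, which is one motivation for fitting a parametric extreme value model instead.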
Velasco-Tapia, Fernando
2014-01-01
Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view could be reached applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features for the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in the majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks into geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures).
Hamchevici, Carmen; Udrea, Ion
2013-11-01
The concept of basin-wide Joint Danube Survey (JDS) was launched by the International Commission for the Protection of the Danube River (ICPDR) as a tool for investigative monitoring under the Water Framework Directive (WFD), with a frequency of 6 years. The first JDS was carried out in 2001 and its success in providing key information for characterisation of the Danube River Basin District as required by WFD led to the organisation of the second JDS in 2007, which was the world's biggest river research expedition in that year. The present paper presents an approach for improving the survey strategy for the next planned survey JDS3 (2013) by means of several multivariate statistical techniques. In order to design the optimum structure in terms of parameters and sampling sites, principal component analysis (PCA), factor analysis (FA) and cluster analysis were applied on JDS2 data for 13 selected physico-chemical and one biological element measured in 78 sampling sites located on the main course of the Danube. Results from PCA/FA showed that most of the dataset variance (above 75%) was explained by five varifactors loaded with 8 out of 14 variables: physical (transparency and total suspended solids), relevant nutrients (N-nitrates and P-orthophosphates), feedback effects of primary production (pH, alkalinity and dissolved oxygen) and algal biomass. Taking into account the representation of the factor scores given by FA versus sampling sites and the major groups generated by the clustering procedure, the spatial network of the next survey could be carefully tailored, leading to a reduction in sampling sites of more than 30%. The approach of target oriented sampling strategy based on the selected multivariate statistics can provide a strong reduction in dimensionality of the original data and corresponding costs as well, without any loss of information.
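The statement that a handful of factors explains above 75% of the variance rests on the eigenvalues of the correlation matrix. The following minimal sketch shows that calculation on synthetic data with the same shape as the JDS2 table (78 sites, 14 variables). The data, the three hypothetical latent drivers, and the function names are assumptions, and no factor rotation (varimax) is performed.

```python
import numpy as np

def pca_explained_variance(X):
    """Fraction of total variance explained by each principal component
    of the standardised data (i.e. PCA on the correlation matrix)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = Z.T @ Z / (len(Z) - 1)
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # sort eigenvalues, largest first
    return eigvals / eigvals.sum()

def components_needed(ratios, threshold=0.75):
    """Smallest number of leading components whose cumulative share
    of variance reaches the threshold."""
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)

# Synthetic stand-in: 78 sites, 14 variables driven by 3 hypothetical latent factors
rng = np.random.default_rng(1)
factors = rng.standard_normal((78, 3))
loadings = rng.standard_normal((3, 14))
X = factors @ loadings + 0.1 * rng.standard_normal((78, 14))

ratios = pca_explained_variance(X)
k = components_needed(ratios, threshold=0.75)
```

Standardising first matters here because water quality variables are measured in different units; without it the components would be dominated by whichever variable has the largest numerical range.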
Gao, Yongnian; Gao, Junfeng; Yin, Hongbin; Liu, Chuansheng; Xia, Ting; Wang, Jing; Huang, Qi
2015-03-15
Remote sensing has been widely used for water quality monitoring, but most of these monitoring studies have only focused on a few water quality variables, such as chlorophyll-a, turbidity, and total suspended solids, which have typically been considered optically active variables. Remote sensing presents a challenge in estimating the phosphorus concentration in water. The total phosphorus (TP) in lakes has been estimated from remotely sensed observations, primarily using the simple individual band ratio or their natural logarithm and the statistical regression method based on the field TP data and the spectral reflectance. In this study, we investigated the possibility of establishing a spatial modeling scheme to estimate the TP concentration of a large lake from multi-spectral satellite imagery using band combinations and regional multivariate statistical modeling techniques, and we tested the applicability of the spatial modeling scheme. The results showed that HJ-1A CCD multi-spectral satellite imagery can be used to estimate the TP concentration in a lake. The correlation and regression analysis showed a highly significant positive relationship between the TP concentration and certain remotely sensed combination variables. The proposed modeling scheme had a higher accuracy for the TP concentration estimation in the large lake compared with the traditional individual band ratio method and the whole-lake scale regression-modeling scheme. The TP concentration values showed a clear spatial variability and were high in western Lake Chaohu and relatively low in eastern Lake Chaohu. The northernmost portion, the northeastern coastal zone and the southeastern portion of western Lake Chaohu had the highest TP concentrations, and the other regions had the lowest TP concentration values, except for the coastal zone of eastern Lake Chaohu. These results strongly suggested that the proposed modeling scheme, i.e., the band combinations and the regional multivariate
Multivariate analysis: A statistical approach for computations
Michu, Sachin; Kaushik, Vandana
2014-10-01
Multivariate analysis is a statistical approach commonly used in automotive diagnosis, education, evaluating clusters in finance, etc., and more recently in the health-related professions. The objective of the paper is to provide a detailed exploratory discussion of factor analysis (FA) in image retrieval methods and correlation analysis (CA) of network traffic. Image retrieval methods aim to retrieve relevant images from a collected database, based on their content. The problem is made more difficult by the high dimension of the variable space in which the images are represented. Multivariate correlation analysis proposes an anomaly detection and analysis method based on the correlation coefficient matrix. Anomalous behaviors in the network include various attacks on the network, such as DDoS attacks and network scanning.
Multivariate statistical analysis of wildfires in Portugal
Costa, Ricardo; Caramelo, Liliana; Pereira, Mário
2013-04-01
Several studies demonstrate that wildfires in Portugal present high temporal and spatial variability as well as cluster behavior (Pereira et al., 2005, 2011). This study aims to contribute to the characterization of the fire regime in Portugal with the multivariate statistical analysis of the time series of number of fires and area burned in Portugal during the 1980 - 2009 period. The data used in the analysis is an extended version of the Rural Fire Portuguese Database (PRFD) (Pereira et al, 2011), provided by the National Forest Authority (Autoridade Florestal Nacional, AFN), the Portuguese Forest Service, which includes information for more than 500,000 fire records. There are many advanced techniques for examining the relationships among multiple time series at the same time (e.g., canonical correlation analysis, principal components analysis, factor analysis, path analysis, multiple analyses of variance, clustering systems). This study compares and discusses the results obtained with these different techniques. Pereira, M.G., Trigo, R.M., DaCamara, C.C., Pereira, J.M.C., Leite, S.M., 2005: "Synoptic patterns associated with large summer forest fires in Portugal". Agricultural and Forest Meteorology. 129, 11-25. Pereira, M. G., Malamud, B. D., Trigo, R. M., and Alves, P. I.: The history and characteristics of the 1980-2005 Portuguese rural fire database, Nat. Hazards Earth Syst. Sci., 11, 3343-3358, doi:10.5194/nhess-11-3343-2011, 2011. This work is supported by European Union Funds (FEDER/COMPETE - Operational Competitiveness Programme) and by national funds (FCT - Portuguese Foundation for Science and Technology) under the project FCOMP-01-0124-FEDER-022692, the project FLAIR (PTDC/AAC-AMB/104702/2008) and the EU 7th Framework Program through FUME (contract number 243888).
Random matrix theory and multivariate statistics
Diaz-Garcia, Jose A.; Jáimez, Ramon Gutiérrez
2009-01-01
Some tools and ideas are interchanged between random matrix theory and multivariate statistics. In the context of the random matrix theory, classes of spherical and generalised Wishart random matrix ensemble, containing as particular cases the classical random matrix ensembles, are proposed. Some properties of these classes of ensemble are analysed. In addition, the random matrix ensemble approach is extended and a unified theory proposed for the study of distributions for real normed divisio...
Multivariate methods and forecasting with IBM SPSS statistics
Aljandali, Abdulkader
2017-01-01
This is the second of a two-part guide to quantitative analysis using the IBM SPSS Statistics software package; this volume focuses on multivariate statistical methods and advanced forecasting techniques. More often than not, regression models involve more than one independent variable. For example, forecasting methods are commonly applied to aggregates such as inflation rates, unemployment, exchange rates, etc., that have complex relationships with determining variables. This book introduces multivariate regression models and provides examples to help understand theory underpinning the model. The book presents the fundamentals of multivariate regression and then moves on to examine several related techniques that have application in business-orientated fields such as logistic and multinomial regression. Forecasting tools such as the Box-Jenkins approach to time series modeling are introduced, as well as exponential smoothing and naïve techniques. This part also covers hot topics such as Factor Analysis, Dis...
Hong, Haoyuan; Pourghasemi, Hamid Reza; Pourtaghi, Zohre Sadat
2016-04-01
Landslides are an important natural hazard that causes a great amount of damage around the world every year, especially during the rainy season. The Lianhua area is located in the middle of China's southern mountainous area, west of Jiangxi Province, and is known to be an area prone to landslides. The aim of this study was to evaluate and compare landslide susceptibility maps produced using the random forest (RF) data mining technique with those produced by bivariate (evidential belief function and frequency ratio) and multivariate (logistic regression) statistical models for Lianhua County, China. First, a landslide inventory map was prepared using aerial photograph interpretation, satellite images, and extensive field surveys. In total, 163 landslide events were recognized in the study area, with 114 landslides (70%) used for training and 49 landslides (30%) used for validation. Next, the landslide conditioning factors (including the slope angle, altitude, slope aspect, topographic wetness index (TWI), slope-length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, distance to roads, annual precipitation, land use, normalized difference vegetation index (NDVI), and lithology) were derived from the spatial database. Finally, the landslide susceptibility maps of Lianhua County were generated in ArcGIS 10.1 based on the random forest (RF), evidential belief function (EBF), frequency ratio (FR), and logistic regression (LR) approaches and were validated using a receiver operating characteristic (ROC) curve. The ROC plot assessment results showed that for landslide susceptibility maps produced using the EBF, FR, LR, and RF models, the area under the curve (AUC) values were 0.8122, 0.8134, 0.7751, and 0.7172, respectively. Therefore, we can conclude that all four models have an AUC of more than 0.70 and can be used in landslide susceptibility mapping in the study area; meanwhile, the EBF and FR models had the best performance for Lianhua
Multivariate postprocessing techniques for probabilistic hydrological forecasting
Hemri, Stephan; Lisniak, Dmytro; Klein, Bastian
2016-04-01
Hydrologic ensemble forecasts driven by atmospheric ensemble prediction systems need statistical postprocessing in order to account for systematic errors in terms of both mean and spread. Runoff is an inherently multivariate process with typical events lasting from hours in case of floods to weeks or even months in case of droughts. This calls for multivariate postprocessing techniques that yield well calibrated forecasts in univariate terms and ensure a realistic temporal dependence structure at the same time. To this end, the univariate ensemble model output statistics (EMOS; Gneiting et al., 2005) postprocessing method is combined with two different copula approaches that ensure multivariate calibration throughout the entire forecast horizon. These approaches comprise ensemble copula coupling (ECC; Schefzik et al., 2013), which preserves the dependence structure of the raw ensemble, and a Gaussian copula approach (GCA; Pinson and Girard, 2012), which estimates the temporal correlations from training observations. Both methods are tested in a case study covering three subcatchments of the river Rhine that represent different sizes and hydrological regimes: the Upper Rhine up to the gauge Maxau, the river Moselle up to the gauge Trier, and the river Lahn up to the gauge Kalkofen. The results indicate that both ECC and GCA are suitable for modelling the temporal dependences of probabilistic hydrologic forecasts (Hemri et al., 2015). References Gneiting, T., A. E. Raftery, A. H. Westveld, and T. Goldman (2005), Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation, Monthly Weather Review, 133(5), 1098-1118, DOI: 10.1175/MWR2904.1. Hemri, S., D. Lisniak, and B. Klein, Multivariate postprocessing techniques for probabilistic hydrological forecasting, Water Resources Research, 51(9), 7436-7451, DOI: 10.1002/2014WR016473. Pinson, P., and R. Girard (2012), Evaluating the quality of scenarios of short-term wind power
Zghibi, Adel; Merzougui, Amira; Zouhri, Lahcen; Tarhouni, Jamila
2014-01-01
the dissolution of gypsum, dolomite and halite, as well as contamination by nitrate caused mainly by extensive irrigation activity. The application of multivariate statistical techniques based on Principal Component Analysis and Hierarchical Cluster Analysis has led to the corroboration of the hypotheses developed from the previous hydrochemical study. Two factors were found that explained major hydrochemical processes in the aquifer. These factors reveal the existence of an intensive intrusion of seawater and mechanisms of nitrate contamination of groundwater.
Optimization techniques in statistics
Rustagi, Jagdish S
1994-01-01
Statistics help guide us to optimal decisions under uncertainty. A large variety of statistical problems are essentially solutions to optimization problems. The mathematical techniques of optimization are fundamental to statistical theory and practice. In this book, Jagdish Rustagi provides full-spectrum coverage of these methods, ranging from classical optimization and Lagrange multipliers, to numerical techniques using gradients or direct search, to linear, nonlinear, and dynamic programming using the Kuhn-Tucker conditions or the Pontryagin maximal principle. Variational methods and optimiza
Multivariate statistical analysis of precipitation chemistry in Northwestern Spain
Prada-Sanchez, J.M.; Garcia-Jurado, I.; Gonzalez-Manteiga, W.; Fiestras-Janeiro, M.G.; Espada-Rios, M.I.; Lucas-Dominguez, T. (University of Santiago, Santiago (Spain). Faculty of Mathematics, Dept. of Statistics and Operations Research)
1993-07-01
A total of 149 rainwater samples were collected at three rainwater monitoring stations in the proximity of a power station in northwestern Spain. The resulting data are analyzed using multivariate statistical techniques. First, Principal Component Analysis shows that there are three main sources of pollution in the area (a marine source, a rural source and an acid source). The impact of pollution from these sources on the immediate environment of the stations is then studied using Factorial Discriminant Analysis. 8 refs., 7 figs., 11 tabs.
Kumar, Manoj; Ramanathan, A L; Tripathi, Ritu; Farswan, Sandhya; Kumar, Devendra; Bhattacharya, Prosun
2017-01-01
This study investigates the spatio-chemical characteristics, the contamination sources (using multivariate statistics), and the health risk arising from the consumption of groundwater contaminated with trace and toxic elements in the Chhaprola Industrial Area, Gautam Buddha Nagar, Uttar Pradesh, India. In this study, 33 tubewell water samples were analyzed for 28 elements using ICP-OES. Concentrations of some trace and toxic elements, such as Al, As, B, Cd, Cr, Mn, Pb and U, exceeded their corresponding WHO (2011) guidelines and BIS (2012) standards, while the other analyzed elements remained below those values. Background γ and β radiation levels were measured and found to be within their acceptable limits. Multivariate statistics, namely PCA (six factors explaining a cumulative 82.07% of the variance) and CA, indicated a mixed origin: natural and anthropogenic activities such as industrial effluent and agricultural runoff are responsible for the degradation of groundwater quality in the research area. In this study area, an adult consumes 3.0 L (median value) of water per day and therefore ingests 39, 1.94, 1461, 0.14, 11.1, 292.6, 13.6 and 23.5 μg of Al, As, B, Cd, Cr, Mn, Pb and U, respectively, from drinking water per day. The hazard quotient (HQ) exceeded the safe limit of 1 for As, B, Al, Cr, Mn, Cd, Pb and U at a few locations, while a hazard index (HI) > 5 was observed in about 30% of the samples, which indicates a potential health risk from these tubewells for the local population if the groundwater is consumed.
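The hazard quotient and hazard index reported above follow a simple ratio calculation. The sketch below is illustrative only: the 70 kg body weight and the reference dose passed in are assumptions chosen for the example, not values taken from the study, and the function names are hypothetical. Only the median arsenic intake of 1.94 μg per day comes from the abstract.

```python
def hazard_quotient(intake_ug_per_day, body_weight_kg, rfd_mg_per_kg_day):
    """HQ = chronic daily dose / reference dose (both in mg per kg per day)."""
    # Convert micrograms per day to milligrams per kg body weight per day.
    dose_mg_per_kg_day = intake_ug_per_day / 1000.0 / body_weight_kg
    return dose_mg_per_kg_day / rfd_mg_per_kg_day


def hazard_index(quotients):
    """HI is the sum of the hazard quotients of the individual elements."""
    return sum(quotients)


# Median As intake from the abstract; body weight and reference dose are
# assumed placeholder values for illustration.
hq_as = hazard_quotient(1.94, body_weight_kg=70.0, rfd_mg_per_kg_day=3e-4)
print(f"HQ for As at the median intake: {hq_as:.3f}")
```

Under these assumed inputs the median intake stays well below an HQ of 1, which is compatible with the abstract's statement that exceedances occurred only at a few locations.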
Multivariate Statistical Process Control Process Monitoring Methods and Applications
Ge, Zhiqiang
2013-01-01
Given their key position in the process control industry, process monitoring techniques have been extensively investigated by industrial practitioners and academic control researchers. Multivariate statistical process control (MSPC) is one of the most popular data-based methods for process monitoring and is widely used in various industrial areas. Effective routines for process monitoring can help operators run industrial processes efficiently at the same time as maintaining high product quality. Multivariate Statistical Process Control reviews the developments and improvements that have been made to MSPC over the last decade, and goes on to propose a series of new MSPC-based approaches for complex process monitoring. These new methods are demonstrated in several case studies from the chemical, biological, and semiconductor industrial areas. Control and process engineers, and academic researchers in the process monitoring, process control and fault detection and isolation (FDI) disciplines will be inter...
Affum, Andrews Obeng; Osae, Shiloh Dede; Nyarko, Benjamin Jabez Botwe; Afful, Samuel; Fianko, Joseph Richmond; Akiti, Tetteh Thomas; Adomako, Dickson; Acquaah, Samuel Osafo; Dorleku, Micheal; Antoh, Emmanuel; Barnes, Felix; Affum, Enoch Acheampong
2015-02-01
In recent times, surface water resources in the Western Region of Ghana have been found to be inadequate in supply and polluted by various anthropogenic activities. As a result of these problems, the demand for groundwater by the human populations in the peri-urban communities for domestic, municipal and irrigation purposes has increased without prior knowledge of its water quality. Water samples were collected from 14 public hand-dug wells during the rainy season in 2013 and investigated for total coliforms, Escherichia coli, mercury (Hg), arsenic (As), cadmium (Cd) and physicochemical parameters. Multivariate statistical analysis of the dataset and a linear stoichiometric plot of major ions were applied to group the water samples and to identify the main factors and sources of contamination. Hierarchical cluster analysis revealed four clusters from the hydrochemical variables (R-mode) and three clusters in the case of water samples (Q-mode) after z-score standardization. Principal component analysis after a varimax rotation of the dataset indicated that the four factors extracted explained 93.3% of the total variance, which highlighted salinity, toxic elements and hardness pollution as the dominant factors affecting groundwater quality. Cation exchange, mineral dissolution and silicate weathering influenced groundwater quality. The ranking order of major ions was Na⁺ > Ca²⁺ > K⁺ > Mg²⁺ and Cl⁻ > SO₄²⁻ > HCO₃⁻. Based on the Piper plot and the hydrogeology of the study area, sodium chloride (86%), sodium hydrogen carbonate and sodium carbonate (14%) water types were identified. Although E. coli were absent in the water samples, 36% of the wells contained total coliforms (Enterobacter species) which exceeded the WHO guidelines limit of zero colony-forming unit (CFU)/100 mL of drinking water. With the exception of Hg, the concentration of As and Cd in 79 and 43% of the water samples exceeded the WHO guideline limits of 10 and 3
Evaluation of Meteorite Amino Acid Analysis Data Using Multivariate Techniques
McDonald, G.; Storrie-Lombardi, M.; Nealson, K.
1999-01-01
The amino acid distributions in the Murchison carbonaceous chondrite, Mars meteorite ALH84001, and ice from the Allan Hills region of Antarctica are shown, using a multivariate technique known as Principal Component Analysis (PCA), to be statistically distinct from the average amino acid composition of 101 terrestrial protein superfamilies.
Tau identification using multivariate techniques in ATLAS
O'Neil, D C; The ATLAS collaboration
2011-01-01
Tau leptons play an important role in the physics program of the LHC. They are being used in electroweak measurements, in detector related studies and in searches for new phenomena like the Higgs boson or Supersymmetry. Tau objects appear as collimated jets with low track multiplicity. Due to the background from QCD multijet processes, efficient tau identification techniques with large fake rejection are essential. Since single variable criteria are not enough to efficiently separate them from jets and electrons, modern multivariate techniques are used. In ATLAS, several advanced algorithms are applied to identify taus, including a projective likelihood estimator and boosted decision trees. All multivariate methods applied to the ATLAS simulated data perform better than the baseline cut analysis. Their performance is shown using high energy data collected at the ATLAS experiment. The strengths and weaknesses of each technique are also discussed.
Kruger, Uwe
2012-01-01
The development and application of multivariate statistical techniques in process monitoring has gained substantial interest over the past two decades in academia and industry alike. Initially developed for monitoring and fault diagnosis in complex systems, such techniques have been refined and applied in various engineering areas, for example mechanical and manufacturing, chemical, electrical and electronic, and power engineering. The recipe for the tremendous interest in multivariate statistical techniques lies in their simplicity and adaptability for developing monitoring applica
Tau identification using multivariate techniques in ATLAS
O'Neil, D; The ATLAS collaboration
2011-01-01
Tau leptons will play an important role in the physics program at the LHC. They will be used in electroweak measurements and in detector related studies like the determination of the missing transverse energy scale, but also in searches for new phenomena like the Higgs boson or Supersymmetry. Due to the huge background from QCD processes, efficient tau identification techniques with large fake rejection are essential. Tau objects appear as collimated jets with low track multiplicity, and single variable criteria are not enough to efficiently separate them from jets and electrons. This can be achieved using modern multivariate techniques which make optimal use of all the information available. They are particularly useful when the discriminating variables are not independent and no single variable provides good signal and background separation. In ATLAS several advanced algorithms are applied to identify taus, in particular a projective likelihood estimator and boosted decision trees. All multivariate methods ap...
Multivariate Statistical Analysis Applied in Wine Quality Evaluation
Jieling Zou
2015-08-01
This study applies multivariate statistical approaches to wine quality evaluation. With 27 red wine samples, four factors were identified out of 12 parameters by principal component analysis, explaining 89.06% of the total variance of data. As iterative weights calculated by the BP neural network revealed little difference from weights determined by the information entropy method, the latter was chosen to measure the importance of indicators. Weighted cluster analysis performs well in classifying the sample group further into two sub-clusters. The second cluster of red wine samples, compared with the first, was lighter in color, tasted thinner and had a fainter bouquet. The weighted TOPSIS method was used to evaluate the quality of wine in each sub-cluster. With the scores obtained, each sub-cluster was divided into three grades. On the whole, the quality of the lighter red wine was slightly better than that of the darker category. This study shows the necessity and usefulness of multivariate statistical techniques in both wine quality evaluation and parameter selection.
Tau identification using multivariate techniques in ATLAS
O'Neil, D. C.; ATLAS Collaboration
2012-06-01
Tau leptons play an important role in the physics program of the LHC. They are being used in electroweak measurements, in detector related studies and in searches for new phenomena like the Higgs boson or Supersymmetry. In the detector, tau leptons are reconstructed as collimated jets with low track multiplicity. Due to the background from QCD multijet processes, efficient tau identification techniques with large fake rejection are essential. Since single variable criteria are not enough to efficiently separate them from jets and electrons, modern multivariate techniques are used. In ATLAS, several advanced algorithms are applied to identify taus, including a projective likelihood estimator and boosted decision trees. All multivariate methods applied to the ATLAS simulated data perform better than the baseline cut analysis. Their performance is shown using high energy data collected at the ATLAS experiment. The improvement ranges from a factor of 2 to 5 in rejection for the same efficiency, depending on the selected efficiency operating point and the number of prongs in the tau decay. The strengths and weaknesses of each technique are also discussed.
COSIMA data analysis using multivariate techniques
J. Silén
2014-08-01
We describe how to use multivariate analysis of complex TOF-SIMS spectra, introducing the method of random projections. The technique allows us to do full clustering and classification of the measured mass spectra. In this paper we use the tool for classification purposes. The presentation describes calibration experiments of 19 minerals on Ag and Au substrates using positive mode ion spectra. The discrimination between individual minerals gives a cross-validation Cohen's κ for classification of typically about 80%. We intend to use the method as a fast tool to deduce a qualitative similarity of measurements.
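Cohen's κ, the figure of merit quoted above, corrects the raw classification accuracy for the agreement expected by chance. A minimal sketch of the calculation from a confusion matrix follows; the function name and the toy two-class counts are illustrative and are not taken from the paper.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix of counts.

    Rows are true classes, columns are predicted classes.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: product of the row and column marginals per class.
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1.0 - expected)


toy = [[9, 1],
       [2, 8]]
kappa = cohen_kappa(toy)
print(f"accuracy = 0.85, kappa = {kappa:.2f}")
```

In this toy case the accuracy is 85% but κ is lower, because half of the agreement would already be expected by chance with two balanced classes.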
Processes and subdivisions in diogenites, a multivariate statistical analysis
Harriott, T. A.; Hewins, R. H.
1984-01-01
Multivariate statistical techniques used on diogenite orthopyroxene analyses show the relationships that occur within diogenites and the two orthopyroxenite components (class I and II) in the polymict diogenite Garland. Cluster analysis shows that only Peckelsheim is similar to Garland class I (Fe-rich) and the other diogenites resemble Garland class II. The unique diogenite Y 75032 may be related to type I by fractionation. Factor analysis confirms the subdivision and shows that Fe does not correlate with the weakly incompatible elements across the entire pyroxene composition range, indicating that igneous fractionation is not the process controlling total diogenite composition variation. The occurrence of two groups of diogenites is interpreted as the result of sampling or mixing of two main sequences of orthopyroxene cumulates with slightly different compositions.
Forensic discrimination of dyed hair color: II. Multivariate statistical analysis.
Barrett, Julie A; Siegel, Jay A; Goodpaster, John V
2011-01-01
This research is intended to assess the ability of UV-visible microspectrophotometry to successfully discriminate the color of dyed hair. Fifty-five red hair dyes were analyzed and evaluated using multivariate statistical techniques including agglomerative hierarchical clustering (AHC), principal component analysis (PCA), and discriminant analysis (DA). The spectra were grouped into three classes, which were visually consistent with different shades of red. A two-dimensional PCA observations plot was constructed, describing 78.6% of the overall variance. The wavelength regions associated with the absorbance of hair and dye were highly correlated. Principal components were selected to represent 95% of the overall variance for analysis with DA. A classification accuracy of 89% was observed for the comprehensive dye set, while external validation using 20 of the dyes resulted in a prediction accuracy of 75%. Significant color loss from successive washing of hair samples was estimated to occur within 3 weeks of dye application.
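Selecting principal components "to represent 95% of the overall variance" amounts to accumulating the eigenvalues of the covariance matrix until the target fraction is reached. The sketch below assumes the eigenvalues are already available; the function name and the toy eigenvalues are hypothetical and do not reproduce the study's spectra.

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading components whose variance share >= target."""
    total = sum(eigenvalues)
    cumulative = 0.0
    # Components are ranked by decreasing eigenvalue (explained variance).
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= target:
            return count
    return len(eigenvalues)


toy_eigenvalues = [6.0, 2.0, 1.0, 0.6, 0.3, 0.1]
n_components = components_for_variance(toy_eigenvalues, target=0.95)
print(n_components)
```

With these toy values the first two components carry 80% of the variance, which is enough for a two-dimensional plot, while four are needed before passing the scores to a discriminant analysis at the 95% level.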
Plant, Emma L; Smernik, Ronald J; van Leeuwen, John; Greenwood, Paul; Macdonald, Lynne M
2014-03-01
The paper-making process can produce large amounts of wastewater (WW) with high particulate and dissolved organic loads. Generally, in developed countries, stringent international regulations for environmental protection require pulp and paper mill WW to be treated to reduce the organic load prior to discharge into the receiving environment. This can be achieved by primary and secondary treatments involving both chemical and biological processes. These processes result in complex changes in the nature of the organic material, as some components are mineralised and others are transformed. In this study, changes in the nature of organics through different stages of secondary treatment of pulp and paper mill WW were followed using three advanced characterisation techniques: solid-state ¹³C nuclear magnetic resonance (NMR) spectroscopy, pyrolysis-gas chromatography mass spectrometry (py-GCMS) and high-performance size-exclusion chromatography (HPSEC). Each technique provided a different perspective on the changes that occurred. To compare the different chemical perspectives in terms of the degree of similarity/difference between samples, we employed non-metric multidimensional scaling. Results indicate that NMR and HPSEC provided strongly correlated perspectives, with 86% of the discrimination between the organic samples common to both techniques. Conversely, py-GCMS was found to provide a unique, and thus complementary, perspective.
Multivariate Relationships between Statistics Anxiety and Motivational Beliefs
Baloglu, Mustafa; Abbassi, Amir; Kesici, Sahin
2017-01-01
In general, anxiety has been found to be associated with motivational beliefs and the current study investigated multivariate relationships between statistics anxiety and motivational beliefs among 305 college students (60.0% women). The Statistical Anxiety Rating Scale, the Motivated Strategies for Learning Questionnaire, and a set of demographic…
Adjustment of geochemical background by robust multivariate statistics
Zhou, D.
1985-01-01
Conventional analyses of exploration geochemical data assume that the background is a constant or slowly changing value, equivalent to a plane or a smoothly curved surface. However, it is better to regard the geochemical background as a rugged surface, varying with changes in geology and environment. This rugged surface can be estimated from observed geological, geochemical and environmental properties by using multivariate statistics. A method of background adjustment was developed and applied to groundwater and stream sediment reconnaissance data collected from the Hot Springs Quadrangle, South Dakota, as part of the National Uranium Resource Evaluation (NURE) program. Source-rock lithology appears to be a dominant factor controlling the chemical composition of groundwater or stream sediments. The most efficacious adjustment procedure is to regress uranium concentration on selected geochemical and environmental variables for each lithologic unit, and then to delineate anomalies by a common threshold set as a multiple of the standard deviation of the combined residuals. Robust versions of regression and RQ-mode principal components analysis techniques were used rather than ordinary techniques to guard against distortion caused by outliers. Anomalies delineated by this background adjustment procedure correspond with uranium prospects much better than do anomalies delineated by conventional procedures. The procedure should be applicable to geochemical exploration at different scales for other metals.
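The adjustment procedure described above (a separate regression per lithologic unit, then one common threshold on the pooled residuals) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses ordinary least squares with a single covariate, whereas the paper uses robust regression on several variables, and the function names, the multiplier k and the toy data are all hypothetical.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one covariate."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def flag_anomalies(samples, k=2.0):
    """samples: list of (lithologic_unit, covariate, uranium) tuples."""
    by_unit = {}
    for idx, (unit, x, u) in enumerate(samples):
        by_unit.setdefault(unit, []).append((idx, x, u))
    residuals = [0.0] * len(samples)
    # Background is estimated separately within each lithologic unit.
    for rows in by_unit.values():
        slope, intercept = fit_line([r[1] for r in rows], [r[2] for r in rows])
        for idx, x, u in rows:
            residuals[idx] = u - (intercept + slope * x)
    # One common threshold from the residuals of all units combined.
    threshold = k * pstdev(residuals)
    return [r > threshold for r in residuals]


toy = [("A", x, 2.0 * x) for x in range(1, 6)]
toy += [("B", 1, 1.0), ("B", 2, 1.0), ("B", 3, 10.0), ("B", 4, 1.0), ("B", 5, 1.0)]
flags = flag_anomalies(toy, k=2.0)
print(flags)
```

Note that the high uranium values in unit A are not flagged, because they are explained by that unit's own background trend; only the sample in unit B that departs from its local background is marked as anomalous.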
Arslan, Hakan; Ayyildiz Turan, Nazlı
2015-08-01
Monitoring of heavy metal concentrations in groundwater potentially used for drinking and irrigation is very important. This study collected groundwater samples from 78 wells in July 2012 and analyzed them for 17 heavy metals (Pb, Zn, Cr, Mn, Fe, Cu, Cd, Co, Ni, Al, As, Mo, Se, B, Ti, V, Ba). Spatial distributions of these elements were identified using three different interpolation methods [inverse distance weighting (IDW), radial basis function (RBF), and ordinary kriging (OK)]. Root mean squared error (RMSE) and mean absolute error (MAE) from cross validation were used to select the best interpolation method for each parameter. Multivariate statistical analyses [cluster analysis (CA) and factor analysis (FA)] were used to identify similarities among sampling sites and the contribution of variables to groundwater pollution. Fe and Mn levels exceeded World Health Organization (WHO) recommended limits for drinking water in almost all of the study area, and some locations had Fe and Mn levels that exceeded Food and Agriculture Organization (FAO) guidelines for drip irrigation systems. Al, As, and Cd levels also exceeded WHO guidelines for drinking water. Cluster analysis classified groundwater in the study area into three groups, and factor analysis identified five factors that explained 73.39% of the total variation in groundwater, as follows: factor 1: Se, Ti, Cr, Mo; factor 2: Ni, Mn, Co, Ba; factor 3: Pb, Cd; factor 4: B, V, Fe, Cu; and factor 5: As, Zn. The study concludes that interpolation methods and multivariate statistical techniques gave very useful results for the determination of the source.
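Choosing an interpolation method by cross-validated RMSE and MAE, as described above, can be illustrated for the simplest of the three methods. The sketch below implements inverse distance weighting with leave-one-out cross validation; the power parameter, the function names and the toy well coordinates are assumptions made for the example, not details from the study.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (px, py, value) points."""
    num = den = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return value  # exact hit on a sampled location
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def loo_errors(points, power=2.0):
    """Leave-one-out cross validation: returns (RMSE, MAE)."""
    errs = []
    for i, (px, py, value) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        errs.append(idw(rest, px, py, power) - value)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    mae = sum(abs(e) for e in errs) / len(errs)
    return rmse, mae


wells = [(0.0, 0.0, 1.2), (1.0, 0.0, 1.9), (0.0, 1.0, 0.8), (1.0, 1.0, 2.4), (0.5, 0.5, 1.5)]
for p in (1.0, 2.0, 3.0):
    rmse, mae = loo_errors(wells, power=p)
    print(f"power={p}: RMSE={rmse:.3f}, MAE={mae:.3f}")
```

The same loop over candidate settings or methods, keeping the one with the smallest errors, is the selection logic the abstract describes per parameter.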
Statistical Inference for a Class of Multivariate Negative Binomial Distributions
Rubak, Ege H.; Møller, Jesper; McCullagh, Peter
This paper considers statistical inference procedures for a class of models for positively correlated count variables called α-permanental random fields, which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been studied in the literature, while this is the first statistical paper on α-permanental random fields. The focus is on maximum likelihood estimation, maximum quasi-likelihood estimation and on maximum composite likelihood estimation based on uni- and bivariate distributions. Furthermore, new results...
An Implementation Technique for Multivariate Robust Design
MA Yi-zhong; ZHAO Feng-yu
2005-01-01
This paper systematically investigates the problem of multivariate robust parameter design. First, a measurement criterion for the total variation of multivariate quality characteristics is introduced using results from information theory. Then the implementation procedure for the robust design is presented. After that, a simulation example from a practical industrial process is provided. Finally, some comments and further work are discussed.
Multicomponent seismic noise attenuation with multivariate order statistic filters
Wang, Chao; Wang, Yun; Wang, Xiaokai; Xun, Chao
2016-10-01
The vector relationship between multicomponent seismic data is highly important for multicomponent processing and interpretation, but this vector relationship could be damaged when each component is processed individually. To overcome the drawback of standard component-by-component filtering, multivariate order statistic filters are introduced and extended to attenuate the noise of multicomponent seismic data by treating such a dataset as a vector wavefield rather than a set of scalar fields. According to the characteristics of seismic signals, we implement this type of multivariate filtering along local events. First, the optimal local events are recognized according to the similarity between the vector signals which are windowed from neighbouring seismic traces with a sliding time window along each trial trajectory. An efficient strategy is used to reduce the computational cost of similarity measurement for vector signals. Next, one vector sample from each of the neighbouring traces is extracted along the optimal local event as the input data for a multivariate filter. Different multivariate filters are optimal for different noise. The multichannel modified trimmed mean (MTM) filter, as one of the multivariate order statistic filters, is applied to synthetic and field multicomponent seismic data to test its performance for attenuating white Gaussian noise. The results indicate that the multichannel MTM filter can attenuate noise while preserving the relative amplitude information of multicomponent seismic data more effectively than a single-channel filter.
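To make the idea of a multivariate order statistic filter concrete, the sketch below shows one common formulation of a multichannel modified trimmed mean: take the vector median of the samples in the window, then average only the vectors lying within a distance q of it. This is an assumption about the general technique, not the authors' exact algorithm; the function names, the threshold q and the toy two-component window are hypothetical.

```python
import math


def vector_median(vectors):
    """Sample minimising the summed Euclidean distance to all other samples."""
    return min(vectors, key=lambda v: sum(math.dist(v, w) for w in vectors))


def multichannel_mtm(vectors, q):
    """Mean of the vectors lying within distance q of the vector median."""
    center = vector_median(vectors)
    kept = [v for v in vectors if math.dist(v, center) <= q]
    dims = len(center)
    # Components are averaged jointly, so the same samples are kept or
    # rejected on every channel and the vector relationship is preserved.
    return tuple(sum(v[d] for v in kept) / len(kept) for d in range(dims))


# Five two-component samples picked along a local event; the last is an outlier.
window = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (1.0, 1.2), (9.0, -7.0)]
filtered = multichannel_mtm(window, q=0.5)
print(filtered)
```

Because whole vectors are accepted or rejected, the outlier is discarded on both components at once, which is the property that a component-by-component filter does not guarantee.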
Tools & techniques--statistics: propensity score techniques.
da Costa, Bruno R; Gahl, Brigitta; Jüni, Peter
2014-10-01
Propensity score (PS) techniques are useful if the number of potential confounding pretreatment variables is large and the number of analysed outcome events is rather small so that conventional multivariable adjustment is hardly feasible. Only pretreatment characteristics should be chosen to derive PS, and only when they are probably associated with outcome. A careful visual inspection of PS will help to identify areas of no or minimal overlap, which suggests residual confounding, and trimming of the data according to the distribution of PS will help to minimise residual confounding. Standardised differences in pretreatment characteristics provide a useful check of the success of the PS technique employed. As with conventional multivariable adjustment, PS techniques cannot account for confounding variables that are not or are only imperfectly measured, and no PS technique is a substitute for an adequately designed randomised trial.
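The standardised difference mentioned above as a balance check is the difference in group means divided by a pooled standard deviation. A minimal sketch for a continuous pretreatment characteristic follows; the function name and the toy ages are illustrative, and the often-quoted rule of thumb that absolute values below about 0.1 indicate adequate balance is a convention from the wider literature rather than a statement from this abstract.

```python
import math
from statistics import mean, variance


def standardized_difference(treated, control):
    """Difference in means in units of the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2.0)
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical ages in the treated and control groups.
age_treated = [60, 62, 64, 66, 68]
age_control = [58, 60, 62, 64, 66]
d = standardized_difference(age_treated, age_control)
print(f"standardised difference in age: {d:.3f}")
```

Computing this quantity for each pretreatment characteristic before and after matching or weighting on the PS shows whether the technique actually improved balance. Unlike a p-value, it does not depend on sample size.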
Statistical analysis of multivariate atmospheric variables. [cloud cover]
Tubbs, J. D.
1979-01-01
Topics covered include: (1) estimation in discrete multivariate distributions; (2) a procedure to predict cloud cover frequencies in the bivariate case; (3) a program to compute conditional bivariate normal parameters; (4) the transformation of nonnormal multivariate to near-normal; (5) test of fit for the extreme value distribution based upon the generalized minimum chi-square; (6) test of fit for continuous distributions based upon the generalized minimum chi-square; (7) effect of correlated observations on confidence sets based upon chi-square statistics; and (8) generation of random variates from specified distributions.
Wolf, S. F.; Lipschutz, M. E.
1993-01-01
Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, between 1855 and 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that, by a totally different criterion, labile trace element contents (and hence thermal histories) of 13 Cluster 1 meteorites are distinguishable from those of 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.
Multivariate statistics and the enactment of metabolic complexity.
Levin, Nadine
2014-08-01
This ethnographic study, based on fieldwork at the Computational and Systems Medicine laboratory at Imperial College London, shows how researchers in the field of metabolomics--the post-genomic study of the molecules and processes that make up metabolism--enact and coproduce complex views of biology with multivariate statistics. From this data-driven science, metabolism emerges as a multiple, informational and statistical object, which is both produced by and also necessitates particular forms of data production and analysis. Multivariate statistics emerge as 'natural' and 'correct' ways of engaging with a metabolism that is made up of many variables. In this sense, multivariate statistics allow researchers to engage with and conceptualize metabolism, and also disease and processes of life, as complex entities. Consequently, this article builds on studies of scientific practice and visualization to examine data as material objects rather than black-boxed representations. Data practices are not merely the technological components of experimentation, but are simultaneously technologies and methods and are intertwined with ways of seeing and enacting the biological world. Ultimately, this article questions the increasing invocation and role of complexity within biology, suggesting that discourses of complexity are often imbued with reductionist and determinist ways of thinking about biology, as scientists engage with complexity in calculated and controlled, but also limited, ways.
Multivariate statistical analysis a high-dimensional approach
Serdobolskii, V
2000-01-01
In the last few decades the accumulation of large amounts of information in numerous applications has stimulated an increased interest in multivariate analysis. Computer technologies allow one to use multi-dimensional and multi-parametric models successfully. At the same time, an interest arose in statistical analysis with a deficiency of sample data. Nevertheless, it is difficult to describe the recent state of affairs in applied multivariate methods as satisfactory. Unimprovable (dominating) statistical procedures are still unknown except for a few specific cases. The simplest problem of estimating the mean vector with minimum quadratic risk is unsolved, even for normal distributions. Commonly used standard linear multivariate procedures based on the inversion of sample covariance matrices can lead to unstable results or provide no solution, depending on the data. Programs included in standard statistical packages cannot process 'multi-collinear data' and there are no theoretical recommen ...
Statistical inference for a class of multivariate negative binomial distributions
DEFF Research Database (Denmark)
Rubak, Ege Holger; Møller, Jesper; McCullagh, Peter
This paper considers statistical inference procedures for a class of models for positively correlated count variables called α-permanental random fields, which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been studied in the literature, while this is the first statistical paper on α-permanental random fields. The focus is on maximum likelihood estimation, maximum quasi-likelihood estimation and on maximum composite likelihood estimation based on uni- and bivariate distributions. Furthermore, new results for α...
Classification of Malaysia aromatic rice using multivariate statistical analysis
Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A.; Omar, O.
2015-05-01
Aromatic rice (Oryza sativa L.) is considered the best-quality premium rice. Consumers prefer these varieties for criteria such as shape, colour, and distinctive aroma and flavour. Aromatic rice costs more than ordinary rice because it needs special growing conditions, for instance a specific climate and soil. At present, aromatic rice quality is identified using its key elements and isotopic variables. The rice can also be classified by Gas Chromatography Mass Spectrometry (GC-MS) or by human sensory panels. However, human sensory panels have significant drawbacks: training takes a long time, and panellists become fatigued and inconsistent as the number of samples increases. GC-MS analysis, on the other hand, requires detailed procedures and lengthy analysis and is quite costly. This paper presents the application of an in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties based on the odour of the samples. The samples were taken from a range of rice varieties. The instrument uses multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN), to classify unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to recognise and classify the unspecified samples. Visual inspection of the PCA and LDA plots shows that the instrument was able to separate the samples into distinct clusters. The low misclassification errors of LDA and KNN support these findings, and we conclude that the e-nose can be successfully applied to the classification of aromatic rice varieties.
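Leave-one-out validation of a KNN classifier, as described in this abstract, can be sketched in a few lines. This is an illustration only: the two-feature "sensor" readings, the variety labels and the function names are hypothetical, and the Euclidean distance and k = 3 are assumptions rather than the settings used in the paper.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    neighbours = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def loo_error(X, y, k=3):
    """Leave-one-out misclassification rate: each sample is held out once."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) != y[i]:
            errors += 1
    return errors / len(X)


# Hypothetical two-sensor responses for two rice varieties "A" and "B".
X = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2),
     (3.0, 3.1), (3.2, 2.9), (2.9, 3.0), (3.1, 3.2)]
y = ["A"] * 4 + ["B"] * 4

error_rate = loo_error(X, y, k=3)
print(f"LOO misclassification rate: {error_rate:.2f}")
```

Because every sample is predicted by a model that never saw it, the resulting error rate is a less optimistic estimate than the error on the training data itself.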
Publishing nutrition research: a review of multivariate techniques--part 2: analysis of variance.
Harris, Jeffrey E; Sheean, Patricia M; Gleason, Philip M; Bruemmer, Barbara; Boushey, Carol
2012-01-01
This article is the eighth in a series exploring the importance of research design, statistical analysis, and epidemiology in nutrition and dietetics research, and the second in a series focused on multivariate statistical analytical techniques. The purpose of this review is to examine the statistical technique, analysis of variance (ANOVA), from its simplest to multivariate applications. Many dietetics practitioners are familiar with basic ANOVA, but less informed of the multivariate applications such as multiway ANOVA, repeated-measures ANOVA, analysis of covariance, multiple ANOVA, and multiple analysis of covariance. The article addresses all these applications and includes hypothetical and real examples from the field of dietetics.
Statistical Techniques for Project Control
Badiru, Adedeji B
2012-01-01
A project can be simple or complex. In each case, proven project management processes must be followed. In all cases of project management implementation, control must be exercised in order to assure that project objectives are achieved. Statistical Techniques for Project Control seamlessly integrates qualitative and quantitative tools and techniques for project control. It fills the void that exists in the application of statistical techniques to project control. The book begins by defining the fundamentals of project management then explores how to temper quantitative analysis with qualitati
Multivariate Statistical Modelling of Drought and Heat Wave Events
Manning, Colin; Widmann, Martin; Vrac, Mathieu; Maraun, Douglas; Bevaqua, Emanuele
2016-04-01
C. Manning (1,2), M. Widmann (1), M. Vrac (2), D. Maraun (3), E. Bevaqua (2,3). (1) School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham, UK; (2) Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL), Centre d'Etudes de Saclay, Gif-sur-Yvette, France; (3) Wegener Center for Climate and Global Change, University of Graz, Brandhofgasse 5, 8010 Graz, Austria.
Compound extreme events are a combination of two or more contributing events which in themselves may not be extreme but through their joint occurrence produce an extreme impact. Compound events are noted in the latest IPCC report as an important type of extreme event that has been given little attention so far. As part of the CE:LLO project (Compound Events: muLtivariate statisticaL mOdelling) we are developing a multivariate statistical model to gain an understanding of the dependence structure of certain compound events. One focus of this project is on the interaction between drought and heat wave events. Soil moisture has both a local and a non-local effect on the occurrence of heat waves, where it strongly controls the latent heat flux affecting the transfer of sensible heat to the atmosphere. These processes can create a feedback whereby a heat wave may be amplified or suppressed by the soil moisture preconditioning and, vice versa, the heat wave may in turn have an effect on soil conditions. An aim of this project is to capture this dependence in order to correctly describe the joint probabilities of these conditions and the resulting probability of their compound impact. We will show an application of Pair Copula Constructions (PCCs) to study the aforementioned compound event. PCCs allow in theory for the formulation of multivariate dependence structures in any dimension, where the PCC is a decomposition of a multivariate distribution into a product of bivariate components modelled using copulas. A
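As an illustration of the decomposition that a pair copula construction provides, the three-variable case can be written out explicitly. The notation below is generic and is not taken from the abstract: $f_i$ and $F_i$ denote marginal densities and distribution functions, $c_{ij}$ bivariate copula densities, and $F_{i|j}$ conditional distribution functions. The sketch also makes the usual simplifying assumption that the conditional copula $c_{13|2}$ does not depend on the value of $x_2$.

```latex
\[
f(x_1, x_2, x_3) =
  f_1(x_1)\, f_2(x_2)\, f_3(x_3)\;
  c_{12}\bigl(F_1(x_1), F_2(x_2)\bigr)\;
  c_{23}\bigl(F_2(x_2), F_3(x_3)\bigr)\;
  c_{13|2}\bigl(F_{1|2}(x_1 \mid x_2),\, F_{3|2}(x_3 \mid x_2)\bigr)
\]
```

Each factor after the marginals is a bivariate copula density, so the dependence between any pair of variables can be modelled with its own copula family.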
Classification of Specialized Farms Applying Multivariate Statistical Methods
Directory of Open Access Journals (Sweden)
Zuzana Hloušková
2017-01-01
The paper applies advanced multivariate statistical methods to the classification of cattle breeding farming enterprises by their economic size. The advantage of the model is its ability to use a few selected indicators, compared with the complex methodology of the current classification model, which requires knowledge of the detailed structure of the herd turnover and the structure of cultivated crops. The output of the paper is intended to be applied within farm structure research focused on the future development of Czech agriculture. The data source is the farming enterprises database for 2014 from the FADN CZ system. The proposed predictive model exploits knowledge of the actual size classes of the farms tested. Linear discriminant analysis, a multifactor classification method, classified farming enterprises well in the Small farms group (98 % classified correctly) and in the Large and Very Large enterprise groups (100 % classified correctly). Medium Size farms were classified correctly in only 58.11 % of cases. Partial shortcomings of the presented process were found when discriminating between Medium and Small farms.
Classification Techniques for Multivariate Data Analysis.
1980-03-28
analysis among biologists, botanists, and ecologists, while some social scientists may refer "typology". Other frequently encountered terms are pattern...the determinantal equation: $|B - \lambda W| = 0$ (42). The solutions $\lambda_i$ are the eigenvalues of the matrix $W^{-1}B$, as in discriminant analysis. There are t non...Statistical Package for Social Sciences (SPSS) (14) subprogram FACTOR was used for the principal components analysis. It is designed both for the factor
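The determinantal equation quoted in this excerpt can be solved numerically as an ordinary eigenvalue problem for W^-1 B. The sketch below assumes that W and B are the within-group and between-group scatter matrices, as in discriminant analysis; the function names and the three small groups of two-variable data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def scatter_matrices(X, labels):
    """Return within-group (W) and between-group (B) scatter matrices."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in np.unique(labels):
        Xg = X[labels == g]
        group_mean = Xg.mean(axis=0)
        centred = Xg - group_mean
        W += centred.T @ centred
        d = (group_mean - grand_mean).reshape(-1, 1)
        B += len(Xg) * (d @ d.T)
    return W, B


def discriminant_eigenvalues(W, B):
    """Roots of |B - lambda W| = 0, i.e. eigenvalues of W^-1 B, largest first."""
    eigvals = np.linalg.eigvals(np.linalg.solve(W, B))
    return np.sort(eigvals.real)[::-1]


# Hypothetical data: three groups, two variables (illustration only).
X = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.8), (2.2, 2.1),
     (4.0, 5.0), (5.0, 4.0), (4.5, 4.8), (5.2, 5.1),
     (8.0, 1.0), (9.0, 2.0), (8.5, 1.4), (9.1, 2.2)]
labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

W, B = scatter_matrices(X, labels)
lams = discriminant_eigenvalues(W, B)
print(lams)
```

With g groups and p variables, at most min(p, g - 1) of these eigenvalues are non-zero, which is why three groups measured on two variables yield two discriminant directions.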
2007-06-01
the observed system. Our research involved a comparative analysis of two multivariate statistical methods, the multivariate CUSUM (MCUSUM) and the...outbreaks. We found that, similar to results for the univariate CUSUM and EWMA, the directionally-sensitive MCUSUM and MEWMA perform very similarly. Subject terms: Biosurveillance, Multivariate CUSUM, Multivariate EWMA, Statistical Process Control, Syndromic Surveillance
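For orientation, the textbook multivariate EWMA statistic can be sketched as follows. This is the standard, non-directional form with a known in-control mean and covariance; it is not the directionally-sensitive variant studied in the report, and the function name, smoothing weight and data are assumptions made for illustration. NumPy is assumed to be available.

```python
import numpy as np


def mewma_statistics(X, mean, cov, lam=0.2):
    """Return the MEWMA T^2 statistic for each observation in X.

    z_t = lam * (x_t - mean) + (1 - lam) * z_{t-1}, with z_0 = 0, and
    T^2_t = z_t' Sigma_z^{-1} z_t, using the exact time-varying covariance
    Sigma_z = lam / (2 - lam) * (1 - (1 - lam)^(2t)) * cov.
    """
    X = np.asarray(X, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    z = np.zeros(X.shape[1])
    stats = []
    for t, x in enumerate(X, start=1):
        z = lam * (x - mean) + (1 - lam) * z
        cov_z = (lam / (2 - lam)) * (1 - (1 - lam) ** (2 * t)) * cov
        stats.append(float(z @ np.linalg.solve(cov_z, z)))
    return stats


# Hypothetical daily counts for two syndromes, with a shift in the last days.
in_control_mean = [10.0, 5.0]
in_control_cov = [[4.0, 1.0], [1.0, 2.0]]
counts = [[10, 5], [11, 4], [9, 6], [13, 7], [14, 8], [15, 8]]

t2 = mewma_statistics(counts, in_control_mean, in_control_cov, lam=0.2)
print([round(v, 2) for v in t2])
```

An alarm would be raised when the statistic exceeds a control limit chosen for a target false-alarm rate; that limit is not computed here.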
Kotula, Paul G; Keenan, Michael R
2006-12-01
Multivariate statistical analysis methods have been applied to scanning transmission electron microscopy (STEM) energy-dispersive X-ray spectral images. The particular application of the multivariate curve resolution (MCR) technique provides a high spectral contrast view of the raw spectral image. The power of this approach is demonstrated with a microelectronics failure analysis. Specifically, an unexpected component describing a chemical contaminant was found, as well as a component consistent with a foil thickness change associated with the focused ion beam specimen preparation process. The MCR solution is compared with a conventional analysis of the same spectral image data set.
Gap Shape Classification using Landscape Indices and Multivariate Statistics
Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung
2016-11-01
This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. The results were evaluated using Wilks’ lambda as satisfactory (p ANOVA showed a statistical significance in all patch indices (p = 0.00), except for the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap.
Multivariate statistical modelling based on generalized linear models
Fahrmeir, Ludwig
1994-01-01
This book is concerned with the use of generalized linear models for univariate and multivariate regression analysis. Its emphasis is to provide a detailed introductory survey of the subject based on the analysis of real data drawn from a variety of subjects including the biological sciences, economics, and the social sciences. Where possible, technical details and proofs are deferred to an appendix in order to provide an accessible account for non-experts. Topics covered include: models for multi-categorical responses, model checking, time series and longitudinal data, random effects models, and state-space models. Throughout, the authors have taken great pains to discuss the underlying theoretical ideas in ways that relate well to the data at hand. As a result, numerous researchers whose work relies on the use of these models will find this an invaluable account to have on their desks. "The basic aim of the authors is to bring together and review a large part of recent advances in statistical modelling of m...
Multivariate Statistical Inference of Lightning Occurrence, and Using Lightning Observations
Boccippio, Dennis
2004-01-01
Two classes of multivariate statistical inference using TRMM Lightning Imaging Sensor, Precipitation Radar, and Microwave Imager observations are studied, using nonlinear classification neural networks as inferential tools. The very large and globally representative data sample provided by TRMM allows both training and validation (without overfitting) of neural networks with many degrees of freedom. In the first study, the flashing or non-flashing condition of storm complexes is diagnosed using radar, passive microwave and/or environmental observations as neural network inputs. The diagnostic skill of these simple lightning/no-lightning classifiers can be quite high over land (above 80% Probability of Detection; below 20% False Alarm Rate). In the second, passive microwave and lightning observations are used to diagnose radar reflectivity vertical structure. A priori diagnosis of hydrometeor vertical structure is highly important for improved rainfall retrieval from either orbital radars (e.g., the future Global Precipitation Mission "mothership") or radiometers (e.g., operational SSM/I and future Global Precipitation Mission passive microwave constellation platforms); we explore the incremental benefit to such diagnosis provided by lightning observations.
Multivariate techniques of analysis for ToF-E recoil spectrometry data
Energy Technology Data Exchange (ETDEWEB)
Whitlow, H.J.; Bouanani, M.E.; Persson, L.; Hult, M.; Jonsson, P.; Johnston, P.N. [Lund Institute of Technology, Solvegatan, (Sweden), Department of Nuclear Physics; Andersson, M. [Uppsala Univ. (Sweden). Dept. of Organic Chemistry; Ostling, M.; Zaring, C. [Royal institute of Technology, Electrum, Kista, (Sweden), Department of Electronics; Johnston, P.N.; Bubb, I.F.; Walker, B.R.; Stannard, W.B. [Royal Melbourne Inst. of Tech., VIC (Australia); Cohen, D.D.; Dytlewski, N. [Australian Nuclear Science and Technology Organisation, Lucas Heights, NSW (Australia)
1996-12-31
Multivariate statistical methods are being developed by the Australian-Swedish Recoil Spectrometry Collaboration for quantitative analysis of the wealth of information in Time of Flight (ToF) and energy dispersive Recoil Spectrometry. An overview is presented of progress made in the use of multivariate techniques for energy calibration, separation of mass-overlapped signals and simulation of ToF-E data. 6 refs., 5 figs.
21 CFR 820.250 - Statistical techniques.
2010-04-01
... 21 Food and Drugs 8 2010-04-01 2010-04-01 false Statistical techniques. 820.250 Section 820.250...) MEDICAL DEVICES QUALITY SYSTEM REGULATION Statistical Techniques § 820.250 Statistical techniques. (a... statistical techniques required for establishing, controlling, and verifying the acceptability of process...
Characterization of Lavandula spp. Honey Using Multivariate Techniques
2016-01-01
Traditionally, melissopalynological and physicochemical analyses have been the most widely used to determine the botanical origin of honey. However, when performed individually, these analyses may give ambiguous results, making it difficult to discriminate between mono- and multifloral honeys. In this context, with the aim of better characterizing this beehive product, a selection of 112 Lavandula spp. monofloral honey samples from several regions was evaluated by combining multivariate statistical techniques with physicochemical, melissopalynological and phenolic compound analyses. All honey samples fulfilled the quality standards recommended by international legislation, except regarding sucrose content and diastase activity. The sucrose content and the percentage of Lavandula spp. pollen have a strong positive association: higher amounts of sucrose in honey are related to a higher percentage of Lavandula spp. pollen. The samples were very similar for most of the physicochemical parameters, except for proline, flavonoids and phenols (bioactive factors). Concerning the pollen spectrum, the variation in the percentage of Lavandula spp. pollen in honey contributed little to the formation of sample groups. The formation of two groups with regard to the physicochemical parameters suggests that the presence of other pollen types in small percentages influences the factor termed “bioactive”, which has been linked to diverse beneficial health effects. PMID:27588420
Institute of Scientific and Technical Information of China (English)
Amin Manouchehrian; Mostafa Sharifzadeh; Rasoul Hamidzadeh Moghadam
2012-01-01
Before any rock engineering project, mechanical parameters of rocks such as the uniaxial compressive strength (UCS) and Young's modulus (E) of intact rock are measured using laboratory or in-situ tests, but in some situations preparing the required specimens is impossible. To date, several models have been established to evaluate UCS and E from basic rock properties. Artificial neural networks (ANN) are powerful tools for establishing predictive models, and results have shown the superiority of this technique over classic statistical techniques. In this paper, ANN and multivariate statistical models considering rock textural characteristics have been established to estimate the UCS of rock, and the responses of the established models were validated by comparison with laboratory results. For this purpose a data set of 44 sandstone samples was prepared, and for each sample some textural characteristics such as void, mineral content and grain size, as well as UCS, were determined. To select the best predictors as inputs of the UCS models, this data set was subjected to statistical analyses comprising basic descriptive statistics, bivariate correlation, curve fitting and principal component analyses. The results of these analyses show that void, ferroan calcitic cement, argillaceous cement and mica percentage have the greatest effect on UCS. Two predictive models for UCS were developed using these variables, by ANN and by linear multivariate regression. The results show that, using simple textural characteristics such as mineral content, cement type and void, the strength of the studied sandstone can be estimated with acceptable accuracy. The ANN and multivariate statistical UCS models gave regression values of 0.87 and 0.76, respectively, which indicates the higher potential of the ANN model for predicting UCS compared with classic statistical models.
Darvishzadeh, R.; Skidmore, A. K.; Mirzaie, M.; Atzberger, C.; Schlerf, M.
2014-12-01
Accurate estimation of grassland biomass at peak productivity can provide crucial information regarding the functioning and productivity of rangelands. Hyperspectral remote sensing has proved valuable for estimating vegetation biophysical parameters such as biomass using different statistical techniques. However, in statistical analysis of hyperspectral data, multicollinearity is a common problem because of the large number of correlated hyperspectral reflectance measurements. The aim of this study was to examine the prospect of above-ground biomass estimation in a heterogeneous Mediterranean rangeland employing multivariate calibration methods. Canopy spectral measurements were made in the field using a GER 3700 spectroradiometer, along with concomitant in situ measurements of above-ground biomass for 170 sample plots. Multivariate calibrations including partial least squares regression (PLSR), principal component regression (PCR), and least-squares support vector machine (LS-SVM) were used to estimate the above-ground biomass. The prediction accuracy of the multivariate calibration methods was assessed using cross-validated R2 and RMSE. The best model performance was obtained using LS-SVM and then PLSR, both calibrated with the first-derivative reflectance dataset, with R2cv = 0.88 and 0.86 and RMSEcv = 1.15 and 1.07, respectively. The weakest prediction accuracy was obtained when PCR was used (R2cv = 0.31 and RMSEcv = 2.48). The results highlight the importance of multivariate calibration methods for biomass estimation when hyperspectral data are used.
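The cross-validated R2 and RMSE reported in this abstract are computed from predictions made for held-out samples. The sketch below shows the arithmetic with leave-one-out cross-validation. A one-predictor least-squares line is used purely as a stand-in for PLSR, PCR or LS-SVM, and the index and biomass values are hypothetical.

```python
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def loo_cv_metrics(x, y):
    """Leave-one-out cross-validated R^2 and RMSE."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(intercept + slope * x[i])  # predict the held-out sample
    ss_res = sum((obs - p) ** 2 for obs, p in zip(y, preds))
    ss_tot = sum((obs - mean(y)) ** 2 for obs in y)
    return 1 - ss_res / ss_tot, sqrt(ss_res / len(y))


# Hypothetical spectral index values and measured biomass (illustration only).
index = [0.21, 0.35, 0.42, 0.50, 0.58, 0.66, 0.71, 0.80]
biomass = [1.9, 3.1, 3.6, 4.8, 5.1, 6.3, 6.4, 7.7]

r2_cv, rmse_cv = loo_cv_metrics(index, biomass)
print(f"R2cv = {r2_cv:.2f}, RMSEcv = {rmse_cv:.2f}")
```

Because each prediction comes from a model fitted without that sample, R2cv can be much lower than the calibration R2 when a model overfits, which is the point of reporting it.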
Energy Technology Data Exchange (ETDEWEB)
Weathers, J.B. [Shock, Noise, and Vibration Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: James.Weathers@ngc.com; Luck, R. [Department of Mechanical Engineering, Mississippi State University, 210 Carpenter Engineering Building, P.O. Box ME, Mississippi State, MS 39762-5925 (United States)], E-mail: Luck@me.msstate.edu; Weathers, J.W. [Structural Analysis Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: Jeffrey.Weathers@ngc.com
2009-11-15
The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions of the construction of the noise level using multivariate statistics exist in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.
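The idea of estimating a covariance matrix from random and systematic uncertainties by Monte Carlo can be illustrated with a deliberately small sketch. It is not the procedure of the paper: it assumes two comparison errors, independent Gaussian random errors, and a single Gaussian systematic error shared by both quantities. All standard deviations are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 200_000

random_sd = np.array([0.5, 1.0])   # hypothetical random uncertainty per quantity
systematic_sd = 0.8                # hypothetical shared systematic uncertainty

# Each draw adds an independent random error to each quantity and one
# systematic error common to both, which is what correlates them.
random_part = rng.normal(0.0, random_sd, size=(n_draws, 2))
systematic_part = rng.normal(0.0, systematic_sd, size=(n_draws, 1))
errors = random_part + systematic_part

cov_mc = np.cov(errors, rowvar=False)
cov_exact = np.diag(random_sd ** 2) + systematic_sd ** 2 * np.ones((2, 2))

# For two dimensions, the 95% chi-square quantile is -2 ln(0.05).
chi2_95 = -2.0 * np.log(0.05)


def inside_95_contour(e, cov):
    """True if comparison error e lies inside the 95% constant-probability ellipse."""
    e = np.asarray(e, dtype=float)
    return float(e @ np.linalg.solve(cov, e)) <= chi2_95


print(np.round(cov_mc, 3))
print(inside_95_contour([0.5, 0.5], cov_mc))
```

The off-diagonal terms come entirely from the shared systematic error, which is why treating each quantity with a separate univariate bound can misjudge the joint noise level.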
Review of robust multivariate statistical methods in high dimension.
Filzmoser, Peter; Todorov, Valentin
2011-10-31
General ideas of robust statistics, and specifically robust statistical methods for calibration and dimension reduction are discussed. The emphasis is on analyzing high-dimensional data. The discussed methods are applied using the packages chemometrics and rrcov of the statistical software environment R. It is demonstrated how the functions can be applied to real high-dimensional data from chemometrics, and how the results can be interpreted.
Beguería, S.; Lorente, A.
2007-01-01
This paper, written as a deliverable of the DAMOCLES project, is a review of the different existing methodologies to landslide hazard mapping by multivariate statistics. Within the DAMOCLES project, multivariate statistical models have been applied to different study regions in Italy and Spain. The
Badran, M; Morsy, R; Soliman, H; Elnimr, T
2016-01-01
Trace element metabolism has been reported to play specific roles in the pathogenesis and progress of diabetes mellitus. Given the continuous increase in the population of patients with Type 2 diabetes (T2D), this study aims to assess the levels and inter-relationships of fasting blood glucose (FBG) and serum trace elements in Type 2 diabetic patients. The study was conducted on 40 Egyptian Type 2 diabetic patients and 36 healthy volunteers (Hospital of Tanta University, Tanta, Egypt). The blood serum was digested and then used to determine the levels of 24 trace elements by inductively coupled plasma mass spectrometry (ICP-MS). Multivariate statistical analysis based on correlation coefficients, cluster analysis (CA) and principal component analysis (PCA) was used to analyse the data. The results showed significant changes in the levels of FBG and of eight trace elements (Zn, Cu, Se, Fe, Mn, Cr, Mg, and As) in the blood serum of Type 2 diabetic patients relative to those of healthy controls. The multivariate statistical techniques reduced the number of experimental variables and grouped the trace elements in patients into three clusters. The application of PCA revealed a distinct difference in the associations of trace elements and their clustering patterns between the control and patient groups, in particular for Mg, Fe, Cu, and Zn, which appeared to be the factors most closely related to Type 2 diabetes. Therefore, on the basis of this study, the contributions of trace element contents in Type 2 diabetic patients can be identified using correlation and multivariate statistical analysis, which supports the view that the alteration of some essential trace metals may play a role in the development of diabetes mellitus.
Wang, Yi; Ma, Xiang; Wen, Ya-Dong; Zou, Quan; Wang, Jun; Tu, Jia-Run; Cai, Wen-Sheng; Shao, Xue-Guang
2013-05-01
Near-infrared diffuse reflectance spectroscopy has been applied in on-site and on-line analysis because it is fast, non-destructive and feasible for the analysis of real complex samples. The present work reports a real-time monitoring method for industrial production using near-infrared spectroscopy and multivariate statistical process analysis. In the method, real-time near-infrared spectra of the materials are collected on the production line, and the production process is then evaluated through the Hotelling T2 statistic calculated with an established model. In this work, principal component analysis (PCA) is adopted for building the model, and the statistic is calculated by projecting the real-time spectra onto the PCA model. An application of the method to a practical production process demonstrated that variations in production can be evaluated in real time by following the changes in the statistic, and that products from different batches can be compared through further statistical analysis of the T2 values. Therefore, the proposed method may provide a practical way of carrying out quality assurance of production processes.
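The calculation of a Hotelling T2 statistic by projecting a spectrum onto a PCA model can be sketched as follows. This is a generic illustration, not the authors' implementation: the reference "spectra" are random numbers, the number of components is arbitrary, the function names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def fit_pca(X_ref, n_components):
    """Fit a PCA model on reference spectra; return mean, loadings, score variances."""
    mean = X_ref.mean(axis=0)
    Xc = X_ref - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T                      # variables x components
    score_var = s[:n_components] ** 2 / (X_ref.shape[0] - 1)
    return mean, loadings, score_var


def hotelling_t2(x, mean, loadings, score_var):
    """T^2 of one spectrum: squared scores scaled by their reference variances."""
    scores = (x - mean) @ loadings
    return float(np.sum(scores ** 2 / score_var))


# Hypothetical reference data: 30 in-control "spectra" with 5 wavelengths.
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(30, 5))
mean, loadings, score_var = fit_pca(X_ref, n_components=2)

# A new spectrum shifted along the first loading should give a large T^2.
x_new = mean + 6.0 * np.sqrt(score_var[0]) * loadings[:, 0]
print(round(hotelling_t2(x_new, mean, loadings, score_var), 2))
```

In a monitoring setting each incoming spectrum would be passed through `hotelling_t2` and compared with a control limit derived from the reference batch; the limit itself is not computed in this sketch.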
Ordinary chondrites - Multivariate statistical analysis of trace element contents
Lipschutz, Michael E.; Samuels, Stephen M.
1991-01-01
The contents of mobile trace elements (Co, Au, Sb, Ga, Se, Rb, Cs, Te, Bi, Ag, In, Tl, Zn, and Cd) in Antarctic and non-Antarctic populations of H4-6 and L4-6 chondrites, were compared using standard multivariate discriminant functions borrowed from linear discriminant analysis and logistic regression. A nonstandard randomization-simulation method was developed, making it possible to carry out probability assignments on a distribution-free basis. Compositional differences were found both between the Antarctic and non-Antarctic H4-6 chondrite populations and between two L4-6 chondrite populations. It is shown that, for various types of meteorites (in particular, for the H4-6 chondrites), the Antarctic/non-Antarctic compositional difference is due to preterrestrial differences in the genesis of their parent materials.
Multivariate discrimination technique based on the Bayesian theory
Institute of Scientific and Technical Information of China (English)
JIN Ping; PAN Chang-zhou; XIAO Wei-guo
2007-01-01
A multivariate discrimination technique was established based on Bayesian theory. Using this technique, P/S ratios of different types (e.g., Pn/Sn, Pn/Lg, Pg/Sn or Pg/Lg) measured within different frequency bands and at different stations were combined to discriminate seismic events in Central Asia. Major advantages of the Bayesian approach are that the probability that any unknown event is an explosion can be calculated directly from the measurements of a group of discriminants, and that correlations among these discriminants can be fully taken into account at the same time. It was proved theoretically that the Bayesian technique is optimal and that its discriminating performance is better than that of any individual discriminant, as well as better than that of a linear combination approach that ignores correlations among discriminants. This conclusion was also validated in this paper by applying the Bayesian approach to the above-mentioned observed data.
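To make the Bayesian calculation described above concrete, the following Python sketch computes the posterior probability that an event is an explosion from two correlated discriminants. It assumes, purely for illustration, that each class is described by a bivariate normal density with a full covariance matrix; the class means, covariance and prior are hypothetical values, not figures from the paper.

```python
import math


def gaussian_logpdf_2d(x, mean, cov):
    """Log density of a bivariate normal with a full 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2.0 * math.pi)


def posterior_explosion(x, prior_explosion, explosion, earthquake):
    """Bayes' rule for two classes; each class is a (mean, cov) pair."""
    log_e = math.log(prior_explosion) + gaussian_logpdf_2d(x, *explosion)
    log_q = math.log(1.0 - prior_explosion) + gaussian_logpdf_2d(x, *earthquake)
    m = max(log_e, log_q)  # subtract the maximum for numerical stability
    pe, pq = math.exp(log_e - m), math.exp(log_q - m)
    return pe / (pe + pq)


# Hypothetical class models for two log P/S ratio discriminants.
shared_cov = [[1.0, 0.5], [0.5, 1.0]]  # off-diagonal term = correlation
explosion_model = ([1.0, 1.0], shared_cov)
earthquake_model = ([-1.0, -1.0], shared_cov)

print(posterior_explosion([0.8, 0.3], 0.5, explosion_model, earthquake_model))
```

The off-diagonal covariance term is what lets correlated discriminants be combined without counting the same evidence twice; setting it to zero would recover the independent-discriminant case.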
Energy Technology Data Exchange (ETDEWEB)
Berman, E F; Kulp, K S; Knize, M G; Wu, L; Nelson, E J; Nelson, D O; Wu, K J
2006-05-04
Time-of-Flight Secondary Ion Mass Spectrometry (ToF-SIMS) is utilized to examine the mass spectra and fragmentation patterns of seven isomeric monosaccharides. Multivariate statistical analysis techniques, including principal component analysis (PCA), allow discrimination of the extremely similar mass spectra of stereoisomers. Furthermore, PCA identifies those fragment peaks which vary significantly between spectra. Heavy isotope studies confirm that these peaks are indeed sugar fragments, allow identification of the fragments, and provide clues to the fragmentation pathways. Excellent reproducibility is shown by multiple experiments performed over time and on separate samples. This study demonstrates the combined selectivity and discrimination power of ToF-SIMS and PCA, and suggests new applications of the technique including differentiation of subtle chemical changes in biological samples that may provide insights into cellular processes, disease progress, and disease diagnosis.
Singh, Elangbam J K; Gupta, Abhik; Singh, N R
2013-04-01
The aim of this paper was to analyze the groundwater quality of Imphal West district, Manipur, India, and assess its suitability for drinking, domestic, and agricultural use. Eighteen physico-chemical variables were analyzed in groundwater from 30 different hand-operated tube wells in urban, suburban, and rural areas in two seasons. The data were subjected to uni-, bi-, and multivariate statistical analysis, the latter comprising cluster analysis (CA), principal component analysis (PCA), and factor analysis (FA). Arsenic concentrations exceed the Indian standard in 23.3% and the WHO limit in 73.3% of the groundwater sources, with only 26.7% in the acceptable range. Several variables such as iron, chloride, sodium, sulfate, total dissolved solids, and turbidity are also beyond their desirable limits for drinking water at a number of sites. Sodium concentrations and sodium adsorption ratio (SAR) are both high enough to render the water from the majority of the sources unsuitable for agricultural use. Multivariate statistical techniques, especially varimax rotation of PCA data, helped bring into focus the hidden yet important variables and clarify their roles in influencing groundwater quality. Widespread arsenic contamination and high sodium concentration of groundwater pose formidable constraints on its exploitation for drinking and other domestic and agricultural use in the study area, although urban anthropogenic impacts are not yet pronounced.
Multivariate statistical analysis of surface water chemistry: A case study of Gharasoo River, Iran
Directory of Open Access Journals (Sweden)
MH Sayadi
2014-09-01
Regional water quality is a hot spot in the environmental sciences because of the inconsistency of pollutants. In this paper, the surface water quality of the Gharasoo River in western Iran is assessed using multivariate statistical techniques. Parameters such as EC, TDS, pH, HCO3-, Cl-, SO4 2-, Ca2+, Mg2+ and Na+ were analyzed. Principal component and factor analysis showed that the parameters generated 3 significant factors, which explained 73.06% of the variance in the data sets. Factor 1 may be derived from agricultural activities and the subsequent release of EC, TDS, SO4 2- and Na+ to the water. Factor 2 could be influenced by domestic pollution and explained the release of HCO3-, Cl- and Mg2+ into the water. Factor 3 contains the hydro-geochemical variables Ca2+ and pH, originating from mineralization of the geological components of the bed sediments and soils of the watershed area. Likewise, the cluster analysis generated 3 groups of stations with similar characteristic features. Pearson correlation analysis showed significant correlations between HCO3- and Mg2+ (0.775) and Ca2+ (0.552), as well as between TDS and Na+ (0.726). With reference to the multivariate statistical analyses, it can be concluded that agricultural, domestic and hydro-geochemical sources are releasing the pollutants into the Gharasoo River water.
Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A.; van t Veld, Aart A.
2012-01-01
PURPOSE: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. METHODS AND MATERIALS: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator
Multivariate Statistical Process Optimization in the Industrial Production of Enzymes
DEFF Research Database (Denmark)
Klimkiewicz, Anna
In modern biotech production, a massive number of diverse measurements, with a broad diversity in information content and quality, is stored in data historians. The potential of this enormous amount of data is currently under-employed in process optimization efforts. This is a result of the deman...... and difficulties related to ‘recycling’ of historical data from a full-scale manufacturing of industrial enzymes. First, the crucial and tedious step of retrieving the data from the systems is presented. The prerequisites that need to be comprehended are discussed, such as sensors accuracy and reliability, aspects...... related to the actual measuring frequency and non-equidistance retaining strategies in data storage. Different regimes of data extraction can be employed, and some might introduce undesirable artifacts in the final analysis results (POSTER II1). Several signal processing techniques are also briefly...
Multivariate statistical analysis for the surface water quality of the Luan River, China
Institute of Scientific and Technical Information of China (English)
Zhi-wei ZHAO; Fu-yi CUI
2009-01-01
In order to analyze the characteristics of surface water resource quality for the reconstruction of an old water treatment plant, multivariate statistical techniques such as cluster analysis and factor analysis were applied to the data of Yuqiao Reservoir, a surface water resource of the Luan River, China. The results of cluster analysis demonstrate that the months of one year were divided into 3 groups and that the characteristics of the clusters agreed with the seasonal characteristics in North China. Three factors were derived from the complicated data set using factor analysis. Factor 1 included turbidity and chlorophyll, which seemed to be related to anthropogenic activities; factor 2 included alkalinity and hardness, which were related to the natural characteristics of surface water; and factor 3 included Cl and NO-N, affected by mineral and agricultural activities. The sinusoidal shape of the score plots of the three factors shows that the temporal variations caused by natural and human factors are linked to seasonality.
Liu, Na; Li, Jun; Li, Bao-Guo
2014-11-01
The study of quality control of Chinese medicine has always been both a hot spot and a difficulty in the development of traditional Chinese medicine (TCM), and it is also one of the key problems restricting the modernization and internationalization of Chinese medicine. Multivariate statistical analysis is an analytical method suited to the characteristics of TCM and has been widely used in the study of its quality control. It is applied to the multiple, mutually correlated indicators and variables that appear in quality control studies in order to find the hidden laws or relationships in the data, which can serve decision-making and support effective quality evaluation of TCM. In this paper, the application of multivariate statistical analysis in the quality control of Chinese medicine is summarized, which could provide a basis for further study.
Energy Technology Data Exchange (ETDEWEB)
Almazan T, M. G.; Jimenez R, M.; Monroy G, F.; Tenorio, D. [ININ, Carretera Mexico-Toluca s/n, 52750 Ocoyoacac, Estado de Mexico (Mexico); Rodriguez G, N. L. [Instituto Mexiquense de Cultura, Subdireccion de Restauracion y Conservacion, Hidalgo poniente No. 1013, 50080 Toluca, Estado de Mexico (Mexico)
2009-07-01
The elemental composition of archaeological ceramic fragments obtained during the explorations in San Miguel Ixtapan, Mexico State, was determined by neutron activation analysis. The samples were irradiated in the TRIGA Mark III research reactor with a neutron flux of 1·10^13 n·cm^-2·s^-1. The irradiation time was 2 hours. Before acquisition of the gamma-ray spectra, the samples were allowed to decay for 12 to 14 days. The analyzed elements were: Nd, Ce, Lu, Eu, Yb, Pa(Th), Tb, La, Cr, Hf, Sc, Co, Fe, Cs, Rb. The statistical treatment of the data, consisting of cluster analysis and principal component analysis, made it possible to identify three different origins of the archaeological ceramics, designated as local, foreign and regional. (Author)
Ben Alaya, M. A.; Chebana, F.; Ouarda, T. B. M. J.
2016-09-01
Statistical downscaling techniques are required to refine atmosphere-ocean global climate data and provide reliable meteorological information such as a realistic temporal variability and relationships between sites and variables in a changing climate. To this end, the present paper introduces a modular structure combining two statistical tools of increasing interest during the last years: (1) Gaussian copula and (2) quantile regression. The quantile regression tool is employed to specify the entire conditional distribution of downscaled variables and to address the limitations of traditional regression-based approaches whereas the Gaussian copula is performed to describe and preserve the dependence between both variables and sites. A case study based on precipitation and maximum and minimum temperatures from the province of Quebec, Canada, is used to evaluate the performance of the proposed model. Obtained results suggest that this approach is capable of generating series with realistic correlation structures and temporal variability. Furthermore, the proposed model performed better than a classical multisite multivariate statistical downscaling model for most evaluation criteria.
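The role of the Gaussian copula in the abstract above (preserving dependence between variables while each keeps its own marginal distribution) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the correlation value, the normal temperature margin and the exponential precipitation margin are hypothetical stand-ins, and the quantile-regression step that supplies the conditional margins in the paper is not reproduced here.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_uniforms(n, rho, rng):
    """Draw n pairs (u1, u2) whose dependence follows a Gaussian copula."""
    std = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Correlate the second normal draw with the first.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # The standard normal CDF turns each draw into a uniform on (0, 1).
        pairs.append((std.cdf(z1), std.cdf(z2)))
    return pairs


def to_margins(pairs, temp_margin, precip_scale):
    """Map uniforms to a normal temperature and an exponential precipitation."""
    out = []
    for u1, u2 in pairs:
        temperature = temp_margin.inv_cdf(u1)
        precipitation = -precip_scale * math.log(1.0 - u2)
        out.append((temperature, precipitation))
    return out


rng = random.Random(42)
uniforms = gaussian_copula_uniforms(1000, 0.6, rng)
# Hypothetical margins: temperature ~ N(5, 8), precipitation ~ Exp(scale 3).
series = to_margins(uniforms, NormalDist(mu=5.0, sigma=8.0), precip_scale=3.0)
print(series[:3])
```

Because the dependence is imposed on the uniform scale, the margins can be swapped for any distribution with an invertible CDF without changing the rank correlation of the simulated series.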
Institute of Scientific and Technical Information of China (English)
梁军; 钱积新
2003-01-01
Multivariate statistical process monitoring and control (MSPM&C) methods for chemical process monitoring with statistical projection techniques such as principal component analysis (PCA) and partial least squares (PLS) are surveyed in this paper. The four-step procedure of performing MSPM&C for chemical processes (modeling of processes, detecting abnormal events or faults, identifying the variable(s) responsible for the faults, and diagnosing the source cause of the abnormal behavior) is analyzed. Several main research directions of MSPM&C reported in the literature are discussed, such as multi-way principal component analysis (MPCA) for batch processes, statistical monitoring and control for nonlinear processes, dynamic PCA and dynamic PLS, and on-line quality control by inferential models. Industrial applications of MSPM&C to several typical chemical processes, such as chemical reactors, distillation columns, polymerization processes and petroleum refinery units, are summarized. Finally, some concluding remarks and future considerations are made.
Multivariate statistical analysis of radioactive variables in two phosphate ores from Sudan.
Adam, Abdel Majid A; Eltayeb, Mohamed Ahmed H
2012-05-01
Multivariate statistical techniques are efficient ways to display complex relationships among many objects. An attempt was made to study the radioactivity data in two types of Sudanese phosphate deposits, Kurun and Uro phosphate, using several multivariate statistical methods. The Pearson correlation coefficient revealed that the U-238 distribution in Kurun phosphate is controlled by the variation of K-40 concentration, whereas in Uro phosphate it is controlled by the variation of U-235 and U-234 concentration. Histograms and normal Q-Q plots clearly show that the radioactive variables did not follow a normal distribution. This non-normality may be attributed to the complicating influence of geological factors. Principal component analysis (PCA) gives a model of five components for representing the data acquired from Kurun phosphate, which explains 89.5% of the total variance. A model of four components was sufficient to represent the data acquired from Uro phosphate, explaining 87.5% of the total variance. Hierarchical cluster analysis (HCA) indicates that U-238 behaves in the same manner in the two types of phosphate: it is associated with a group of four radionuclides, U-234, Po-210, Ra-226 and Th-230, which are the most abundant radionuclides and all belong to the uranium-238 decay series. Two parameters were adopted to differentiate directly between the two phosphates. Firstly, U-238 in Uro phosphate showed a higher degree of mobility (CV% = 82.6) than that in Kurun phosphate (CV% = 64.7), and secondly, the activity ratio of Th-230/Th-232 in Uro phosphate is nine times that in Kurun phosphate.
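The CV% figures quoted above are coefficients of variation, that is, the standard deviation expressed as a percentage of the mean. A minimal Python sketch follows; it assumes the sample (n - 1) standard deviation, which the abstract does not specify, and the activity values are hypothetical, not data from the study.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical U-238 activity concentrations for two deposits.
deposit_a = [120.0, 340.0, 95.0, 510.0, 230.0]
deposit_b = [200.0, 260.0, 180.0, 310.0, 240.0]
print(cv_percent(deposit_a), cv_percent(deposit_b))
```

A larger CV% indicates a more variable, and in the abstract's interpretation more mobile, radionuclide.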
Multivariate Analysis Techniques for Optimal Vision System Design
DEFF Research Database (Denmark)
Sharifzadeh, Sara
used in this thesis are described. The methodological strategies are outlined including sparse regression and pre-processing based on feature selection and extraction methods, supervised versus unsupervised analysis and linear versus non-linear approaches. One supervised feature selection algorithm......The present thesis considers optimization of the spectral vision systems used for quality inspection of food items. The relationship between food quality, vision based techniques and spectral signature are described. The vision instruments for food analysis as well as datasets of the food items...... (SSPCA) and DCT based characterization of the spectral diffused reflectance images for wavelength selection and discrimination. These methods together with some other state-of-the-art statistical and mathematical analysis techniques are applied on datasets of different food items; meat, diaries, fruits...
González Gallero, Francisco Javier; Galán Vallejo, Manuel; Umbría, Arturo; Gervilla Baena, Juan
2006-08-01
A complete statistical analysis of meteorological and air pollution data was carried out in the 'Campo de Gibraltar' region (in the South of Spain) from 1999 to 2002. This is a heavily industrialized area where, to date, very few air pollution studies have been made. The main objectives of the study presented here were the characterization of the meteorological and (gaseous and particulate) air pollution conditions in the area, and the relations between them. Multivariate statistical techniques, such as Principal Component Analysis (PCA), were applied to the data. The results show that air quality in the area is highly dependent on meteorological conditions such as wind persistence and direction, dispersion capability of the atmosphere, and humidity content. On average, sulphur dioxide and nitrogen oxide air pollution, mainly caused by fuel-oil combustion and traffic, respectively, is not very high. However, a considerable number of exceedances of the limits established by the EU Directive 1999 for PM10 (particulate matter with a diameter less than 10 µm) have been observed at some points of the area. A significant percentage of these exceedances (about 22% on average) are likely caused by African dust intrusions, which usually take place from May to August. From gaseous and particulate air correlations, it seems that anthropogenic activities contribute about 19% on average.
Statistical and Computational Techniques in Manufacturing
2012-01-01
In recent years, interest in developing statistical and computational techniques for applied manufacturing engineering has increased. Today, due to the great complexity of manufacturing engineering and the high number of parameters used, conventional approaches are no longer sufficient. Therefore, in manufacturing, statistical and computational techniques have found several applications, namely, modelling and simulation of manufacturing processes, optimization of manufacturing parameters, monitoring and control, computer-aided process planning, etc. The present book aims to provide recent information on statistical and computational techniques applied in manufacturing engineering. The content is suitable for final undergraduate engineering courses or as a subject on manufacturing at the postgraduate level. This book serves as a useful reference for academics, statistical and computational science researchers, mechanical, manufacturing and industrial engineers, and professionals in industries related to manu...
Stalked protozoa identification by image analysis and multivariable statistical techniques
Amaral, A.L.; Ginoris, Y. P.; Nicolau, Ana; M.A.Z. Coelho; Ferreira, E. C.
2008-01-01
Protozoa are considered good indicators of the treatment quality in activated sludge systems as they are sensitive to physical, chemical and operational processes. Therefore, it is possible to correlate the predominance of certain species or groups and several operational parameters of the plant. This work presents a semiautomatic image analysis procedure for the recognition of the stalked protozoa species most frequently found in wastewater treatment plants by determinin...
Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan
2017-09-19
Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m or 0-0.20 m below surface, and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging, and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis
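Inverse distance weighted (IDW) interpolation, named above as one of the most widely used spatial interpolators, estimates a value at an unsampled location as a weighted mean of nearby samples, with weights proportional to 1/distance^power. The Python sketch below is illustrative only: the function name `idw`, the default power of 2 and the sample coordinates and concentrations are assumptions, not values taken from the reviewed studies.

```python
import math


def idw(samples, target, power=2.0):
    """Inverse distance weighted estimate at target = (x, y).

    samples is a list of (x, y, value) tuples.
    """
    num = 0.0
    den = 0.0
    for x, y, value in samples:
        dist = math.hypot(target[0] - x, target[1] - y)
        if dist == 0.0:
            # The target coincides with a sample: return it exactly.
            return value
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    return num / den


# Hypothetical soil samples: (easting km, northing km, Pb in mg/kg).
soil = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
print(idw(soil, (0.5, 0.0)))  # closer to the first sample, so nearer 10
```

Unlike ordinary kriging, IDW uses no fitted variogram, so the choice of power is a modelling decision rather than something estimated from the data.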
Directory of Open Access Journals (Sweden)
Arayne M
2009-01-01
A sensitive and accurate UV spectrophotometric method with a multivariate calibration technique for the determination of metformin hydrochloride in bulk drug and different pharmaceutical formulations is described. The technique is based on linear regression equations relating concentration and absorbance at five different wavelengths. The results were treated statistically and were found to be highly accurate, precise and reproducible. The method is accurate, precise (% recovery 102.500.063, CV≤0.56, r =0.997) and linear within the range 1-10 mg/ml. There was no interference from the excipients, i.e. Povidone K 30, magnesium stearate, lactose and hydroxypropylmethylcellulose. This statistical approach gives optimum results by eliminating fluctuations arising from instrumental or experimental conditions.
Reuter, M; Netter, P
2001-01-01
The present study proposes a hierarchical multivariate statistical prediction model that makes it possible to determine the most prominent variables (physiological, biochemical and personality factors) related to nicotine craving and dopaminergic activation. Based on animal studies reporting a reduction of the rewarding effects of psychotropic drugs after blockade or destruction of the mesolimbic dopamine (DA) system, changes in nicotine craving after pharmacological manipulation by means of a DA agonist (lisuride 0.2 mg) and a DA antagonist (fluphenazine 2 mg) were assessed in 36 healthy male heavy smokers. The major aim was the development of a multivariate prediction model that is applicable to samples lacking variance homogeneity or the prerequisite of a multivariate normal distribution. The proposed model combines multivariate parametric and nonparametric methods, taking advantage of their individual merits. Personality variables in particular, such as sensation seeking, impulsivity, and neuroticism, proved to be important predictors of craving in this responder approach.
Blake, Sarah; Henry, Tiernan; Murray, John; Flood, Rory; Muller, Mark R.; Jones, Alan G.; Rath, Volker
2016-04-01
The geothermal energy of thermal groundwater is currently being exploited for district-scale heating in many locations world-wide. The chemical compositions of these thermal waters reflect the provenance and hydrothermal circulation patterns of the groundwater, which are controlled by recharge, rock type and geological structure. Exploring the provenance of these waters using multivariate statistical analysis (MSA) techniques increases our understanding of the hydrothermal circulation systems, and provides a reliable tool for assessing these resources. Hydrochemical data from thermal springs situated in the Carboniferous Dublin Basin in east-central Ireland were explored using MSA, including hierarchical cluster analysis (HCA) and principal component analysis (PCA), to investigate the source aquifers of the thermal groundwaters. To take into account the compositional nature of the hydrochemical data, compositional data analysis (CoDa) techniques were used to process the data prior to the MSA. The results of the MSA were examined alongside detailed time-lapse temperature measurements from several of the springs, and indicate the influence of three important hydrogeological processes on the hydrochemistry of the thermal waters: 1) increased salinity due to evaporite dissolution and increased water-rock-interaction; 2) dissolution of carbonates; and 3) dissolution of metal sulfides and oxides associated with mineral deposits. The use of MSA within the CoDa framework identified subtle temporal variations in the hydrochemistry of the thermal springs, which could not be identified with more traditional graphing methods (e.g., Piper diagrams), or with a standard statistical approach. The MSA was successful in distinguishing different geological settings and different annual behaviours within the group of springs. This study demonstrates the usefulness of the application of MSA within the CoDa framework in order to better understand the underlying controlling processes
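The compositional data analysis (CoDa) preprocessing mentioned above typically means applying a log-ratio transform before PCA or clustering, so that the constant-sum constraint of concentrations does not distort the results. The abstract does not say which transform was used; the Python sketch below assumes the centred log-ratio (clr) transform as one common choice, and the ion proportions are hypothetical.

```python
import math


def clr(composition):
    """Centred log-ratio transform of a strictly positive composition."""
    if any(part <= 0 for part in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log equals dividing by the geometric mean.
    return [value - mean_log for value in logs]


# Hypothetical major-ion proportions of one water sample.
sample = [0.20, 0.30, 0.50]
print(clr(sample))
```

The transformed values sum to zero and do not change when the sample is rescaled, for example from mg/l to proportions, which is why the transform is applied before the multivariate analysis. Zero or below-detection values need separate treatment before a log-ratio transform can be used.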
Cherukupalle, Nirmala devi
This bibliography contains works that illustrate and apply multivariate statistical methods in the analysis of empirical data in the field of urban and regional planning. The bibliography has been designed for use by planning students and the professional planner. The first section of the bibliography lists some elementary and intermediate level…
Directory of Open Access Journals (Sweden)
Lyubov V. Ruchinskaya
2013-01-01
The methodological and procedural basis of the developed methods of multivariate statistical analysis of consumer preferences in the Russian market of cultured milk products is considered. The author carried out a segmentation of consumers of cultured milk products based on methods of multidimensional classification, which makes it possible to optimize the structure of dairy production by domestic producers.
von Larcher, Thomas; Harlander, Uwe; Alexandrov, Kiril; Wang, Yongtai
2010-05-01
Experiments on baroclinic wave instabilities in a rotating cylindrical gap have long been performed, e.g., to reveal regular waves of different zonal wave number, to better understand the transition to the quasi-chaotic regime, and to reveal the underlying dynamical processes of complex wave flows. We present the application of appropriate multivariate data analysis methods to time series data sets acquired with non-intrusive measurement techniques of quite different natures. While highly accurate Laser-Doppler-Velocimetry (LDV) is used for measurements of the radial velocity component at equidistant azimuthal positions, a highly sensitive thermographic camera measures the surface temperature field. The measurements are performed at particular parameter points, where our former studies show that kinds of complex wave patterns occur [1, 2]. Obviously, the temperature data set has much more information content than the velocity data set because of the particular measurement techniques. Both sets of time series data are analyzed using multivariate statistical techniques. While the LDV data sets are studied by applying Multi-Channel Singular Spectrum Analysis (M-SSA), the temperature data sets are analyzed by applying Empirical Orthogonal Functions (EOF). Our goal is (a) to verify the results obtained from the analysis of the velocity data and (b) to compare the data analysis methods. Therefore, the temperature data are processed so as to become comparable to the LDV data, i.e. the size of the data set is reduced as if the temperature measurements had been performed at equidistant azimuthal positions only. This approach initially results in a great loss of information, but applying M-SSA to the reduced temperature data sets enables us to compare the methods. [1] Th. von Larcher and C. Egbers, Experiments on transitions of baroclinic waves in a differentially heated rotating annulus, Nonlinear Processes in Geophysics
Ielpo, Pierina; Leardi, Riccardo; Pappagallo, Giuseppe; Uricchio, Vito Felice
2017-06-01
In this paper, the results obtained from multivariate statistical techniques such as PCA (principal component analysis) and LDA (linear discriminant analysis) applied to a wide soil data set are presented. The results have been compared with those obtained on a groundwater data set, whose samples were collected together with the soil ones, within the project "Improvement of the Regional Agro-meteorological Monitoring Network (2004-2007)". LDA, applied to the soil data, made it possible to distinguish the geographical origin of a sample between two macroareas, Bari and Foggia provinces vs Brindisi, Lecce and Taranto provinces, with a percentage of correct predictions in cross validation of 87%. In the case of the groundwater data set, the best classification was obtained when the samples were grouped into three macroareas (Foggia province; Bari province; and Brindisi, Lecce and Taranto provinces), reaching a percentage of correct predictions in cross validation of 84%. The information obtained can be very useful in supporting soil and water resource management, such as the reduction of water consumption and the reduction of energy and chemical (nutrient and pesticide) inputs in agriculture.
Multivariate statistical approach for the assessment of groundwater quality in Ujjain City, India.
Vishwakarma, Vikas; Thakur, Lokendra Singh
2012-10-01
Groundwater quality assessment is an essential study that plays an important role in the rational development and utilization of groundwater. Groundwater quality greatly influences the health of local people. Variations in water quality are essentially the combination of both anthropogenic and natural contributions. In order to understand the underlying physical and chemical processes, this study uses multivariate statistical techniques to analyze 8 chemical and physical-chemical water quality parameters, viz. pH, turbidity, electrical conductivity, total dissolved solids, total alkalinity, total hardness, chloride and fluoride, recorded at 54 sampling stations during the summer season of 2011. Hierarchical cluster analysis (CA) is first applied to distinguish groundwater quality patterns among the stations, followed by principal component analysis (PCA) and factor analysis (FA) to extract and recognize the major underlying factors contributing to the variations among the water quality measures. The first three components were chosen for interpretation of the data; they account for 72.502% of the total variance in the data set. The largest number of variables, i.e. turbidity, EC, TDS and chloride, were characterized by the first component, while the second and third were characterized by total alkalinity, total hardness, fluoride and pH, respectively. This shows that the hydrochemical constituents of the groundwater are mainly controlled by EC, TDS, and fluoride. The findings of the cluster analysis are presented in the form of dendrograms of the sampling stations (cases) as well as of the hydrochemical variables; these produced four major groupings and suggest that groundwater monitoring can be consolidated.
Multivariate meta-analysis: a robust approach based on the theory of U-statistic.
Ma, Yan; Mazumdar, Madhu
2011-10-30
Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously, taking into account the correlation between the outcomes. Likelihood-based approaches, in particular the restricted maximum likelihood (REML) method, are commonly used in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analyses with a small number of component studies. REML also requires iterative estimation of the parameters, which needs moderately high computation time, especially when the dimension of the outcomes is large. A multivariate method of moments (MMM) is available and has been shown to perform as well as REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normal. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistics and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect of a non-normal data distribution on the REML estimates is marginal and that the estimates from the MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. The easy implementation of all three methods is illustrated by their application to data from two published meta-analyses from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistics for testing the significance of between-study heterogeneity and for extending the work to the meta-regression setting.
Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques
Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein
2016-11-01
Groundwater samples were collected from different sites in the Rapur area to evaluate the major ion chemistry. A large volume of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness for classifying and identifying the geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This resulted in two important clusters, viz. cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), which are released to the study area from different sources. The application of multivariate statistical techniques such as principal component analysis (PCA) assists in the interpretation of complex data matrices for a better understanding of the water quality of a study area. From PCA, it is clear that the first factor (factor 1), which accounted for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of the similarity of their water quality.
Directory of Open Access Journals (Sweden)
Jiwen Ge
2013-07-01
, hydropower exploitation and municipal waste. The study demonstrates the utility of multivariate statistical techniques for river water quality assessment, identification of pollution sources, and exploring spatial and temporal variations of water quality.
Zamani, Abbas Ali; Yaftian, Mohammad Reza; Parizanganeh, Abdolhossein
2012-12-17
The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study, groundwater contamination in Bonab Industrial Estate (Zanjan, Iran) by iron, cobalt, nickel, copper, zinc, cadmium and lead was investigated using differential pulse polarography (DPP). Although cobalt, copper and zinc were found in 47.8%, 100.0%, and 100.0% of the samples respectively, the samples did not contain these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples, and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron, with 8.7% and 13.0% of the samples higher than their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied to interpret the experimental data and to describe the sources. The data analysis showed correlations and similarities between the investigated heavy metals and helped to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals. Cluster 1 consisted of Pb and Cu, and cluster 3 included Cd and Fe; each of the elements Zn, Co and Ni was located in a group with a single member. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors, notably the lead and zinc plant, and pedo-geochemical pollution sources are influencing water quality in the studied area.
Multivariate moment closure techniques for stochastic kinetic models
Lakatos, Eszter; Ale, Angelique; Kirk, Paul D. W.; Stumpf, Michael P. H.
2015-09-01
Stochastic effects dominate many chemical and biochemical processes. Their analysis, however, can be computationally prohibitively expensive and a range of approximation schemes have been proposed to lighten the computational burden. These, notably the increasingly popular linear noise approximation and the more general moment expansion methods, perform well for many dynamical regimes, especially linear systems. At higher levels of nonlinearity, it comes to an interplay between the nonlinearities and the stochastic dynamics, which is much harder to capture correctly by such approximations to the true stochastic processes. Moment-closure approaches promise to address this problem by capturing higher-order terms of the temporally evolving probability distribution. Here, we develop a set of multivariate moment-closures that allows us to describe the stochastic dynamics of nonlinear systems. Multivariate closure captures the way that correlations between different molecular species, induced by the reaction dynamics, interact with stochastic effects. We use multivariate Gaussian, gamma, and lognormal closure and illustrate their use in the context of two models that have proved challenging to the previous attempts at approximating stochastic dynamics: oscillations in p53 and Hes1. In addition, we consider a larger system, Erk-mediated mitogen-activated protein kinases signalling, where conventional stochastic simulation approaches incur unacceptably high computational costs.
Joint multivariate statistical model and its applications to the synthetic earthquake prediction
Institute of Scientific and Technical Information of China (English)
韩天锡; 蒋淳; 魏雪丽; 韩梅; 冯德益
2004-01-01
To address the problems that currently need to be solved in synthetic earthquake prediction, this paper proposes a new model, the joint multivariate statistical model, which combines principal component analysis with discriminant analysis. Both are important methods in multivariate statistical analysis, a field that has developed quickly over the last thirty years. By means of the maximization information method, we select from numerous earthquake prediction factors several whose cumulative proportion of the total sample variance exceeds 90%. The paper applies regression analysis and Mahalanobis discrimination to extrapolate the synthetic prediction. Furthermore, we use this model to characterize and predict earthquakes in North China (30°-42°N, 108°-125°E), and better prediction results are obtained.
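The two ingredients named in this abstract, keeping enough components to pass a 90% cumulative variance threshold and assigning a case to the class with the smallest Mahalanobis distance, can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses NumPy, the helper names `components_needed` and `mahalanobis_classify` are hypothetical, the class labels "A" and "B" and all numbers are made up, and a pooled covariance matrix is assumed because the abstract does not say how the covariance is estimated.

```python
import numpy as np

def components_needed(eigenvalues, threshold=0.90):
    """Smallest number of leading components whose cumulative variance share reaches threshold."""
    ratios = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    ratios = ratios / ratios.sum()
    # searchsorted finds the first position where the cumulative share >= threshold.
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)

def mahalanobis_classify(x, class_samples):
    """Assign x to the class whose mean is nearest in Mahalanobis distance.

    class_samples maps a label to an (n_k, p) array; a pooled covariance is assumed.
    """
    total = sum(len(s) for s in class_samples.values())
    pooled = sum((len(s) - 1) * np.cov(s, rowvar=False) for s in class_samples.values())
    pooled = pooled / (total - len(class_samples))
    inv = np.linalg.inv(pooled)
    distances = {}
    for label, samples in class_samples.items():
        diff = np.asarray(x, dtype=float) - samples.mean(axis=0)
        distances[label] = float(diff @ inv @ diff)   # squared Mahalanobis distance
    return min(distances, key=distances.get)

# Made-up two-class example (labels and values are illustrative only).
classes = {
    "A": np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
    "B": np.array([[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]]),
}
print(components_needed([7.0, 2.5, 0.3, 0.2]))
print(mahalanobis_classify([1.0, 1.2], classes))
```

In a combined workflow the classifier would be applied to the retained component scores rather than to the raw factors; the sketch keeps the two steps separate for clarity.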
Ruiz Ordóñez, Magda Liliana
2008-01-01
This thesis focuses on the monitoring, fault detection and diagnosis of Wastewater Treatment Plants (WWTP), which are important fields of research for a wide range of engineering disciplines. The main objective is to evaluate and apply a novel artificial intelligence methodology based on situation assessment for monitoring and diagnosis of Sequencing Batch Reactor (SBR) operation. To this end, Multivariate Statistical Process Control (MSPC) in combination with Case-Based Reasoning (CBR)...
Khound, Nayan J.; Bhattacharyya, Krishna G.
2016-08-01
The aim of this study was to assess the quality of surface water sources in the Jia Bharali river basin and adjoining areas of the Himalayan foothills with respect to heavy elements (As, Cd, Cr, Cu, Fe, Mn, Ni, Pb and Zn) by hydrochemical and multivariate statistical techniques, such as cluster analysis (CA) and principal component analysis (PCA). This study presents the first systematic analysis of toxic elements in water samples collected from 35 different surface water sources in both the dry and wet seasons over 2 hydrological years (2009-2011). Varimax factors extracted by principal component analysis indicate anthropogenic (domestic and agricultural run-off) and geogenic influences on the trace elements. Hierarchical cluster analysis grouped the 35 surface water sources into three statistically significant clusters based on the similarity of water quality characteristics. This study illustrates the usefulness of multivariate statistical techniques for the analysis and interpretation of complex data sets, and in water quality assessment, identification of pollution sources/factors and understanding of temporal/spatial variations in water quality for effective surface water quality management.
Selvarasu, Suresh; Kim, Do Yun; Karimi, Iftekhar A; Lee, Dong-Yup
2010-10-01
We present an integrated framework for characterizing fed-batch cultures of mouse hybridoma cells producing monoclonal antibody (mAb). This framework systematically combines data preprocessing, elemental balancing and statistical analysis techniques. Initially, specific rates of cell growth, glucose/amino acid consumption and mAb/metabolite production were calculated via curve fitting using logistic equations, with subsequent elemental balancing of the preprocessed data indicating the presence of experimental measurement errors. Multivariate statistical analysis was then employed to understand the physiological characteristics of the cellular system. The results from principal component analysis (PCA) revealed three major clusters of amino acids with similar trends in their consumption profiles: (i) arginine, threonine and serine, (ii) glycine, tyrosine, phenylalanine, methionine, histidine and asparagine, and (iii) lysine, valine and isoleucine. Further analysis using partial least squares (PLS) regression identified key amino acids which were positively or negatively correlated with cell growth, mAb production and the generation of lactate and ammonia. Based on these results, the optimal concentrations of key amino acids in the feed medium can be inferred, potentially leading to an increase in cell viability and productivity, as well as a decrease in toxic waste production. The study demonstrated how the current methodological framework using multivariate statistical analysis techniques can serve as a potential tool for deriving rational medium design strategies.
The maximum entropy technique. System's statistical description
Belashev, B Z
2002-01-01
The maximum entropy technique (MENT) is applied to find the distribution functions of physical quantities. MENT naturally takes into account the maximum entropy requirement, the characteristics of the system, and the connection conditions. MENT can be applied to the statistical description of both closed and open systems. Examples are considered in which MENT has been used to describe equilibrium and nonequilibrium states, as well as states far from thermodynamic equilibrium.
[Neuroimaging in psychiatry: multivariate analysis techniques for diagnosis and prognosis].
Kambeitz, J; Koutsouleris, N
2014-06-01
Multiple studies successfully applied multivariate analysis to neuroimaging data demonstrating the potential utility of neuroimaging for clinical diagnostic and prognostic purposes. Summary of the current state of research regarding the application of neuroimaging in the field of psychiatry. Literature review of current studies. Results of current studies indicate the potential application of neuroimaging data across various diagnoses, such as depression, schizophrenia, bipolar disorder and dementia. Potential applications include disease classification, differential diagnosis and prediction of disease course. The results of the studies are heterogeneous although some studies report promising findings. Further multicentre studies are needed with clearly specified patient populations to systematically investigate the potential utility of neuroimaging for the clinical routine.
A statistical approach for segregating cognitive task stages from multivariate fMRI BOLD time series
Directory of Open Access Journals (Sweden)
Charmaine eDemanuele
2015-10-01
Multivariate pattern analysis can reveal new information from neuroimaging data to illuminate human cognition and its disturbances. Here, we develop a methodological approach, based on multivariate statistical/machine learning and time series analysis, to discern cognitive processing stages from fMRI blood oxygenation level dependent (BOLD) time series. We apply this method to data recorded from a group of healthy adults whilst performing a virtual reality version of the delayed win-shift radial arm maze task. This task has been frequently used to study working memory and decision making in rodents. Using linear classifiers and multivariate test statistics in conjunction with time series bootstraps, we show that different cognitive stages of the task, as defined by the experimenter, namely, the encoding/retrieval, choice, reward and delay stages, can be statistically discriminated from the BOLD time series in brain areas relevant for decision making and working memory. Discrimination of these task stages was significantly reduced during poor behavioral performance in dorsolateral prefrontal cortex (DLPFC), but not in the primary visual cortex (V1). Experimenter-defined dissection of time series into class labels based on task structure was confirmed by an unsupervised, bottom-up approach based on Hidden Markov Models. Furthermore, we show that different groupings of recorded time points into cognitive event classes can be used to test hypotheses about the specific cognitive role of a given brain region during task execution. We found that whilst the DLPFC strongly differentiated between task stages associated with different memory loads, but not between different visual-spatial aspects, the reverse was true for V1. Our methodology illustrates how different aspects of cognitive information processing during one and the same task can be separated and attributed to specific brain regions based on information contained in multivariate patterns of voxel
Defining the ecological hydrology of Taiwan Rivers using multivariate statistical methods
Chang, Fi-John; Wu, Tzu-Ching; Tsai, Wen-Ping; Herricks, Edwin E.
2009-09-01
The identification and verification of ecohydrologic flow indicators has found new support as the importance of ecological flow regimes is recognized in modern water resources management, particularly in river restoration and reservoir management. An ecohydrologic indicator system reflecting the unique characteristics of Taiwan's water resources and hydrology has been developed, the Taiwan ecohydrological indicator system (TEIS). A major challenge for the water resources community is using the TEIS to provide environmental flow rules that improve existing water resources management. This paper examines data from the extensive network of flow monitoring stations in Taiwan using TEIS statistics to define and refine environmental flow options in Taiwan. Multivariate statistical methods were used to examine TEIS statistics for 102 stations representing the geographic and land use diversity of Taiwan. The Pearson correlation coefficient showed high multicollinearity between the TEIS statistics. Watersheds were separated into upper and lower-watershed locations. An analysis of variance indicated significant differences between upstream, more natural, and downstream, more developed, locations in the same basin, with hydrologic indicator redundancy in flow change and magnitude statistics. Issues of multicollinearity were examined using a Principal Component Analysis (PCA), with the first three components related to general flow and high/low flow statistics, frequency and time statistics, and quantity statistics. These principal components explain about 85% of the total variation. A major conclusion is that managers must be aware of differences among basins, as well as differences within basins that will require careful selection of management procedures to achieve needed flow regimes.
Homogeneity and change-point detection tests for multivariate data using rank statistics
Lung-Yut-Fong, Alexandre; Cappé, Olivier
2011-01-01
Detecting and locating changes in highly multivariate data is a major concern in several current statistical applications. In this context, the first contribution of the paper is a novel non-parametric two-sample homogeneity test for multivariate data based on the well-known Wilcoxon rank statistic. The proposed two-sample homogeneity test statistic can be extended to deal with ordinal or censored data as well as to test for the homogeneity of more than two samples. The second contribution of the paper concerns the use of the proposed test statistic to perform retrospective change-point analysis. It is first shown that the approach is computationally feasible even when looking for a large number of change-points thanks to the use of dynamic programming. Computable asymptotic $p$-values for the test are then provided in the case where a single potential change-point is to be detected. Compared to available alternatives, the proposed approach appears to be very reliable and robust. This is particularly true in ...
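For readers unfamiliar with the rank statistic this abstract builds on, the following Python sketch computes the classical univariate Wilcoxon rank-sum statistic for two samples. It is only the building block, not the multivariate test or the change-point procedure proposed in the paper, whose exact form is not given in the abstract. The function names `midranks` and `wilcoxon_rank_sum` are hypothetical, and ties are handled with average ranks as an assumption.

```python
def midranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1   # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def wilcoxon_rank_sum(x, y):
    """Rank sum W of sample x in the pooled sample, and the related Mann-Whitney U."""
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:len(x)])
    u = w - len(x) * (len(x) + 1) / 2
    return w, u

print(wilcoxon_rank_sum([1, 2, 3], [4, 5, 6]))
```

When every value in the first sample lies below every value in the second, U takes its minimum value of zero, which is the most extreme evidence against homogeneity in that direction.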
Identifying the controls of wildfire activity in Namibia using multivariate statistics
Mayr, Manuel; Le Roux, Johan; Samimi, Cyrus
2015-04-01
data mining techniques to select a conceivable set of variables by their explanatory value and to remove redundancy. We will then apply two multivariate statistical methods suitable to a large variety of data types and frequently used for (non-linear) causative factor identification: Non-metric Multidimensional Scaling (NMDS) and Regression Trees. The assumed value of these analyses is i) to determine the most important predictor variables of fire activity in Namibia, ii) to decipher their complex interactions in driving fire variability in Namibia, and iii) to compare the performance of two state-of-the-art statistical methods. References: Le Roux, J. (2011): The effect of land use practices on the spatial and temporal characteristics of savanna fires in Namibia. Doctoral thesis at the University of Erlangen-Nuremberg/Germany - 155 pages.
Quest for HI Turbulence Statistics New Techniques
Lazarian, A; Esquivel, A; Esquivel, Alejandro
2001-01-01
HI data cubes are sources of unique information on interstellar turbulence. Doppler shifts due to supersonic motions contain information on the turbulent velocity field which is otherwise difficult to obtain. However, the problem of separating velocity and density fluctuations within HI data cubes is far from trivial. The analytical description of the emissivity statistics of channel maps (velocity slices) in Lazarian & Pogosyan (2000) showed that the relative contribution of the density and velocity fluctuations depends on the thickness of the velocity slice. In particular, the power-law asymptotics of the emissivity fluctuations change when the dispersion of the velocity at the scale under study becomes of the order of the velocity slice thickness (integrated width of the channel map). These results are the foundations of the Velocity-Channel Analysis (VCA) technique, which allows velocity and density statistics to be determined using 21-cm data cubes. The VCA has been successfully tested using data cubes obta...
Analysis/forecast experiments with a multivariate statistical analysis scheme using FGGE data
Baker, W. E.; Bloom, S. C.; Nestler, M. S.
1985-01-01
A three-dimensional, multivariate, statistical analysis method, optimal interpolation (OI) is described for modeling meteorological data from widely dispersed sites. The model was developed to analyze FGGE data at the NASA-Goddard Laboratory of Atmospherics. The model features a multivariate surface analysis over the oceans, including maintenance of the Ekman balance and a geographically dependent correlation function. Preliminary comparisons are made between the OI model and similar schemes employed at the European Center for Medium Range Weather Forecasts and the National Meteorological Center. The OI scheme is used to provide input to a GCM, and model error correlations are calculated for forecasts of 500 mb vertical water mixing ratios and the wind profiles. Comparisons are made between the predictions and measured data. The model is shown to be as accurate as a successive corrections model out to 4.5 days.
Buttigieg, Pier Luigi; Ramette, Alban
2014-12-01
The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.
Fujiki, Yuya; Kumada, Yoichi; Kishimoto, Michimasa
2015-08-01
The proteomics technique, which consists of two-dimensional gel electrophoresis (2-DE), peptide mass fingerprinting (PMF), gel image analysis, and multivariate statistics, was applied to the phase analysis of a fed-batch culture for the production of a single-chain variable fragment (scFv) of an anti-C-reactive protein (CRP) antibody by Pichia pastoris. The time courses of the fed-batch culture were separated into three distinct phases: the growth phase of the batch process, the growth phase of the fed-batch process, and the production phase of the fed-batch process. Multivariate statistical analysis using 2-DE gel image analysis data clearly showed the change in the culture phase and provided information concerning the protein expression, which suggested a metabolic change related to cell growth and production during the fed-batch culture. Furthermore, specific proteins, such as alcohol oxidase, which is strongly related to scFv expression, and proteinase A, which could biodegrade scFv in the latter phases of production, were identified via the PMF method. The proteomics technique provided valuable information about the effect of the methanol concentration on scFv production.
On the potential of multivariate techniques for the determination of multidimensional efficiencies
Viaud, Benoit
2016-01-01
Differential measurements of particle collisions or decays can provide stringent constraints on physics beyond the Standard Model of particle physics. In particular, the distributions of the kinematical and angular variables that characterise heavy meson multibody decays are non-trivial and can reveal the underlying interaction physics. In the era of high luminosity opened by the advent of the Large Hadron Collider and of Flavor Factories, differential measurements are less and less dominated by statistical precision and require a precise determination of efficiencies that depend simultaneously on several variables and do not factorise in these variables. This document is a reflection on the potential of multivariate techniques for the determination of such multidimensional efficiencies. We carried out two case studies that show that multilayer perceptron neural networks can determine and correct for the distortions introduced by reconstruction and selection criteria in the multidimensional phase space of t...
Analysis techniques for multivariate root loci. [a tool in linear control systems]
Thompson, P. M.; Stein, G.; Laub, A. J.
1980-01-01
Analysis and techniques are developed for the multivariable root locus and the multivariable optimal root locus. The generalized eigenvalue problem is used to compute angles and sensitivities for both types of loci, and an algorithm is presented that determines the asymptotic properties of the optimal root locus.
Energy Technology Data Exchange (ETDEWEB)
2012-12-31
This report evaluates the chemistry of seep water occurring in three desert drainages near Shiprock, New Mexico: Many Devils Wash, Salt Creek Wash, and Eagle Nest Arroyo. Through the use of geochemical plotting tools and multivariate statistical analysis techniques, analytical results of samples collected from the three drainages are compared with the groundwater chemistry at a former uranium mill in the Shiprock area (the Shiprock site), managed by the U.S. Department of Energy Office of Legacy Management. The objective of this study was to determine, based on the water chemistry of the samples, if statistically significant patterns or groupings are apparent between the sample populations and, if so, whether there are any reasonable explanations for those groupings.
An Improvement of the Hotelling T2 Statistic in Monitoring Multivariate Quality Characteristics
Directory of Open Access Journals (Sweden)
Ashkan Shabbak
2012-01-01
The Hotelling T2 statistic is the most popular statistic used in multivariate control charts to monitor multiple qualities. However, this statistic is easily affected by the existence of more than one outlier in the data set. To rectify this problem, robust control charts, which are based on the minimum volume ellipsoid and the minimum covariance determinant, have been proposed. Most researchers assess the performance of multivariate control charts based on the number of signals without paying much attention to whether those signals are really outliers. With due respect, we propose to evaluate control charts not only based on the number of detected outliers but also with respect to their correct positions. In this paper, an Upper Control Limit based on the median and the median absolute deviation is also proposed. The results of this study signify that the proposed Upper Control Limit improves the detection of correct outliers but that it suffers from a swamping effect when the positions of outliers are not taken into consideration. Finally, a robust control chart based on the diagnostic robust generalised potential procedure is introduced to remedy this drawback.
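The idea of a control limit built from the median and the median absolute deviation (MAD) can be illustrated with a minimal Python sketch. The abstract does not give the authors' exact formula, so the rule used here (median plus k times the scaled MAD, with k = 3 and the usual 1.4826 consistency factor) is an assumption, and the function names and T2 values are hypothetical.

```python
from statistics import median


def robust_upper_control_limit(t2_values, k=3.0):
    """Return median + k * scaled MAD of a list of T2 statistics.

    Assumed rule for illustration only; not the formula from the paper.
    """
    med = median(t2_values)
    mad = median(abs(v - med) for v in t2_values)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return med + k * 1.4826 * mad


def flag_signals(t2_values, k=3.0):
    """Return the positions (indices) of observations above the robust limit."""
    ucl = robust_upper_control_limit(t2_values, k)
    return [i for i, v in enumerate(t2_values) if v > ucl]


# Hypothetical T2 values with two planted outliers at positions 6 and 8.
t2 = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 9.7, 1.0, 12.4, 1.4]
print(robust_upper_control_limit(t2))
print(flag_signals(t2))
```

Returning positions rather than a count matches the abstract's point that a chart should be judged on whether it flags the correct observations, not only on how many signals it raises.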
Ghanate, A D; Kothiwale, S; Singh, S P; Bertrand, Dominique; Krishna, C Murali
2011-02-01
Cancer is now recognized as one of the major causes of morbidity and mortality. Histopathological diagnosis, the gold standard, is shown to be subjective, time consuming, prone to interobserver disagreement, and often fails to predict prognosis. Optical spectroscopic methods are being contemplated as adjuncts or alternatives to conventional cancer diagnostics. The most important aspect of these approaches is their objectivity, and multivariate statistical tools play a major role in realizing it. However, rigorous evaluation of the robustness of spectral models is a prerequisite. The utility of Raman spectroscopy in the diagnosis of cancers has been well established. Until now, the specificity and applicability of spectral models have been evaluated for specific cancer types. In this study, we have evaluated the utility of spectroscopic models representing normal and malignant tissues of the breast, cervix, colon, larynx, and oral cavity in a broader perspective, using different multivariate tests. The limit test, which was used in our earlier study, gave high sensitivity but suffered from poor specificity. The performance of other methods, such as factorial discriminant analysis and partial least squares discriminant analysis, is on par with that of more complex nonlinear methods such as decision trees, but they provide very little information about the classification model. This comparative study thus demonstrates not just the efficacy of Raman spectroscopic models but also the applicability and limitations of different multivariate tools for discrimination under complex conditions such as the multicancer scenario.
Institute of Scientific and Technical Information of China (English)
Bundit Boonkhao; Xue Z. Wang
2012-01-01
Ultrasonic attenuation spectroscopy (UAS) is an attractive process analytical technology (PAT) for on-line real-time characterisation of slurries for particle size distribution (PSD) estimation. It is, however, only applicable to relatively low solid concentrations, since existing instrument process models still cannot fully take into account the phenomena of particle-particle interaction and multiple scattering, leading to errors in PSD estimation. This paper investigates an alternative use of the raw attenuation spectra for direct multivariate statistical process control (MSPC). The UAS raw spectra were processed using principal component analysis. The selected principal components were used to derive two MSPC statistics, Hotelling's T2 and the squared prediction error (SPE). The method is illustrated and demonstrated by reference to a wet milling process for processing nanoparticles.
Probabilistic radar-gauge merging by multivariate spatiotemporal techniques
Pulkkinen, Seppo; Koistinen, Jarmo; Kuitunen, Timo; Harri, Ari-Matti
2016-11-01
The quality of quantitative precipitation estimation (QPE) is degraded by considerable discrepancies between radar and ground measurements, which are common due to inherent uncertainties between these two kinds of sensor systems. The causes include measurement errors and differences in sampling schemes. Nevertheless, the remaining discrepancies can be statistically modeled. A model describing detection probabilities of ground rainfall, systematic biases as well as the variance of residual discrepancies between radar and rain gauges is developed. These are modeled by means of multiple explanatory variables such as rain rate and distance from radar. The model is implemented by using nonparametric kernel methods and spatiotemporal Kriging interpolation. A key feature of the model is that for a given radar-derived rainfall field and explanatory variables, it determines probability distributions for the corresponding ground rainfall. Unbiased estimates for ground rainfall can be obtained from the expected values of the distributions. From such distributions, one can also obtain uncertainty estimates and exceedance probabilities that are important for hydrological applications. Performance of the model is assessed by cross-validation using hourly rainfall accumulations measured by the Finnish rain gauges and C-band dual polarization radars.
Bevacqua, Emanuele; Maraun, Douglas; Hobæk Haff, Ingrid; Widmann, Martin; Vrac, Mathieu
2017-04-01
Compound events are multivariate extreme events in which the individual contributing variables may not be extreme themselves, but their joint - dependent - occurrence causes an extreme impact. The conventional univariate statistical analysis cannot give accurate information regarding the multivariate nature of these events. We develop a conceptual model, implemented via pair-copula constructions, which allows for the quantification of the risk associated with compound events in present day and future climate, as well as the uncertainty estimates around such risk. The model includes meteorological predictors which provide insight into both the involved physical processes, and the temporal variability of CEs. Moreover, this model provides multivariate statistical downscaling of compound events. Downscaling of compound events is required to extend their risk assessment to the past or future climate, where climate models either do not simulate realistic values of the local variables driving the events, or do not simulate them at all. Based on the developed model, we study compound floods, i.e. joint storm surge and high river runoff, in Ravenna (Italy). To explicitly quantify the risk, we define the impact of compound floods as a function of sea and river levels. We use meteorological predictors to extend the analysis to the past, and get a more robust risk analysis. We quantify the uncertainties of the risk analysis observing that they are very large due to the shortness of the available data, though this may also be the case in other studies where they have not been estimated. Ignoring the dependence between sea and river levels would result in an underestimation of risk, in particular the expected return period of the highest compound flood observed increases from about 20 to 32 years when switching from the dependent to the independent case.
Ye, M.; Pacheco Castro, R. B.; Pacheco Avila, J.; Cabrera Sansores, A.
2014-12-01
The karstic aquifer of Yucatan is a vulnerable and complex system. The first fifteen meters of this aquifer have been polluted, so protecting this resource is important because it is the only source of potable water for the entire state. Through the assessment of groundwater quality we can gain knowledge about the main processes governing water chemistry, as well as spatial patterns that are important for establishing protection zones. In this work, multivariate statistical techniques are used to assess the groundwater quality of the supply wells (30 to 40 meters deep) in the hydrogeologic region of the Ring of Cenotes, located in Yucatan, Mexico. Cluster analysis and principal component analysis are applied to groundwater chemistry data of the study area. Results of the principal component analysis show that the main sources of variation in the data are sea water intrusion, the interaction of the water with the carbonate rocks of the system, and some pollution processes. The cluster analysis shows that the data can be divided into four clusters. The spatial distribution of the clusters seems to be random, but is consistent with sea water intrusion and pollution with nitrates. The overall results show that multivariate statistical analysis can be successfully applied in the groundwater quality assessment of this karstic aquifer.
Directory of Open Access Journals (Sweden)
Chen-Lin Soo
2017-01-01
Studies of Sarawak coastal water quality are scarce, and applications of the multivariate statistical approach to investigate the spatial variation of water quality and to identify pollution sources in Sarawak coastal water are scarcer still. Hence, the present study aimed to evaluate the spatial variation of water quality along the coastline of the southwestern region of Sarawak using multivariate statistical techniques. Seventeen physicochemical parameters were measured at 11 stations along a coastline of approximately 225 km. The coastal water quality showed spatial heterogeneity, with the cluster analysis grouping the 11 stations into four different clusters. Deterioration in coastal water quality was observed in different regions of Sarawak, corresponding to land use patterns in the region. Nevertheless, nitrate-nitrogen exceeded the guideline value at all sampling stations along the coastline. The principal component analysis (PCA) determined a reduced number of five principal components that explained 89.0% of the data set variance. The first PC indicated that nutrients were the dominant polluting factors, which is attributed to domestic, agricultural, and aquaculture activities, followed by suspended solids in the second PC, which are related to logging activities.
Tavakol, Mitra; Arjmandi, Reza; Shayeghi, Mansoureh; Monavari, Seyed Masoud; Karbassi, Abdolreza
2017-01-01
One of the key issues in determining the quality of water in rivers is to create a water quality control network with a suitable performance. The measured qualitative variables at stations should be representative of all the changes in water quality in water systems. Since the increase in water quality monitoring stations increases annual monitoring costs, recognition of the stations with higher importance as well as main parameters can be effective in future decisions to improve the existing monitoring network. Sampling was carried out on 12 physical and chemical parameters measured at 15 stations during 2013-2014 in Haraz River, northern Iran. The results of the measurements were analyzed using multivariate statistical analysis methods including cluster analysis (CA), principal component analysis (PCA), factor analysis (FA), and discriminant analysis (DA). According to the CA, PCA, and FA, the stations were divided into three groups of high pollution, medium pollution, and low pollution. The research findings confirm applicability of multivariate statistical techniques in the interpretation of large data sets, water quality assessment, and source apportionment of different pollution sources.
Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti
2016-07-01
A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco. Contact: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
MUMAL: Multivariate analysis in shotgun proteomics using machine learning techniques
Directory of Open Access Journals (Sweden)
Cerqueira Fabio R
2012-10-01
Background: The shotgun strategy (liquid chromatography coupled with tandem mass spectrometry) is widely applied for identification of proteins in complex mixtures. This method gives rise to thousands of spectra in a single run, which are interpreted by computational tools. Such tools normally use a protein database from which peptide sequences are extracted for matching with experimentally derived mass spectral data. After the database search, the correctness of obtained peptide-spectrum matches (PSMs) needs to be evaluated also by algorithms, as a manual curation of these huge datasets would be impractical. The target-decoy database strategy is largely used to perform spectrum evaluation. Nonetheless, this method has been applied without considering sensitivity, i.e., only error estimation is taken into account. A recently proposed method termed MUDE treats the target-decoy analysis as an optimization problem, where sensitivity is maximized. This method demonstrates a significant increase in the retrieved number of PSMs for a fixed error rate. However, the MUDE model is constructed in such a way that linear decision boundaries are established to separate correct from incorrect PSMs. Besides, the described heuristic for solving the optimization problem has to be executed many times to achieve a significant augmentation in sensitivity. Results: Here, we propose a new method, termed MUMAL, for PSM assessment that is based on machine learning techniques. Our method can establish nonlinear decision boundaries, leading to a higher chance to retrieve more true positives. Furthermore, we need few iterations to achieve high sensitivities, strikingly shortening the running time of the whole process. Experiments show that our method achieves a considerably higher number of PSMs compared with standard tools such as MUDE, PeptideProphet, and typical target-decoy approaches. Conclusion: Our approach not only enhances the computational performance, and
Digital Repository Service at National Institute of Oceanography (India)
Jayalakshmy, K.V.; Rao, K.K.
A study of planktonic foraminiferal assemblages from 19 stations in the neritic and oceanic regions off the Coromandel Coast, Bay of Bengal has been made using a multivariate statistical method termed as factor analysis. On the basis of abundance...
Multivariate Statistical Analysis of Water Quality data in Indian River Lagoon, Florida
Sayemuzzaman, M.; Ye, M.
2015-12-01
The Indian River Lagoon, part of the longest barrier island complex in the United States, is a region of particular concern to environmental scientists because of the rapid rate of human development throughout the region and its geographical position between the colder temperate zone and the warmer sub-tropical zone. Thus, surface water quality analysis in this region continues to yield new information. In the present study, multivariate statistical procedures were applied to analyze the spatial and temporal water quality in the Indian River Lagoon over the period 1998-2013. Twelve parameters were analyzed at twelve key water monitoring stations in and beside the lagoon using monthly datasets (total of 27,648 observations). The dataset was treated using cluster analysis (CA), principal component analysis (PCA) and non-parametric trend analysis. The CA was used to cluster the twelve monitoring stations into four groups, with stations with similar surrounding characteristics being in the same group. The PCA was then applied to the groups to find the important water quality parameters. The principal components (PCs) PC1 to PC5 were considered, based on explained cumulative variances of 75% to 85% in each cluster group. Nutrient species (phosphorus and nitrogen), salinity, specific conductivity and erosion factors (TSS, turbidity) were the major variables involved in the construction of the PCs. Statistically significant positive or negative trends and abrupt trend shifts were detected by applying the Mann-Kendall trend test and the Sequential Mann-Kendall (SQMK) test to each individual station for the important water quality parameters. Land use and land cover change patterns, local anthropogenic activities and extreme climate events such as drought might be associated with these trends. This study presents the multivariate statistical assessment in order to get better information about the quality of surface water. Thus, effective pollution control/management of the surface
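For readers unfamiliar with the Mann-Kendall trend test named in this abstract, the following is a minimal Python sketch of the standard test, not the authors' code. It assumes a series without tied values (no tie correction in the variance) and uses the normal approximation for the p-value; the function name and the example series are hypothetical.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Assumes no ties in the data; the variance below omits the tie correction.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return s, z, p_value


# Hypothetical monthly values with a steady upward trend.
s, z, p = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(s, round(z, 3), p)
```

A positive S with a small p-value indicates an increasing monotonic trend, a negative S a decreasing one. The sequential variant (SQMK) used for detecting abrupt shifts is not covered by this sketch.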
Wolf, S. F.; Lipschutz, M. E.
1992-07-01
logistic regression statistical techniques as tools for discriminant analysis. A randomization-simulation technique can also be used to make distribution-independent comparisons and to verify that any observed differences are not due to insufficient samples or too many independent variables (Lipschutz and Samuels, 1991). These methods allow us to test for the existence of distinct compositional subpopulations in what is supposedly a single meteorite population. At the time of writing this abstract our database consists of 55 H4-6 chondrites (Lingner et al, 1987 and this work). Nine of these meteorites are members of the proposed "cluster 1" co-orbital meteoroid stream. For these 9 samples, linear discriminant analysis based on the concentrations of 10 labile trace elements reveals a difference between the "cluster 1" subpopulation of H chondrite falls and all other H chondrite falls at the reveals a difference at the Steele, D. (1988) Icarus 75, 64-96. Wetherill, G. W. (1986) Nature 319, 357-358. Wolf, S. F. and Lipschutz, M. E. (1992) Lunar Planet. Sci. (abstract) 23, 1545-1546.
Sun, Gang; Hoff, Steven J; Zelle, Brian C; Nelson, Minda A
2008-12-01
It is vital to forecast gas and particulate matter concentrations and emission rates (GPCER) from livestock production facilities to assess the impact of airborne pollutants on human health, the ecological environment, and global warming. Modeling source air quality is a complex process because of abundant nonlinear interactions between GPCER and other factors. The objective of this study was to introduce statistical methods and a radial basis function (RBF) neural network to predict daily source air quality in Iowa swine deep-pit finishing buildings. The results show that four variables (outdoor and indoor temperature, animal units, and ventilation rates) were identified as relatively important model inputs using statistical methods. It was further demonstrated that only two factors, the environment factor and the animal factor, were capable of explaining more than 94% of the total variability after performing principal component analysis. Introducing fewer, uncorrelated variables to the neural network would reduce the complexity of the model structure, minimize computation cost, and eliminate model overfitting problems. The obtained results of RBF network prediction were in good agreement with the actual measurements, with values of the correlation coefficient between 0.741 and 0.995 and very low values of systemic performance indexes for all the models. The good results indicated the RBF network could be trained to model these highly nonlinear relationships. Thus, the RBF neural network technology combined with multivariate statistical methods is a promising tool for air pollutant emissions modeling.
A survey of statistical downscaling techniques
Energy Technology Data Exchange (ETDEWEB)
Zorita, E.; Storch, H. von [GKSS-Forschungszentrum Geesthacht GmbH (Germany). Inst. fuer Hydrophysik
1997-12-31
The derivation of regional information from integrations of coarse-resolution General Circulation Models (GCM) is generally referred to as downscaling. The most relevant statistical downscaling techniques are described here and some particular examples are worked out in detail. They are classified into three main groups: linear methods, classification methods and deterministic non-linear methods. Their performance in a particular example, winter rainfall in the Iberian peninsula, is compared to a simple downscaling analog method. It is found that the analog method performs as well as the more complicated methods. Downscaling analysis can also be used as a tool to validate regional performance of global climate models by analyzing the covariability of the simulated large-scale climate and the regional climates. (orig.)
Mallamace, Domenico; Corsaro, Carmelo; Salvo, Andrea; Cicero, Nicola; Macaluso, Andrea; Giangrosso, Giuseppe; Ferrantelli, Vincenzo; Dugo, Giacomo
2014-05-01
We have studied, by means of High Resolution Magic Angle Spinning Nuclear Magnetic Resonance, the metabolic profile of the famous Sicilian cherry tomato of Pachino. Thanks to its organoleptic and health-promoting properties, this particular foodstuff was the first tomato accredited by the European PGI (Protected Geographical Indication) certification of quality. Due to the relatively high price of the final product, commercial frauds have arisen in the Italian and international markets. Hence, there is a growing interest in developing analytical techniques able to predict the origin of a tomato sample, indicating whether or not it originates from the area of Pachino, Sicily (Italy). In this paper we have determined the molar concentration of the metabolites that constitute the PGI cherry tomato of Pachino. Furthermore, by means of a multivariate statistical analysis we have identified which metabolites are relevant for sample differentiation.
Oprea, Cristiana; Gustova, Marina V; Oprea, Ioan A; Buzguta, Violeta L
2014-01-01
X-ray fluorescence spectrometry (XRFS) was used as a multielement method for evaluating individual whole human teeth or tooth tissues for their amounts of trace elements. Measurements were carried out on human enamel, dentine, and dental cementum, and some differences in tooth matrix composition were noted. In addition, the elemental concentrations determined in teeth from subjects of different ages, nutritional states, professions and gender, living under various environmental conditions and dietary habits, were included in a comparison by multivariate statistical analysis (MVSA) methods. By factor analysis it was established that inorganic components of human teeth varied consistently with their source in the tissue, with more in such tissue from females than in that from males, and more in tooth incisor than in tooth molar.
Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A; van't Veld, Aart A
2012-03-15
To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended. Copyright © 2012 Elsevier Inc. All rights reserved.
Fault detection of a spur gear using vibration signal with multivariable statistical parameters
Directory of Open Access Journals (Sweden)
Songpon Klinchaeam
2014-10-01
This paper presents a condition monitoring technique for spur gear fault detection using time-domain vibration signal analysis. Vibration signals were acquired from gearboxes with various simulated faults on the spur gear teeth. In this study, vibration signals were used to monitor the normal condition and various fault conditions of a spur gear: scuffing defect, crack defect and broken tooth. The statistical parameters of the vibration signal were used to compare and evaluate the fault conditions. This technique can be applied to set alarm limits for the signal condition based on statistical parameters such as variance, kurtosis, rms and crest factor. These parameters can be used to set decision boundaries for the signal condition. The results show that vibration signal analysis with a single statistical parameter cannot clearly predict faults of the spur gears, whereas using at least two statistical parameters clearly separates every fault case. The decision boundaries of the statistical parameters were set with 99.7% certainty (3σ) from 300 reference datasets; they detected the test condition with 99.7% (3σ) accuracy and an error of less than 0.3% on 50 test datasets.
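The four time-domain parameters named in this abstract, and a 3σ decision boundary derived from reference data, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the authors' procedure: population (divide-by-n) moments are used, kurtosis is the non-excess form (3 for a Gaussian signal), and the function names and sample values are hypothetical.

```python
import math


def time_domain_features(signal):
    """Compute variance, rms, crest factor and kurtosis of a vibration signal.

    Assumes a non-constant signal (variance and rms must be non-zero).
    """
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n  # population variance
    rms = math.sqrt(sum(x * x for x in signal) / n)
    crest_factor = max(abs(x) for x in signal) / rms  # peak over rms
    fourth_moment = sum((x - mean) ** 4 for x in signal) / n
    kurtosis = fourth_moment / variance ** 2  # non-excess kurtosis
    return {
        "variance": variance,
        "rms": rms,
        "crest_factor": crest_factor,
        "kurtosis": kurtosis,
    }


def three_sigma_limits(reference_values):
    """Return (lower, upper) = mean -/+ 3 standard deviations of reference data."""
    n = len(reference_values)
    mean = sum(reference_values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in reference_values) / n)
    return mean - 3 * std, mean + 3 * std


# Hypothetical square-wave-like signal and hypothetical reference rms values.
print(time_domain_features([1.0, -1.0, 1.0, -1.0]))
print(three_sigma_limits([2, 4, 4, 4, 5, 5, 7, 9]))
```

In use, one would compute a feature for each reference recording of the healthy gear, derive its limits with `three_sigma_limits`, and flag a test recording when two or more features fall outside their limits, in line with the abstract's finding that a single parameter is not enough.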
Definition of a territorial tourist attractiveness index: a multivariate statistical approach
Directory of Open Access Journals (Sweden)
Roberto Gismondi
2007-10-01
Theoretical and effective tourist attractiveness should be evaluated at a very detailed territorial level. For this reason, we propose a selection of statistical variables measured at the municipality level, useful for the calculation of a tourist index on the basis of three compared statistical techniques. An empirical effort has been carried out on the 64 municipalities belonging to the Foggia province, with reference to year 2002. Finally, we have often stressed the operative usefulness of the final tourist indexes, both to correctly classify municipalities from a tourist point of view and to render easier the identification of the so-called Sistemi Turistici Locali (STL).
Leary, James F.; McLaughlin, Scott R.; Reece, Lisa M.; Rosenblatt, Judah I.; Hokanson, James A.
1999-06-01
Multivariate statistics can be used for visualization of cell subpopulations in multidimensional data space and for classification of cells within that data space. New data mining techniques we have developed, such as subtractive clustering, can be used to find the differences between test and control multiparameter flow cytometric data, e.g. in the problem of human stem cell isolation with tumor purging. They also can provide training data for subsequent multivariate statistical classification techniques such as discriminant function or logistic regression analyses. Using lookup tables, these multivariate statistical calculations can be performed in real-time, and can even include probabilities of misclassification. Thus, the only distinction between off-line classification of cells in data analysis and real-time statistical decision-making for cell sorting is the time limit in which a classification decision must be made. For real-time cell sorting we presently are able to perform these classifications in less than 625 microseconds, corresponding to the time that it takes the cell to travel from the laser intersection point to the sort decision point in a flow cytometer/cell sorter. Statistical decision making and the ability to include the costs of misclassification into that decision process will become important as flow cytometry/cell sorting moves from diagnostics to therapeutics.
Energy Technology Data Exchange (ETDEWEB)
Mayer, B. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Valdez, C. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); DeHope, A. J. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Spackman, P. E. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Sanner, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Martinez, H. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Williams, A. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-11-28
Critical to many modern forensic investigations is the chemical attribution of the origin of an illegal drug. This process greatly relies on identification of compounds indicative of its clandestine or commercial production. The results of these studies can yield detailed information on method of manufacture, sophistication of the synthesis operation, starting material source, and final product. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic 3-methylfentanyl, N-(3-methyl-1-phenethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods were studied in an effort to identify and classify route-specific signatures. These methods were chosen to minimize the use of scheduled precursors, complicated laboratory equipment, number of overall steps, and demanding reaction conditions. Using gas and liquid chromatographies combined with mass spectrometric methods (GC-QTOF and LC-QTOF) in conjunction with inductively coupled plasma mass spectrometry (ICP-MS), over 240 distinct compounds and elements were monitored. As seen in our previous work with CAS of fentanyl synthesis, the complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 62 statistically significant, route-specific CAS were identified. Statistical classification models using a variety of machine learning techniques were then developed with the ability to predict the method of 3-methylfentanyl synthesis from three blind crude samples generated by synthetic chemists without prior experience with these methods.
Prats-Montalbán, José M.; López, Fernando; Valiente, José M.; Ferrer, Alberto
2007-01-01
In this paper we present an innovative way to simultaneously perform feature extraction and classification for the quality control issue of surface grading by applying two well-known multivariate statistical projection tools (SIMCA and PLS-DA). These tools have been applied to compress the color texture data describing the visual appearance of surfaces (soft color texture descriptors) and to directly perform classification using statistics and predictions computed from the extracted projection models. Experiments have been carried out using an extensive image database of ceramic tiles (VxC TSG). This image database comprises 14 different models, 42 surface classes and 960 pieces. A factorial experimental design has been carried out to evaluate all the combinations of several factors affecting the accuracy rate. Factors include tile model, color representation scheme (CIE Lab, CIE Luv and RGB) and compression/classification approach (SIMCA and PLS-DA). In addition, a logistic regression model is fitted from the experiments to compute accuracy estimates and study the effects of the factors. The results show that PLS-DA performs better than SIMCA, achieving a mean accuracy rate of 98.95%. These results outperform those obtained in a previous work where the soft color texture descriptors in combination with the CIE Lab color space and the k-NN classifier achieved an accuracy of 97.36%.
Energy Technology Data Exchange (ETDEWEB)
Belianinov, Alex, E-mail: belianinova@ornl.gov; Ganesh, Panchapakesan; Lin, Wenzhi; Jesse, Stephen; Pan, Minghu; Kalinin, Sergei V. [Oak Ridge National Laboratory, Institute for Functional Imaging of Materials, Center for Nanophase Material Science, Oak Ridge, Tennessee 37922 (United States); Sales, Brian C.; Sefat, Athena S. [Oak Ridge National Laboratory, Materials Science and Technology Division, Oak Ridge, Tennessee 37922 (United States)
2014-12-01
Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1−xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.
Directory of Open Access Journals (Sweden)
Md. Bodrud-Doza
2016-04-01
This study investigates the groundwater quality in the Faridpur district of central Bangladesh based on 60 preselected sample points. Water evaluation indices and a number of statistical approaches such as multivariate statistics and geostatistics are applied to characterize water quality, a major factor controlling the suitability of groundwater for drinking purposes. The study reveals that EC, TDS, Ca2+, total As and Fe values of groundwater samples exceeded Bangladesh and international standards. The ground water quality index (GWQI) showed that about 47% of the samples were of good quality for drinking purposes. The heavy metal pollution index (HPI), degree of contamination (Cd) and heavy metal evaluation index (HEI) reveal that most of the samples have a low level of pollution. However, Cd provides a better alternative than the other indices. Principal component analysis (PCA) suggests that groundwater quality is mainly related to geogenic (rock–water interaction) and anthropogenic (agrogenic and domestic sewage) sources in the study area. The findings of cluster analysis (CA) and the correlation matrix (CM) are also consistent with the PCA results. The spatial distributions of groundwater quality parameters are determined by geostatistical modeling. The exponential semivariogram model is validated as the best-fitted model for most of the index values. It is expected that the outcomes of the study will provide insights for decision makers taking proper measures for groundwater quality management in central Bangladesh.
Energy Technology Data Exchange (ETDEWEB)
Mayer, B. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Mew, D. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); DeHope, A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Spackman, P. E. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Williams, A. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2015-09-24
Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. The results of these studies can yield detailed information on method of manufacture, starting material source, and final product, all critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. 160 distinct compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (GC-MS and LC-MS/MS-TOF) in conjunction with inductively coupled plasma mass spectrometry (ICP-MS). The complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 87 route-specific CAS were classified and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. This work provides the most detailed fentanyl CAS investigation to date by using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.
One approach in using multivariate statistical process control in analyzing cheese quality
Directory of Open Access Journals (Sweden)
Ilija Djekic
2015-05-01
The objective of this paper was to investigate the possibility of using multivariate statistical process control in analysing cheese quality parameters. Two cheese types (white brined cheeses and soft cheese from ultra-filtered milk) were selected and analysed for several quality parameters such as dry matter, milk fat, protein contents, pH, NaCl, fat in dry matter and moisture in non-fat solids. The obtained results showed significant variations between the two types of cheese for most of the quality characteristics examined. The only stable parameter in both types of cheese was moisture in non-fat solids. All of the other cheese quality characteristics fell above or below control limits for most of the samples. Such results indicated high instability and variation within cheese production. Although the use of statistical process control is not mandatory in the dairy industry, it might provide benefits to organizations in improving quality control of dairy products.
Energy Technology Data Exchange (ETDEWEB)
Fouque, A.L.; Ciuciu, Ph.; Risser, L. [NeuroSpin/CEA, F-91191 Gif-sur-Yvette (France); Fouque, A.L.; Ciuciu, Ph.; Risser, L. [IFR 49, Institut d' Imagerie Neurofonctionnelle, Paris (France)
2009-07-01
In this paper, a novel statistical parcellation of intra-subject functional MRI (fMRI) data is proposed. The key idea is to identify functionally homogenous regions of interest from their hemodynamic parameters. To this end, a non-parametric voxel-based estimation of hemodynamic response function is performed as a prerequisite. Then, the extracted hemodynamic features are entered as the input data of a Multivariate Spatial Gaussian Mixture Model (MSGMM) to be fitted. The goal of the spatial aspect is to favor the recovery of connected components in the mixture. Our statistical clustering approach is original in the sense that it extends existing works done on univariate spatially regularized Gaussian mixtures. A specific Gibbs sampler is derived to account for different covariance structures in the feature space. On realistic artificial fMRI datasets, it is shown that our algorithm is helpful for identifying a parsimonious functional parcellation required in the context of joint detection estimation of brain activity. This allows us to overcome the classical assumption of spatial stationarity of the BOLD signal model. (authors)
An Application of Multivariate Statistical Analysis for Query-Driven Visualization
Energy Technology Data Exchange (ETDEWEB)
Gosink, Luke J.; Garth, Christoph; Anderson, John C.; Bethel, E. Wes; Joy, Kenneth I.
2010-03-01
Driven by the ability to generate ever-larger, increasingly complex data, there is an urgent need in the scientific community for scalable analysis methods that can rapidly identify salient trends in scientific data. Query-Driven Visualization (QDV) strategies are among the small subset of techniques that can address both large and highly complex datasets. This paper extends the utility of QDV strategies with a statistics-based framework that integrates non-parametric distribution estimation techniques with a new segmentation strategy to visually identify statistically significant trends and features within the solution space of a query. In this framework, query distribution estimates help users to interactively explore their query's solution and visually identify the regions where the combined behavior of constrained variables is most important, statistically, to their inquiry. Our new segmentation strategy extends the distribution estimation analysis by visually conveying the individual importance of each variable to these regions of high statistical significance. We demonstrate the analysis benefits these two strategies provide and show how they may be used to facilitate the refinement of constraints over variables expressed in a user's query. We apply our method to datasets from two different scientific domains to demonstrate its broad applicability.
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-04-01
first phase of the work addressed to identify the spatial relationships between the landslides location and the 13 related factors by using the Frequency Ratio bivariate statistical method. The analysis was then carried out by adopting a multivariate statistical approach, according to the Logistic Regression technique and Random Forests technique that gave best results in terms of AUC. The models were run and evaluated with different sample sizes, also taking into account the temporal variation of input variables such as areas burned by wildfire. The most significant outcomes of this work are the relevant influence of the sample size on the model results and the strong importance of some environmental factors (e.g. land use and wildfires) for the identification of the depletion zones of extremely rapid shallow landslides.
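As a rough illustration of the Frequency Ratio method named in this abstract, the sketch below computes, for each class of one factor, the share of landslide cells falling in that class divided by the share of all cells in that class. The class names and cell counts are hypothetical and are not taken from the study; the function name is likewise an assumption made for this example.

```python
def frequency_ratio(landslide_cells, total_cells):
    """Return the frequency ratio (FR) for each class of one factor.

    FR = (landslide cells in class / all landslide cells)
         / (cells in class / all cells)
    An FR above 1 suggests the class is over-represented among landslides.
    """
    all_landslides = sum(landslide_cells.values())
    all_cells = sum(total_cells.values())
    ratios = {}
    for cls, n_cells in total_cells.items():
        landslide_share = landslide_cells.get(cls, 0) / all_landslides
        area_share = n_cells / all_cells
        ratios[cls] = landslide_share / area_share
    return ratios


# Hypothetical land-use classes and raster cell counts (illustrative only).
total = {"forest": 500, "cropland": 300, "burned": 200}
slides = {"forest": 10, "cropland": 10, "burned": 30}
fr = frequency_ratio(slides, total)
```

With these invented counts, the burned class covers a fifth of the area but holds most of the landslide cells, so its ratio is well above 1.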
Use of multivariate statistics to identify unreliable data obtained using CASA.
Martínez, Luis Becerril; Crispín, Rubén Huerta; Mendoza, Maximino Méndez; Gallegos, Oswaldo Hernández; Martínez, Andrés Aragón
2013-06-01
In order to identify unreliable data in a dataset of motility parameters obtained from a pilot study acquired by a veterinarian with experience in boar semen handling, but without experience in the operation of a computer assisted sperm analysis (CASA) system, a multivariate graphical and statistical analysis was performed. Sixteen boar semen samples were aliquoted then incubated with varying concentrations of progesterone from 0 to 3.33 µg/ml and analyzed in a CASA system. After standardization of the data, Chernoff faces were pictured for each measurement, and a principal component analysis (PCA) was used to reduce the dimensionality and pre-process the data before hierarchical clustering. The first twelve individual measurements showed abnormal features when Chernoff faces were drawn. PCA revealed that principal components 1 and 2 explained 63.08% of the variance in the dataset. Values of principal components for each individual measurement of semen samples were mapped to identify differences among treatments or among boars. Twelve individual measurements presented low values of principal component 1. Confidence ellipses on the map of principal components showed no statistically significant effects for treatment or boar. Hierarchical clustering performed on the first two principal components produced three clusters. Cluster 1 contained evaluations of the first two samples in each treatment, each one of a different boar. With the exception of one individual measurement, all other measurements in cluster 1 were the same as those observed in abnormal Chernoff faces. Unreliable data in cluster 1 are probably related to the operator's inexperience with a CASA system. These findings could be used to objectively evaluate the skill level of an operator of a CASA system. This may be particularly useful in the quality control of semen analysis using CASA systems.
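The "variance explained" figure reported for a PCA on standardized data can be sketched as follows. This is a minimal stdlib-only illustration, not the authors' procedure: it standardizes each column, forms the correlation matrix, and estimates the leading eigenvalue by power iteration. Because the eigenvalues of a correlation matrix sum to the number of variables, the share explained by the first component is that eigenvalue divided by the number of columns. All function names and the example rows are assumptions made for this sketch.

```python
import math


def standardize(rows):
    """Scale each column to mean 0 and sample standard deviation 1."""
    n, p = len(rows), len(rows[0])
    z_cols = []
    for j in range(p):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        z_cols.append([(v - mean) / sd for v in col])
    return [[z_cols[j][i] for j in range(p)] for i in range(n)]


def correlation_matrix(z_rows):
    """Correlation matrix of already standardized rows."""
    n, p = len(z_rows), len(z_rows[0])
    return [[sum(r[a] * r[b] for r in z_rows) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a symmetric matrix by power iteration."""
    p = len(matrix)
    # Asymmetric start vector, to avoid starting orthogonal to the answer.
    vec = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        nxt = [sum(matrix[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    mv = [sum(matrix[a][b] * vec[b] for b in range(p)) for a in range(p)]
    return sum(vec[a] * mv[a] for a in range(p))  # Rayleigh quotient


def first_pc_variance_share(rows):
    """Fraction of total variance carried by the first principal component."""
    corr = correlation_matrix(standardize(rows))
    return leading_eigenvalue(corr) / len(corr)


# Hypothetical measurements: rows are samples, columns are three parameters.
example_rows = [[52.0, 31.0, 8.1], [48.0, 29.5, 7.7], [61.0, 35.0, 9.0],
                [40.0, 24.0, 6.9], [55.0, 33.0, 8.4]]
share = first_pc_variance_share(example_rows)
```

Power iteration is used here only to keep the sketch dependency-free; a numerical library's eigendecomposition would normally be preferred and would also return the remaining components.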
Feyissa, Daniel D.; Aher, Yogesh D.; Engidawork, Ephrem; Höger, Harald; Lubec, Gert; Korz, Volker
2017-01-01
Animal models for anxiety, depressive-like and cognitive diseases or aging often involve testing of subjects in behavioral test batteries. The large number of test variables with different mean variations and within and between test correlations often constitute a significant problem in determining essential variables to assess behavioral patterns and their variation in individual animals as well as appropriate statistical treatment. Therefore, we applied a multivariate approach (principal component analysis) to analyse the behavioral data of 162 male adult Sprague-Dawley rats that underwent a behavioral test battery including commonly used tests for spatial learning and memory (holeboard) and different behavioral patterns (open field, elevated plus maze, forced swim test) as well as for motor abilities (Rota rod). The high dimensional behavioral results were reduced to fewer components associated with spatial cognition, general activity, anxiety-, and depression-like behavior and motor ability. The loading scores of individual rats on these different components allow an assessment and the distribution of individual features in a population of animals. The reduced number of components can be used also for statistical calculations like appropriate sample sizes for valid discriminations between experimental groups, which otherwise have to be done on each variable. Because the animals were intact, untreated and experimentally naïve the results reflect trait patterns of behavior and thus individuality. The distribution of animals with high or low levels of anxiety, depressive-like behavior, general activity and cognitive features in a local population provides information on the probability of their appearance in experimental samples and thus may help to avoid biases. However, such an analysis initially requires a large cohort of animals in order to gain a valid assessment.
Multivariate statistical process control in product quality review assessment - A case study.
Kharbach, M; Cherrah, Y; Vander Heyden, Y; Bouklouze, A
2017-08-07
According to the Food and Drug Administration and the European Good Manufacturing Practices (GMP) guidelines, Annual Product Review (APR) is a mandatory requirement in GMP. It consists of evaluating a large collection of qualitative or quantitative data in order to verify the consistency of an existing process. According to the Code of Federal Regulations Part 211 (21 CFR 211.180), all finished products should be reviewed annually for the quality standards to determine the need of any change in specification or manufacturing of drug products. Conventional Statistical Process Control (SPC) evaluates the pharmaceutical production process by examining only the effect of a single factor at a time using a Shewhart chart. It neglects to take into account the interaction between the variables. In order to overcome this issue, Multivariate Statistical Process Control (MSPC) can be used. Our case study concerns an APR assessment, where 164 historical batches containing six active ingredients, manufactured in Morocco, were collected during one year. Each batch has been checked by assaying the six active ingredients by High Performance Liquid Chromatography according to European Pharmacopoeia monographs. The data matrix was evaluated both by SPC and MSPC. The SPC indicated that all batches are under control, while the MSPC, based on Principal Component Analysis (PCA), for the data being either autoscaled or robust scaled, showed four and seven batches, respectively, outside the Hotelling T² 95% ellipse. Also, an improvement of the capability of the process is observed without the most extreme batches. The MSPC can be used for monitoring subtle changes in the manufacturing process during an APR assessment. Copyright © 2017 Académie Nationale de Pharmacie. Published by Elsevier Masson SAS. All rights reserved.
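The contrast drawn here between single-variable charts and a Hotelling T² limit can be illustrated with a small two-variable sketch. It is not the authors' analysis: the data are invented, only two variables are used instead of six, T² is computed on the raw variables rather than on PCA scores, and the 95% limit is taken from a chi-square approximation with 2 degrees of freedom (for which the quantile is exactly -2 ln 0.05). All names are assumptions made for this example.

```python
import math


def hotelling_t2_2d(points):
    """Hotelling T^2 of each (x, y) point, using the sample mean and
    sample covariance estimated from the same points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    vx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    vy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = vx * vy - cxy ** 2
    t2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^T S^{-1} d written out for a 2x2 covariance matrix S.
        t2.append((vy * dx * dx - 2 * cxy * dx * dy + vx * dy * dy) / det)
    return t2


def univariate_flags(points, k=3.0):
    """Indices where either variable lies more than k sample standard
    deviations from its own mean (a Shewhart-style check per variable)."""
    n = len(points)
    flagged = set()
    for j in range(2):
        col = [p[j] for p in points]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        flagged.update(i for i, v in enumerate(col) if abs(v - mean) > k * sd)
    return sorted(flagged)


# Invented batches: two strongly correlated assays; the last batch breaks
# the correlation while staying within the usual range of each assay.
batches = [(-2.0, -1.9), (-1.5, -1.6), (-1.0, -0.9), (-0.5, -0.6), (0.0, 0.1),
           (0.5, 0.4), (1.0, 1.1), (1.5, 1.4), (2.0, 2.1), (1.0, -1.0)]

limit = -2.0 * math.log(0.05)  # chi-square 95% quantile, 2 d.o.f. (assumed limit)
t2_values = hotelling_t2_2d(batches)
t2_flags = [i for i, v in enumerate(t2_values) if v > limit]
uni_flags = univariate_flags(batches)
```

In this invented data set the last batch is unremarkable on each variable separately but is flagged by T², because it departs from the correlation structure of the other batches.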
Martin, David; Boyle, Fergal
2015-09-01
Several clinical studies have identified a strong correlation between neointimal hyperplasia following coronary stent deployment and both stent-induced arterial injury and altered vessel hemodynamics. As such, the sequential structural and fluid dynamics analysis of balloon-expandable stent deployment should provide a comprehensive indication of stent performance. Despite this observation, very few numerical studies of balloon-expandable coronary stents have considered both the mechanical and hemodynamic impact of stent deployment. Furthermore, in the few studies that have considered both phenomena, only a small number of stents have been considered. In this study, a sequential structural and fluid dynamics analysis methodology was employed to compare both the mechanical and hemodynamic impact of six balloon-expandable coronary stents. To investigate the relationship between stent design and performance, several common stent design properties were then identified and the dependence between these properties and both the mechanical and hemodynamic variables of interest was evaluated using statistical measures of correlation. Following the completion of the numerical analyses, stent strut thickness was identified as the only common design property that demonstrated a strong dependence with either the mean equivalent stress predicted in the artery wall or the mean relative residence time predicted on the luminal surface of the artery. These results corroborate the findings of the large-scale ISAR-STEREO clinical studies and highlight the crucial role of strut thickness in coronary stent design. The sequential structural and fluid dynamics analysis methodology and the multivariable statistical treatment of the results described in this study should prove useful in the design of future balloon-expandable coronary stents.
Xu, Peng; Rizzoni, Elizabeth Anne; Sul, Se-Yeong; Stephanopoulos, Gregory
2017-01-20
Metabolic engineering entails targeted modification of cell metabolism to maximize the production of a specific compound. For empowering combinatorial optimization in strain engineering, tools and algorithms are needed to efficiently sample the multidimensional gene expression space and locate the desirable overproduction phenotype. We addressed this challenge by employing design of experiments (DoE) models to quantitatively correlate gene expression with strain performance. By fractionally sampling the gene expression landscape, we statistically screened the dominant enzyme targets that determine metabolic pathway efficiency. An empirical quadratic regression model was subsequently used to identify the optimal gene expression patterns of the investigated pathway. As a proof of concept, our approach yielded the natural product violacein at 525.4 mg/L in shake flasks, a 3.2-fold increase from the baseline strain. Violacein production was further increased to 1.31 g/L in a controlled benchtop bioreactor. We found that formulating discretized gene expression levels into logarithmic variables (Linlog transformation) was essential for implementing this DoE-based optimization procedure. The reported methodology can aid multivariate combinatorial pathway engineering and may be generalized as a standard procedure for accelerating strain engineering and improving metabolic pathway efficiency.
Statistical equivalent of the classical TDT for quantitative traits and multivariate phenotypes
Indian Academy of Sciences (India)
Tanushree Haldar; Saurabh Ghosh
2015-12-01
Clinical end-point traits are usually governed by quantitative precursors. Hence, there is active research interest in developing statistical methods for association mapping of quantitative traits. Unlike population-based tests for association, family-based tests for transmission disequilibrium are protected against population stratification. In this study, we propose a logistic regression model to test the association for quantitative traits based on a trio design. We show that the method can be viewed as a direct extension of the classical transmission disequilibrium test for binary traits to quantitative traits. We evaluate the performance of our method using extensive simulations and compare it with an existing method, the family-based association test (FBAT). We found that the two methods yield comparable powers if all families are considered. However, unlike FBAT, which yields an inflated rate of false positives when noninformative trios with all three individuals heterozygous are removed, our method maintains the correct size without compromising too much on power. We show that our method can be easily modified to incorporate multivariate phenotypes. Here, we applied this method to analyse a quantitative endophenotype associated with alcoholism.
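For orientation, the classical transmission disequilibrium test for binary traits, which this abstract takes as its starting point, can be sketched as below. This is the standard McNemar-type statistic, not the logistic regression extension proposed by the authors. Here `b` and `c` are the counts of heterozygous parents transmitting, respectively not transmitting, the allele of interest to an affected child; the counts in the example are hypothetical.

```python
import math


def tdt_statistic(b, c):
    """Classical TDT statistic (b - c)^2 / (b + c).

    b: heterozygous parents who transmitted the allele of interest
    c: heterozygous parents who transmitted the other allele
    Under no linkage or association it is approximately chi-square, 1 d.o.f.
    """
    return (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 d.o.f."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical transmission counts (illustrative only).
stat = tdt_statistic(60, 40)
p_value = chi2_1df_pvalue(stat)
```

Only heterozygous parents contribute to `b` and `c`, which is why the test is insensitive to population stratification, as the abstract notes for family-based tests in general.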
Kaown, D.; Hyun, Y.; Lee, K.
2004-12-01
The characterization of groundwater contamination at a hydrologically complex agricultural site in Youpori, Chooncheon (Korea) was undertaken by analyzing hydro-chemical data of groundwater within a statistical framework. The data show that high and correlated concentrations of Ca, Mg, and NO3 reflected the polluted nature of groundwater at the site. More than 39% of samples showed nitrate concentrations above the human-affected value (3 mg/L as NO3-N), while about 25% of samples exceeded the maximum acceptable level (10 mg/L as NO3-N) according to the EPA regulation. Multivariate analyses (factor and cluster analyses) were used to identify contaminant pathway, source and geochemical process. The geostatistical method was applied in order to delineate the spatial extent and variation of nitrate contamination. Factor and cluster analyses indicate that hydrochemical data can clearly characterize both non-point contamination of groundwater over the area by agrochemical fertilizer and point-source pollution such as manure spreading near barns or pigpens. Nitrate-N, the critical species in the study area, was used to delineate the spatial spread of the contaminants using kriging.
Hernanz, Dolores; Recamales, Angeles F; Meléndez-Martínez, Antonio J; González-Miret, M Lourdes; Heredia, Francisco J
2008-04-23
Apart from the need to assess the color of foods due to its preponderant role in their acceptability, there is currently a new trend consisting in the study of the relationships between the color and the pigments accounting for it. The color of five strawberry varieties cultivated in two different soilless systems has been studied, and an array of multivariate statistical methods has been applied to single out the color parameters that best discriminate among the different samples surveyed and to correlate them with the pigment content. It is concluded that there is not a direct relationship between the external and flesh colorations of the berries. Additionally, after discriminant methods were applied, it was noticed that, when strawberry variety was the criterion, >90% of the cases could be correctly classified, whereas a noticeably lower percentage of correct classification (around 60%) was obtained when the type of cultivation system was selected as the criterion for discrimination. The best correlations of pigment-color coordinates were found between pelargonidin-3-rutinoside and the external a* (r = -0.87) followed by pelargonidin-3-glucoside and the internal L* (r = -0.72).
DEFF Research Database (Denmark)
Ludvigsen, Liselotte; Albrechtsen, Hans-Jørgen; Rootzén, Helle
1997-01-01
Different multivariate statistical analyses were applied to phospholipid fatty acids representing the biomass composition and to different biogeochemical parameters measured in 37 samples from a landfill contaminated aquifer at Grindsted Landfill (Denmark). Principal component analysis and corres... Partial least square analysis related the phospholipid fatty acids data to the biogeochemical parameters assuming linear relationships. After selection of the optimal phospholipid fatty acid combination by genetic algorithms, good partial least squares models with low prediction errors were gained...
Malm, Christer B; Khoo, Nelson S; Granlund, Irene; Lindstedt, Emilia; Hult, Andreas
2016-01-01
achieved in two separate trials. In conclusion, autologous re-infusion of RBCs increased VO2max and performance as hypothesized, but hematological profiling by multivariate statistics could not reach the WADA stipulated false positive ratio of <0.001% at any time point investigated. A majority of samples remained within limits of normal individual variation at all times.
Zhou, Ran; Peng, Shi-Tao; Qin, Xue-Bo; Shi, Hong-Hua; Ding, De-Wen
2013-03-01
A detailed field survey of hydrological, chemical and biological resources was conducted in the Bohai Bay in spring and summer 2007. The distributions of phytoplankton and their relations to environmental factors were investigated with multivariate analysis techniques. In total, 17 and 23 taxa were identified in spring and summer, respectively. The abundance of phytoplankton in spring was 115 × 10^4 cells m^-3, which was significantly higher than that in summer (3.1 × 10^4 cells m^-3). Characteristics of phytoplankton assemblages in the two seasons were identified using principal component analysis (PCA), while redundancy analysis (RDA) was used to examine the environmental variables that may explain the patterns of variation of the phytoplankton community. Based on PCA results, in the spring, the phytoplankton was mainly distributed in the central and northern water zones, where the nitrate nitrogen concentration was higher. However, in summer, phytoplankton was found distributed in all zones of Bohai Bay, while the dominant species was mainly distributed in the estuary. RDA indicated that the key environmental factors that influenced phytoplankton assemblages in the spring were nitrate nitrogen (NO3−-N), nitrite nitrogen (NO2−-N) and soluble reactive phosphorus (SRP), while ammonium nitrogen (NH4+-N) and water temperature (WT) played key roles in summer.
Experimental Data Mining Techniques (Using Multiple Statistical Methods)
Directory of Open Access Journals (Sweden)
Mustafa Zaidi
2012-05-01
This paper discusses possible solutions to non-linear multivariable problems by experimental data mining techniques using an orthogonal array. The Taguchi method is a very useful technique for reducing the time and cost of experiments, but it ignores all kinds of interaction effects. Its results are not very encouraging, which motivates modeling the non-linear multivariable laser cutting process by one- and two-way analysis of variance as well as by linear and non-linear regression analysis. These techniques are used to explore better analysis techniques and to improve laser cutting quality by reducing process variations caused by controllable process parameters. The size of the data set causes difficulties in modeling and simulating the problem; for example, a decision tree is a useful technique but is not able to predict better results. The results of the analysis of variance are encouraging. Taguchi and regression normally optimize input process parameters for a single characteristic.
Chen, Peiying; Zhang, Weidong
2007-04-01
This paper improves an inverted decoupling technique for a class of stable linear multivariable processes with multiple time delays and nonminimum-phase zeros. Two decoupling schemes are proposed based on the inverted decoupling technique. The first is an extended inverted decoupling scheme, in which the decoupler is designed such that the inverted decoupling technique covers a wider class of processes than the one introduced in the published literature. However, due to the stability issue, some multivariable processes still cannot be decoupled by the inverted decoupling structure. To solve this problem, a second, modified decoupling scheme with unity feedback structure is suggested for implementation. The Internal Model Control (IMC) theory is applied here to design PI/PID controllers for the decoupled processes. Furthermore, in the presence of multiplicative input uncertainty, lower bounds of the control parameters are derived quantitatively for guaranteeing robust stability of the system. Simulations demonstrate the validity of the proposed control schemes.
Energy Technology Data Exchange (ETDEWEB)
Mattingly, J.K.
2001-03-08
The development of high-order statistical analyses applied to measurements of the temporal evolution of fission chain-reactions is described. These statistics are derived via application of Bayes' rule to conditional probabilities describing a sequence of events in a fissile system beginning with the initiation of a chain-reaction by source neutrons and ending with counting events in a collection of neutron-sensitive detectors. Two types of initiating neutron sources are considered: (1) a directly observable source introduced by the experimenter (active initiation), and (2) a source that is intrinsic to the system and is not directly observable (passive initiation). The resulting statistics describe the temporal distribution of the population of prompt neutrons in terms of the time-delays between members of a collection (an n-tuplet) of correlated detector counts that, in turn, may be collectively correlated with a detected active source neutron emission. These developments are a unification and extension of Rossi-α, pulsed neutron, and neutron noise methods, each of which measures the temporal distribution of pairs of correlated events, to produce a method that measures the temporal distribution of n-tuplets of correlated counts of arbitrary dimension n. In general the technique should expand present capabilities in the analysis of neutron counting measurements.
Energy Technology Data Exchange (ETDEWEB)
Kolluri, Srinivas Sahan; Esfahani, Iman Janghorban; Garikiparthy, Prithvi Sai Nadh; Yoo, Chang Kyoo [Kyung Hee University, Yongin (Korea, Republic of)]
2015-08-15
Our aim was to analyze, monitor, and predict the outcomes of processes in a full-scale seawater reverse osmosis (SWRO) desalination plant using multivariate statistical techniques. Multivariate analysis of variance (MANOVA) was used to investigate the performance and efficiencies of two SWRO processes, namely, pore controllable fiber filter-reverse osmosis (PCF-SWRO) and sand filtration-ultra filtration-reverse osmosis (SF-UF-SWRO). Principal component analysis (PCA) was applied to monitor the two SWRO processes. PCA monitoring revealed that the SF-UF-SWRO process could be analyzed reliably with a low number of outliers and disturbances. Partial least squares (PLS) analysis was then conducted to predict which of the seven input parameters of feed flow rate, PCF/SF-UF filtrate flow rate, temperature of feed water, turbidity feed, pH, reverse osmosis (RO) flow rate, and pressure had a significant effect on the outcome variables of permeate flow rate and concentration. Root mean squared errors (RMSEs) of the PLS models for permeate flow rates were 31.5 and 28.6 for the PCF-SWRO process and SF-UF-SWRO process, respectively, while RMSEs of permeate concentrations were 350.44 and 289.4, respectively. These results indicate that the SF-UF-SWRO process can be modeled more accurately than the PCF-SWRO process, because the RMSE values of permeate flow rate and concentration obtained using a PLS regression model of the SF-UF-SWRO process were lower than those obtained for the PCF-SWRO process.
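The model comparison in this abstract rests on the root mean squared error. As a minimal sketch, RMSE can be computed as below; the function name `rmse` and its inputs are illustrative assumptions and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences.

    Hypothetical helper for illustration; not code from the cited study.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared residuals, then take the square root.
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

RMSE is expressed in the units of the response variable, so values are only comparable between models that predict the same quantity on the same scale.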
Multivariate Statistical Analysis of Cigarette Design Feature Influence on ISO TNCO Yields.
Agnew-Heard, Kimberly A; Lancaster, Vicki A; Bravo, Roberto; Watson, Clifford; Walters, Matthew J; Holman, Matthew R
2016-06-20
The aim of this study is to explore how differences in cigarette physical design parameters influence tar, nicotine, and carbon monoxide (TNCO) yields in mainstream smoke (MSS) using the International Organization for Standardization (ISO) smoking regimen. Standardized smoking methods were used to evaluate 50 U.S. domestic brand cigarettes and a reference cigarette representing a range of TNCO yields in MSS collected from linear smoking machines using a nonintense smoking regimen. Multivariate statistical methods were used to form clusters of cigarettes based on their ISO TNCO yields and then to explore the relationship between the ISO generated TNCO yields and the nine cigarette physical design parameters between and within each cluster simultaneously. The ISO generated TNCO yields in MSS are 1.1-17.0 mg tar/cigarette, 0.1-2.2 mg nicotine/cigarette, and 1.6-17.3 mg CO/cigarette. Cluster analysis divided the 51 cigarettes into five discrete clusters based on their ISO TNCO yields. No one physical parameter dominated across all clusters. Predicting ISO machine generated TNCO yields based on these nine physical design parameters is complex due to the correlation among and between the nine physical design parameters and TNCO yields. From these analyses, it is estimated that approximately 20% of the variability in the ISO generated TNCO yields comes from other parameters (e.g., filter material, filter type, inclusion of expanded or reconstituted tobacco, and tobacco blend composition, along with differences in tobacco leaf origin and stalk positions and added ingredients). A future article will examine the influence of these physical design parameters on TNCO yields under a Canadian Intense (CI) smoking regimen. Together, these papers will provide a more robust picture of the design features that contribute to TNCO exposure across the range of real world smoking patterns.
Djorgovski, S. G.
1994-01-01
We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complex database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of the SKICAT system, and of some of the scientific results achieved to date. We also developed a user-friendly package for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications and has
Chen, Zhe; Qiu, Zurong; Huo, Xinming; Fan, Yuming; Li, Xinghua
2017-03-01
A fiber-capacitive drop analyzer is an instrument which monitors a growing droplet to produce a capacitive opto-tensiotrace (COT). Each COT is an integration of fiber light intensity signals and capacitance signals and can reflect the unique physicochemical property of a liquid. In this study, we propose a method for solution identification and concentration quantification based on multivariate statistical methods. Eight characteristic values are extracted from each COT. A series of COT characteristic values of training solutions at different concentrations compose a data library of this kind of solution. A two-stage linear discriminant analysis is applied to analyze different solution libraries and establish discriminant functions. Test solutions can be discriminated by these functions. After determining the variety of test solutions, the Spearman correlation test and principal component analysis are used to filter and reduce the dimensions of the eight characteristic values, producing a new representative parameter. A cubic spline interpolation function is built between the parameters and concentrations, based on which we can calculate the concentration of the test solution. Methanol, ethanol, n-propanol, and saline solutions are taken as experimental subjects in this paper. For each solution, nine or ten different concentrations are chosen to be the standard library, and the other two concentrations compose the test group. By using the methods mentioned above, all eight test solutions are correctly identified and the average relative error of quantitative analysis is 1.11%. The proposed method is feasible; it enlarges the applicable scope of COT-based liquid recognition and improves the precision of concentration quantification.
Li, Jia; Zhang, Haibo; Chen, Yongshan; Luo, Yongming; Zhang, Hua
2016-07-01
To quantify the extent of antibiotic contamination and to identify the dominant pollutant sources in the Tiaoxi River Watershed, surface water samples were collected at eight locations and analyzed for four tetracyclines and three sulfonamides using ultra-performance liquid chromatography tandem mass spectrometry (UPLC-MS/MS). The observed maximum concentrations of tetracycline (623 ng L⁻¹), oxytetracycline (19,810 ng L⁻¹), and sulfamethoxazole (112 ng L⁻¹) exceeded their corresponding Predicted No Effect Concentration (PNEC) values. In particular, high concentrations of antibiotics were observed in wet summer with heavy rainfall. The maximum concentrations of antibiotics appeared in the vicinity of intensive aquaculture areas. High-resolution land use data were used for identifying diffuse sources of antibiotic pollution in the watershed. Significant correlations between tetracycline and developed (r = 0.93), tetracycline and barren (r = 0.87), oxytetracycline and barren (r = 0.82), and sulfadiazine and agricultural facilities (r = 0.71) were observed. In addition, the density of aquaculture significantly correlated with doxycycline (r = 0.74) and oxytetracycline (r = 0.76), while the density of livestock significantly correlated with sulfadiazine (r = 0.71). Principal Component Analysis (PCA) indicated that doxycycline, tetracycline, oxytetracycline, and sulfamethoxazole were from aquaculture and domestic sources, whereas sulfadiazine and sulfamethazine were from livestock wastewater. Flood or drainage from aquaculture ponds was identified as a major source of antibiotics in the Tiaoxi watershed. A hot-spot map was created based on results of land use analysis and multi-variable statistics, which provided an effective management tool for source identification in watersheds with multiple diffuse sources of antibiotic pollution.
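The r values reported in this abstract are correlation coefficients between antibiotic concentrations and land-use or density measures. A minimal sketch of the Pearson coefficient follows, assuming (the abstract does not say) that Pearson rather than a rank-based coefficient was used; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Illustrative helper only; the coefficient type used in the study
    is an assumption here.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

With only eight sampling locations, as in this study, a high r can still carry wide uncertainty, so a significance test or confidence interval would normally accompany it.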
Brauchler, R.; Cheng, J.; Dietrich, P.; Everett, M.; Johnson, B.; Sauter, M.
2005-12-01
Knowledge about the spatial variations in hydraulic properties plays an important role controlling solute movement in saturated flow systems. Traditional hydrogeological approaches appear to have difficulties providing high resolution parameter estimates. Thus, we have decided to develop an approach coupling the two existing hydraulic tomographic approaches: a) inversion of the drawdown as a function of time (amplitude inversion) and b) inversion of the travel times of the pressure disturbance. The advantages of hydraulic travel time tomography are its high structural resolution and computational efficiency. However, travel times are primarily controlled by the aquifer diffusivity, making it difficult to determine hydraulic conductivity and storage. Amplitude inversion, on the other hand, is able to determine hydraulic conductivity and storage separately, but the heavy computational burden of the amplitude inversion is often a shortcoming, especially for larger data sets. Our coupled inversion approach was developed and tested using synthetic data sets. The data base of the inversion comprises simulated slug tests, in which the positions of the sources (injection ports), isolated with packers, are varied between the tests. The first step was the inversion of several characteristic travel times (e.g. early, intermediate and late travel times) in order to determine the diffusivity distribution. Secondly, the resulting diffusivity distributions were classified into homogeneous groups in order to differentiate between hydrogeological units characterized by a significant diffusivity contrast. The classification was performed by using multivariate statistics. With a numerical flow model and an automatic parameter estimator the amplitude inversion was performed in a final step. The classified diffusivity distribution is an excellent starting model for the amplitude inversion and allows the calculation time to be reduced strongly. The final amplitude inversion overcomes
Baez-Cazull, S. E.; McGuire, J.T.; Cozzarelli, I.M.; Voytek, M.A.
2008-01-01
Determining the processes governing aqueous biogeochemistry in a wetland hydrologically linked to an underlying contaminated aquifer is challenging due to the complex exchange between the systems and their distinct responses to changes in precipitation, recharge, and biological activities. To evaluate temporal and spatial processes in the wetland-aquifer system, water samples were collected using cm-scale multichambered passive diffusion samplers (peepers) to span the wetland-aquifer interface over a period of 3 yr. Samples were analyzed for major cations and anions, methane, and a suite of organic acids resulting in a large dataset of over 8000 points, which was evaluated using multivariate statistics. Principal component analysis (PCA) was chosen with the purpose of exploring the sources of variation in the dataset to expose related variables and provide insight into the biogeochemical processes that control the water chemistry of the system. Factor scores computed from PCA were mapped by date and depth. Patterns observed suggest that (i) fermentation is the process controlling the greatest variability in the dataset and it peaks in May; (ii) iron and sulfate reduction were the dominant terminal electron-accepting processes in the system and were associated with fermentation but had more complex seasonal variability than fermentation; (iii) methanogenesis was also important and associated with bacterial utilization of minerals as a source of electron acceptors (e.g., barite BaSO4); and (iv) seasonal hydrological patterns (wet and dry periods) control the availability of electron acceptors through the reoxidation of reduced iron-sulfur species enhancing iron and sulfate reduction. Copyright © 2008 by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America. All rights reserved.
Stratigraphic Division and Correlation of the Nihewan Beds by Multivariate Statistical Analysis
Institute of Scientific and Technical Information of China (English)
岳军; 蒋明媚
1992-01-01
This paper describes the principle of the optimal partitioning method for stratigraphic division and correlation. The Nihewan Beds are taken as an example to show how to apply this approach in stratigraphic division and correlation. The semiquantitative spectral analysis data on aggregate trace elements in 324 samples taken from the nine sections in the Nihewan Basin are treated with a multivariate statistical method for stratigraphic division and correlation. First, the data from all the sections are respectively calculated by the optimal partitioning method to establish the stratigraphic boundaries. The optimal partitioning method has proved itself to be applicable to stratigraphic division and correlation. In our practice the Nihewan Beds are divided into five zones (I-V). Zone I includes subzones Ia and Ib. Zones Ia, Ib, II and III are considered to correspond to the Pliocene (N2), the early Early Pleistocene, the late Early Pleistocene, and the Middle Pleistocene, respectively. Zones IV and V are probably Late Pleistocene in age. This indicates that sediments deposited contemporaneously in the sections of the same basin are similar in geochemical characteristics, although different in geographical location. However, the sediments also show some variations, with a transitional relationship from one section to another. For example, in Zone II, the sediments of the Xiaodukou section show not only the characteristics of the Nangou-Hongya and Hutouliang sections, but also those of the Xiashagou, Shixiaxi, Shixiadong and Wulitai sections. It can be seen from the above that the zones can be characteristically correlated with one another. In addition, the feasibility of the optimal partitioning method is also described in the present paper.
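The abstract does not spell out the partitioning algorithm. One common reading of "optimal partitioning" for ordered samples is a dynamic program that splits a depth-ordered sequence into k contiguous zones with minimal total within-zone sum of squares. The sketch below assumes that reading and, for brevity, a single variable rather than the multivariate trace-element data; the names `segment_cost` and `optimal_partition` are hypothetical.

```python
def segment_cost(values, i, j):
    """Sum of squared deviations from the mean for values[i:j]."""
    seg = values[i:j]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def optimal_partition(values, k):
    """Split an ordered sequence into k contiguous segments.

    Returns (total_cost, boundaries), where boundaries holds the start
    index of every segment after the first. Assumed formulation, not
    necessarily the one used in the cited paper.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    inf = float("inf")
    # best[m][j]: minimal cost of splitting values[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                cost = best[m - 1][i] + segment_cost(values, i, j)
                if cost < best[m][j]:
                    best[m][j] = cost
                    cut[m][j] = i
    # Walk back through the stored cut points to recover the boundaries.
    boundaries = []
    j = n
    for m in range(k, 1, -1):
        j = cut[m][j]
        boundaries.append(j)
    boundaries.reverse()
    return best[k][n], boundaries
```

For example, `optimal_partition([1.0, 1.1, 0.9, 5.0, 5.2, 4.8], 2)` places the single boundary at index 3, between the low and high groups. Recomputing each segment cost makes this sketch cubic in the number of samples, which is acceptable for a few hundred samples per section.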
Djorgovski, S. George
1994-01-01
We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complete database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful, and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of the SKICAT system, and of some of the scientific results achieved to date. We also developed a user-friendly package for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications, and has produced real, published results.
Institute of Scientific and Technical Information of China (English)
WANG Pei; ZHANG Dinghua; LI Shan; CHEN Bing
2012-01-01
For aircraft manufacturing industries, the analysis and prediction of part machining error during the machining process are very important for controlling and improving part machining quality. In order to control machining error effectively, a method integrating multivariate statistical process control (MSPC) and stream of variations (SoV) is proposed. Firstly, machining error is modeled by multi-operation approaches for the part machining process. SoV is adopted to establish the mathematical model of the relationship between the error of upstream operations and the error of downstream operations. Here error sources include not only the influence of upstream operations but also many other error sources. The standard model and the predicted model of SoV are built respectively, according to whether the operation is done or not, to satisfy different requests during the part machining process. Secondly, the method of one-step ahead forecast error (OSFE) is used to eliminate autocorrelation of the sample data from the SoV model, and the T² control chart in MSPC is built to realize machining error detection according to the data characteristics of the above error model, which can judge whether the operation is out of control or not. If it is, then feedback is sent to the operations. The error model is modified by adjusting the operation that is out of control, and it continues to be used to monitor operations. Finally, a machining instance containing two operations demonstrates the effectiveness of the machining error control method presented in this paper.
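The T² control chart mentioned in this abstract plots Hotelling's T² statistic for each new observation against a control limit. The sketch below computes only the statistic from an in-control reference sample; the function name `hotelling_t2` is hypothetical, NumPy is assumed to be available, and the control limit (usually derived from an F distribution) is left to the caller.

```python
import numpy as np


def hotelling_t2(reference, sample):
    """Hotelling's T^2 of one observation against a reference sample.

    reference: array-like of shape (n_observations, n_variables),
               assumed to come from the in-control process.
    sample:    array-like of shape (n_variables,).
    Illustrative helper; not code from the cited paper.
    """
    reference = np.asarray(reference, dtype=float)
    mean = reference.mean(axis=0)
    # Sample covariance matrix (columns are variables).
    cov = np.cov(reference, rowvar=False)
    diff = np.asarray(sample, dtype=float) - mean
    # Solve cov @ z = diff rather than forming the inverse explicitly.
    return float(diff @ np.linalg.solve(cov, diff))
```

An observation would be flagged as out of control when its T² exceeds the chosen limit. The reference covariance matrix must be non-singular, which requires more reference observations than variables.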
Statistical normalization techniques for magnetic resonance imaging
Directory of Open Access Journals (Sweden)
Russell T. Shinohara
2014-01-01
While computed tomography and other imaging techniques are measured in absolute units with physical meaning, magnetic resonance images are expressed in arbitrary units that are difficult to interpret and differ between study visits and subjects. Much work in the image processing literature on intensity normalization has focused on histogram matching and other histogram mapping techniques, with little emphasis on normalizing images to have biologically interpretable units. Furthermore, there are no formalized principles or goals for the crucial comparability of image intensities within and across subjects. To address this, we propose a set of criteria necessary for the normalization of images. We further propose simple and robust biologically motivated normalization techniques for multisequence brain imaging that have the same interpretation across acquisitions and satisfy the proposed criteria. We compare the performance of different normalization methods in thousands of images of patients with Alzheimer's disease, hundreds of patients with multiple sclerosis, and hundreds of healthy subjects obtained in several different studies at dozens of imaging centers.
Energy Technology Data Exchange (ETDEWEB)
Alves, Luana F.N.; Sarkis, Jorge E.S.; Bordon, Isabela C.A.C., E-mail: ludemar1@hotmail.com, E-mail: jesarkis@ipen.br, E-mail: isabella.bordon@gmail.com [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil)
2015-07-01
Analysis of industrial lubricants is widely used for monitoring and predicting maintenance requirements in a broad range of mechanical systems. Laser induced breakdown spectroscopy (LIBS) has been used to evaluate the potential of the technique for the determination of metals in lubricating oils. Prior to quantitative analysis, the LIBS system was calibrated using standard samples containing the elements investigated (Cu, Cr, Fe, Pb, Mo and Mg). This study presents the usefulness of multivariate statistical techniques for the evaluation and interpretation of large complex data sets in order to obtain more information about how the concentration of metals in lubricating oils is related to engine wear. (author)
Kemperman, Ramses F. J.; Horvatovich, Peter L.; Hoekman, Berend; Reijmers, Theo H.; Muskiet, Frits A. J.; Bischoff, Rainer
2007-01-01
We describe a platform for the comparative profiling of urine using reversed-phase liquid chromatography-mass spectrometry (LC-MS) and multivariate statistical data analysis. Urinary compounds were separated by gradient elution and subsequently detected by electrospray Ion-Trap MS. The lower limit o
Long, C. L.
1991-02-01
Multivariate calibration techniques can reduce the time required for routine testing and can provide new methods of analysis. Multivariate calibration is commonly used with near infrared reflectance analysis (NIRA) and Fourier transform infrared (FTIR) spectroscopy. Two feasibility studies were performed to determine the capability of NIRA, using multivariate calibration techniques, to perform analyses on the types of samples that are routinely analyzed at this laboratory. The first study performed included a variety of samples and indicated that NIRA would be well-suited to perform analyses on selected materials properties such as water content and hydroxyl number on polyol samples, epoxy content on epoxy resins, water content of desiccants, and the amine values of various amine cure agents. A second study was performed to assess the capability of NIRA to perform quantitative analysis of hydroxyl numbers and water contents of hydroxyl-containing materials. Hydroxyl number and water content were selected for determination because these tests are frequently run on polyol materials and the hydroxyl number determination is time consuming. This study pointed out the necessity of obtaining calibration standards identical to the samples being analyzed for each type of polyol or other material being analyzed. Multivariate calibration techniques are frequently used with FTIR data to determine the composition of a large variety of complex mixtures. A literature search indicated many applications of multivariate calibration to FTIR data. Areas identified where quantitation by FTIR would provide a new capability are quantitation of components in epoxy and silicone resins, polychlorinated biphenyls (PCBs) in oils, and additives to polymers.
Chen, Fei; Taylor, William D; Anderson, William B; Huck, Peter M
2013-08-01
This study investigates the suitability of multivariate techniques, including principal component analysis and discriminant function analysis, for analysing polycyclic aromatic hydrocarbon and heavy metal-contaminated aquatic sediment data. We show that multivariate "fingerprint" analysis of relative abundances of contaminants can characterize a contamination source and distinguish contaminated sediments of interest from background contamination. Thereafter, analysis of the unstandardized concentrations among samples contaminated from the same source can identify migration pathways within a study area that is hydraulically complex and has a long contamination history, without reliance on complex hydrodynamic data and modelling techniques. Together, these methods provide an effective tool for drinking water source monitoring and protection.
Directory of Open Access Journals (Sweden)
Said Nawar
2015-01-01
Modeling and mapping of soil properties has been identified as key for effective land degradation management and mitigation. The ability to model and map soil properties at sufficient accuracy for a large agriculture area is demonstrated using Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) imagery. Soil samples were collected in the El-Tina Plain, Sinai, Egypt, concurrently with the acquisition of ASTER imagery, and measured for soil electrical conductivity (ECe), clay content and soil organic matter (OM). An ASTER image covering the study area was preprocessed, and two predictive models, multivariate adaptive regression splines (MARS) and the partial least squares regression (PLSR), were constructed based on the ASTER spectra. For all three soil properties, the results of MARS models were better than those of the respective PLSR models, with cross-validation estimated R² of 0.85 and 0.80 for ECe, 0.94 and 0.90 for clay content and 0.79 and 0.73 for OM. Independent validation of ECe, clay content and OM maps with 32 soil samples showed the better performance of the MARS models, with R² = 0.81, 0.89 and 0.73, respectively, compared to R² = 0.78, 0.87 and 0.71 for the PLSR models. The results indicated that MARS is a more suitable and superior modeling technique than PLSR for the estimation and mapping of soil salinity (ECe), clay content and OM. The method developed in this paper was found to be reliable and accurate for digital soil mapping in arid and semi-arid environments.
Ebrahimi, Milad; Gerber, Erin L; Rockaway, Thomas D
2017-05-15
For most water treatment plants, a significant number of performance data variables are attained on a time series basis. Due to the interconnectedness of the variables, it is often difficult to assess over-arching trends and quantify operational performance. The objective of this study was to establish simple and reliable predictive models to correlate target variables with specific measured parameters. This study presents a multivariate analysis of the physicochemical parameters of municipal wastewater. Fifteen quality and quantity parameters were analyzed using data recorded from 2010 to 2016. To determine the overall quality condition of raw and treated wastewater, a Wastewater Quality Index (WWQI) was developed. The index summarizes a large amount of measured quality parameters into a single water quality term by considering pre-established quality limitation standards. To identify treatment process performance, the interdependencies between the variables were determined by using Principal Component Analysis (PCA). The five extracted components from the 15 variables accounted for 75.25% of total dataset information and adequately represented the organic, nutrient, oxygen demanding, and ion activity loadings of influent and effluent streams. The study also utilized the model to predict quality parameters such as Biological Oxygen Demand (BOD), Total Phosphorus (TP), and WWQI. High accuracies ranging from 71% to 97% were achieved for fitting the models with the training dataset and relative prediction percentage errors less than 9% were achieved for the testing dataset. The presented techniques and procedures in this paper provide an assessment framework for the wastewater treatment monitoring programs. Copyright © 2017 Elsevier Ltd. All rights reserved.
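The statement that five components account for 75.25% of the dataset information refers to the cumulative share of variance explained by the leading principal components. A minimal sketch of that calculation follows, assuming NumPy is available and that the variables are standardized before PCA (the abstract does not state this); the function name `explained_variance_ratio` is hypothetical.

```python
import numpy as np


def explained_variance_ratio(data):
    """Share of total variance captured by each principal component.

    data: array-like of shape (n_observations, n_variables).
    Standardizing each column first is an assumption of this sketch.
    """
    x = np.asarray(data, dtype=float)
    # Standardize so that PCA operates on the correlation matrix.
    x = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Eigenvalues of the covariance matrix, sorted largest first.
    eigvals = np.linalg.eigvalsh(np.cov(x, rowvar=False))[::-1]
    return eigvals / eigvals.sum()
```

Taking `np.cumsum` of the returned array gives the cumulative share, from which the number of components needed to reach a target such as 75% can be read off.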
DEFF Research Database (Denmark)
Kallevik, H.; Hansen, Susanne Brunsgaard; Sæther, Ø.
2000-01-01
Water-in-oil emulsions are investigated by means of multivariate analysis of near infrared (NIR) spectroscopic profiles in the range 1100 - 2250 nm. The oil phase is a paraffin-diluted crude oil from the Norwegian Continental Shelf. The influence of water absorption and light scattering of the water droplets is shown to be strong. Despite the strong influence of the water phase, the NIR technique is still capable of predicting the composition of the investigated oil phase.
Comparison of multivariate calibration techniques applied to experimental NIR data sets
Centner, V; Verdu-Andres, J; Walczak, B; Jouan-Rimbaud, D; Despagne, F; Pasti, L; Poppi, R; Massart, DL; de Noord, OE
2000-01-01
The present study compares the performance of different multivariate calibration techniques applied to four near-infrared data sets when test samples are well within the calibration domain. Three types of problems are discussed: the nonlinear calibration, the calibration using heterogeneous data sets, and the calibration in the presence of irrelevant information in the set of predictors. Recommendations are derived from the comparison, which should help to guide a nonchemometrician through th...
Predicting radiotherapy outcomes using statistical learning techniques
Energy Technology Data Exchange (ETDEWEB)
El Naqa, Issam; Bradley, Jeffrey D; Deasy, Joseph O [Washington University, Saint Louis, MO (United States); Lindsay, Patricia E; Hope, Andrew J [Department of Radiation Oncology, Princess Margaret Hospital, Toronto, ON (Canada)
2009-09-21
Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture potential complexity of heterogeneous variable interactions and applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to previously unseen data. In this work, several types of linear and nonlinear kernels to generate interaction terms and approximate the treatment-response function are evaluated. Examples of institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used. Furthermore, an independent RTOG dataset was used for 'generalizability' validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principal component analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizability beyond institutional data in contrast with other models. This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among
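Leave-one-out testing, as used in this abstract, holds out each case in turn, fits the model on the remaining cases, and scores the prediction for the held-out case. The sketch below shows only that resampling loop. A simple nearest-centroid classifier stands in for the SVM, logistic regression and neural network models of the study, and both function names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x.

    Stand-in classifier for illustration; not the study's SVM.
    """
    sums, counts = {}, {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, value in enumerate(row):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        # Squared Euclidean distance from x to the class centroid.
        return sum((s / counts[label] - v) ** 2
                   for s, v in zip(sums[label], x))

    return min(sums, key=squared_distance)


def leave_one_out_accuracy(xs, ys):
    """Fraction of cases predicted correctly when each is held out once."""
    hits = 0
    for i in range(len(xs)):
        # Train on every case except the i-th, then test on the i-th.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)
```

Because every case is scored by a model that never saw it, the resulting accuracy is less optimistic than accuracy on the training data, which is how this kind of resampling helps control over-fitting.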
Predicting radiotherapy outcomes using statistical learning techniques*
El Naqa, Issam; Bradley, Jeffrey D; Lindsay, Patricia E; Hope, Andrew J; Deasy, Joseph O
2013-01-01
Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture the potential complexity of heterogeneous variable interactions and to ensure applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to previously unseen data. In this work, several types of linear and nonlinear kernels are evaluated for generating interaction terms and approximating the treatment-response function. Institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used as examples. Furthermore, an independent RTOG dataset was used for ‘generalizability’ validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principal component analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizability beyond institutional data in contrast with other models. This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among model
Predicting radiotherapy outcomes using statistical learning techniques
El Naqa, Issam; Bradley, Jeffrey D.; Lindsay, Patricia E.; Hope, Andrew J.; Deasy, Joseph O.
2009-09-01
Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture the potential complexity of heterogeneous variable interactions and to ensure applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to previously unseen data. In this work, several types of linear and nonlinear kernels are evaluated for generating interaction terms and approximating the treatment-response function. Institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used as examples. Furthermore, an independent RTOG dataset was used for 'generalizability' validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principal component analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizability beyond institutional data in contrast with other models. This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among model
Ebqa'ai, Mohammad; Ibrahim, Bashar
2017-03-10
This study aims to analyse the heavy metal pollutants in Jeddah, the second largest city in the Gulf Cooperation Council, with a population exceeding 3.5 million and many vehicles. Ninety-eight street dust samples were collected seasonally from the six major roads as well as the Jeddah Beach, and subsequently digested using a modified Leeds Public Analyst method. The heavy metals (Fe, Zn, Mn, Cu, Cd, and Pb) were extracted from the ash using methyl isobutyl ketone as the extraction solvent and then analysed by atomic absorption spectroscopy. Two multivariate statistical techniques, principal component analysis (PCA) and hierarchical cluster analysis, were applied to these data. Heavy metal concentrations ranked in the following descending order: Fe > Zn > Mn > Cu > Pb > Cd. To study the pollution and health risk from these heavy metals and to estimate their effect on the environment, the pollution index, integrated pollution index, enrichment factor, average daily dose, hazard quotient, and hazard index were all analysed. The PCA showed high levels of Zn, Fe, and Cd on Al Kurnish road, while these elements were consistently detected on King Abdulaziz and Al Madina roads. The study indicates that high levels of Zn and Pb pollution were recorded for major roads in Jeddah. Six out of seven roads had high pollution indices. This study is the first step towards further investigations into current health problems in Jeddah, such as anaemia and asthma.
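The risk indices named in the abstract above can be illustrated with a minimal Python sketch. It assumes the commonly used textbook definitions (enrichment factor as a double ratio against a reference element, hazard quotient as dose divided by a reference dose, hazard index as the sum of hazard quotients); the abstract does not state the exact formulas the authors used, and all function names and numbers below are hypothetical placeholders, not study data.

```python
def enrichment_factor(c_metal, c_ref, bg_metal, bg_ref):
    """EF = (C_metal / C_ref) in the sample divided by the same ratio in the background.

    c_ref / bg_ref are concentrations of a reference element (an assumption here;
    Fe or Al is a frequent choice in the literature).
    """
    return (c_metal / c_ref) / (bg_metal / bg_ref)


def hazard_quotient(daily_dose, reference_dose):
    """HQ = average daily dose / reference dose (same units, e.g. mg/kg/day)."""
    return daily_dose / reference_dose


def hazard_index(daily_doses, reference_doses):
    """HI = sum of the hazard quotients over all metals in daily_doses."""
    return sum(
        hazard_quotient(daily_doses[metal], reference_doses[metal])
        for metal in daily_doses
    )


if __name__ == "__main__":
    # Hypothetical concentrations (mg/kg): Zn in dust vs. Fe as reference element.
    print(enrichment_factor(200.0, 20000.0, 50.0, 40000.0))
    # Hypothetical doses and reference doses (mg/kg/day).
    print(hazard_index({"Pb": 0.002, "Zn": 0.03}, {"Pb": 0.004, "Zn": 0.3}))
```

Under these definitions an enrichment factor well above 1 points to a non-crustal contribution, and a hazard index above 1 is usually read as a potential non-carcinogenic risk; the cut-offs applied in the study itself are not given in the abstract.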
Banoeng-Yakubo, B.; Yidana, S.M.; Nti, E.
2009-01-01
Q and R-mode multivariate statistical analyses were applied to groundwater chemical data from boreholes and wells in the northern section of the Volta region, Ghana. The objective was to determine the processes that affect the hydrochemistry and the variation of these processes in space among the three main geological terrains: the Buem formation, Voltaian System and the Togo series that underlie the area. The analyses revealed three zones in the groundwater flow system: recharge, intermediate and discharge regions. All three zones are clearly different with respect to all the major chemical parameters, with concentrations increasing from the perceived recharge areas through the intermediate regions to the discharge areas. R-mode HCA and factor analysis (using varimax rotation and the Kaiser criterion) were then applied to determine the significant sources of variation in the hydrochemistry. This study finds that groundwater hydrochemistry in the area is controlled by the weathering of silicate and carbonate minerals, as well as the chemistry of infiltrating precipitation. This study finds that the δD and δ18O data from the area fall along the Global Meteoric Water Line (GMWL). A regression equation derived for the relationship between δD and δ18O closely resembles the equation that describes the GMWL. On this basis, groundwater in the study area is probably meteoric and fresh. The apparently low salinities and sodicities of the groundwater seem to support this interpretation. The suitability of groundwater for domestic and irrigation purposes is related to its source, which determines its constitution. A plot of the sodium adsorption ratio (SAR) against salinity (EC) on a semilog axis suggests that groundwater in the area is of good quality for irrigation. Sixty percent (60%), 20% and 20% of the 67 data points used in this study fall within the medium salinity - low sodicity (C2-S1), low salinity - low sodicity (C1-S1) and high salinity - low
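The SAR/EC classification mentioned in this abstract can be sketched as follows. The SAR formula is the standard one (ion concentrations in meq/L). The class boundaries are an assumption: the salinity limits follow the widely cited C1 to C4 scheme in µS/cm, and the sodicity limits are a fixed-threshold simplification of a diagram whose boundaries actually vary with EC. The abstract does not list the boundaries the authors applied.

```python
import math


def sodium_adsorption_ratio(na_meq, ca_meq, mg_meq):
    """SAR = Na / sqrt((Ca + Mg) / 2), all concentrations in meq/L."""
    return na_meq / math.sqrt((ca_meq + mg_meq) / 2.0)


def salinity_class(ec_us_cm):
    """Salinity hazard class from EC in microsiemens/cm (assumed C1-C4 limits)."""
    if ec_us_cm < 250:
        return "C1"
    if ec_us_cm < 750:
        return "C2"
    if ec_us_cm < 2250:
        return "C3"
    return "C4"


def sodicity_class(sar):
    """Sodium hazard class from SAR (simplified fixed thresholds, an assumption)."""
    if sar < 10:
        return "S1"
    if sar < 18:
        return "S2"
    if sar < 26:
        return "S3"
    return "S4"


def irrigation_class(na_meq, ca_meq, mg_meq, ec_us_cm):
    """Combined label such as 'C2-S1' for one water sample."""
    sar = sodium_adsorption_ratio(na_meq, ca_meq, mg_meq)
    return f"{salinity_class(ec_us_cm)}-{sodicity_class(sar)}"


if __name__ == "__main__":
    # Hypothetical sample: Na = 4, Ca = 3, Mg = 5 meq/L, EC = 500 microsiemens/cm.
    print(irrigation_class(4.0, 3.0, 5.0, 500.0))
```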
Directory of Open Access Journals (Sweden)
Seca Gandaseca
2014-01-01
This study reports the spatio-temporal changes in river and canal water quality of peat swamp forest and oil palm plantation sites of Sarawak, Malaysia. To investigate temporal changes, 192 water samples were collected at four stations of Batang Igan, an oil palm plantation site of Sarawak, during July-November 2009 and April-July 2010. Nine water quality parameters were analysed: Electrical Conductivity (EC), pH, Turbidity (TER), Dissolved Oxygen (DO), Temperature (TEMP), Chemical Oxygen Demand (COD), five-day Biochemical Oxygen Demand (BOD5), ammonia-Nitrogen (NH3-N) and Total Suspended Solids (TSS). To investigate spatial changes, 432 water samples were collected from six different sites, including Batang Igan, during June-August 2010. Six water quality parameters (pH, DO, COD, BOD5, NH3-N and TSS) were analysed to examine the spatial variations. The parameters contributing most to the spatio-temporal variations were assessed by statistical techniques such as Hierarchical Agglomerative Cluster Analysis (HACA), Factor Analysis/Principal Components Analysis (FA/PCA) and Discriminant Function Analysis (DFA). HACA identified three different classes of sites (Relatively Unimpaired, Impaired and Less Impaired Regions) on the basis of similarity in physicochemical characteristics and pollutant levels between the sampling sites. DFA produced the best results for identifying the main variables: it separated three parameters for temporal analysis (EC, TER, COD) and identified three parameters for spatial analysis (pH, NH3-N and BOD5). The results signify that the parameters identified by the statistical analyses were responsible for the change in water quality and suggest agricultural and oil palm plantation activities as a possible source of pollutants. The results suggest dire need for proper watershed management measures to restore the water quality of this tributary for a
Energy Technology Data Exchange (ETDEWEB)
Reichardt, Thomas A.; Timlin, Jerilyn Ann; Jones, Howland D. T.; Sickafoose, Shane M.; Schmitt, Randal L.
2010-09-01
Laser-induced fluorescence measurements of cuvette-contained laser dye mixtures are made to evaluate the application of multivariate analysis techniques to optically thick environments. Nine mixtures of Coumarin 500 and Rhodamine 610 are analyzed, as well as the pure dyes. For each sample, the cuvette is positioned on a two-axis translation stage to allow interrogation at different spatial locations, so that both primary (absorption of the laser light) and secondary (absorption of the fluorescence) inner filter effects can be examined. In addition to these expected inner filter effects, we find evidence that a portion of the absorbed fluorescence is re-emitted. A total of 688 spectra are acquired for the evaluation of multivariate analysis approaches to account for nonlinear effects.
Méndez, Jesús; González, Mónica; Lobo, M Gloria; Carnero, Aurelio
2004-03-10
The commercial value of a cochineal (Dactylopius coccus Costa) sample is associated with its color quality. Because the cochineal is a legal food colorant, its color quality is generally understood as its pigment content. Simply put, the higher this content, the more valuable the sample is to the market. In an effort to devise a way to measure the color quality of a cochineal, the present study evaluates different parameters of color measurement such as chromatic attributes (L* and a*), percentage of carminic acid, tint determination, and chromatographic profile of pigments. Tint determination did not achieve this objective because this parameter does not correlate with carminic acid content. On the other hand, carminic acid showed a highly significant correlation (r = -0.922, p = 0.000) with L* values determined from powdered cochineal samples. Combining the spectrophotometric determination of carminic acid with the pigment profile and the composition of the red and yellow pigment groups, both acquired by liquid chromatography (LC), enables greater accuracy in judging the quality of the final sample. As a result of this study, it was possible to separate cochineal samples according to geographical origin using two statistical techniques: cluster analysis and principal component analysis.
Chemical indices and methods of multivariate statistics as a tool for odor classification.
Mahlke, Ingo T; Thiesen, Peter H; Niemeyer, Bernd
2007-04-01
Industrial and agricultural off-gas streams comprise numerous volatile compounds, many of which have substantially different odorous properties. State-of-the-art waste-gas treatment includes the characterization of these molecules and is directed at, if possible, either the avoidance of such odorants during processing or the use of existing standardized air purification techniques like bioscrubbing or afterburning, which, however, often show low efficiency from ecological and economic standpoints. Selective odor separation from the off-gas streams could ease many of these disadvantages but is not yet widely applicable. Thus, the aim of this paper is to use knowledge-based methods to identify possible model substances for selective odor separation research from 155 volatile molecules mainly originating from livestock facilities, fat refineries, and cocoa and coffee production. All compounds are examined with regard to their structure and information content using topological and information-theoretical indices. The resulting data are entered in an observation matrix, and similarities between the substances are computed. Principal component analysis and k-means cluster analysis are conducted, showing that clustering of the index data can depict odor information that correlates well with molecular composition and molecular shape. Quantitative molecule description along with the application of such statistical means therefore provides a good classification tool for malodorant structure properties with no thermodynamic data needed. The approximately similar shape of odorous compounds within the clusters suggests a fair choice of possible model molecules.
Poucheret, Patrick; Fons, Françoise; Doré, Jean Christophe; Michelot, Didier; Rapior, Sylvie
2010-06-15
Ninety percent of fatal higher fungus poisoning is due to amatoxin-containing mushroom species. In addition to the absence of an antidote, no chemotherapeutic consensus has been reported. The aim of the present study is to perform a retrospective multidimensional multivariate statistical analysis of 2110 amatoxin poisoning clinical cases in order to optimize therapeutic decision-making. Our results allowed drugs to be classified as a function of their influence on one major parameter: patient survival. Active principles were classified as first intention, second intention, adjuvant or controversial pharmaco-therapeutic clinical intervention. We conclude that (1) retrospective multidimensional multivariate statistical analysis of complex clinical datasets might help future therapeutic decision-making and (2) drugs such as silybin, N-acetylcysteine and putatively ceftazidime are clearly associated, in the amatoxin poisoning context, with a higher level of patient survival.
Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye
2016-01-13
A framework for establishing a standard reference scale (texture) is proposed, using multivariate statistical analysis of instrumental measurement and sensory evaluation data. Multivariate statistical analysis is conducted to rapidly select typical reference samples with characteristics of universality, representativeness, stability, substitutability, and traceability. The reasonableness of the framework method is verified by establishing a standard reference scale of a texture attribute (hardness) with well-known Chinese foods. More than 100 food products in 16 categories were tested using instrumental measurement (TPA test), and the results were analyzed with clustering analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were selected to construct the hardness standard reference scale. The results indicate that the regression between the estimated sensory value and the instrumentally measured value is significant (R² = 0.9765), which fits well with Stevens's theory. The research provides a reliable theoretical basis and practical guide for establishing quantitative standard reference scales for food texture characteristics.
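The relationship between sensory and instrumental hardness described above can be illustrated with a small fit. This sketch assumes that "Stevens's theory" refers to Stevens's power law, S = k * I^n, and fits it by ordinary least squares on log-transformed values; the abstract does not give the authors' actual regression procedure, and the data in the usage example are synthetic, not study measurements.

```python
import math


def fit_power_law(intensity, sensation):
    """Fit S = k * I**n by linear regression of log(S) on log(I).

    Returns (k, n, r_squared), where r_squared is computed in log-log space.
    All inputs must be positive.
    """
    xs = [math.log(v) for v in intensity]
    ys = [math.log(v) for v in sensation]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    log_k = mean_y - exponent * mean_x
    ss_res = sum((y - (log_k + exponent * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(log_k), exponent, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic data generated from S = 2 * I**0.5 (hypothetical, for illustration).
    instrumental = [1.0, 4.0, 9.0, 16.0]
    sensory = [2.0, 4.0, 6.0, 8.0]
    k, n, r2 = fit_power_law(instrumental, sensory)
    print(k, n, r2)
```

Fitting in log-log space keeps the example short; a nonlinear least-squares fit on the original scale would weight large values differently and may give a slightly different exponent.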
Multivariate reference technique for quantitative analysis of fiber-optic tissue Raman spectroscopy.
Bergholt, Mads Sylvest; Duraipandian, Shiyamala; Zheng, Wei; Huang, Zhiwei
2013-12-03
We report a novel method making use of multivariate reference signals of fused silica and sapphire Raman signals generated from a ball-lens fiber-optic Raman probe for quantitative analysis of in vivo tissue Raman measurements in real time. Partial least-squares (PLS) regression modeling is applied to extract the characteristic internal reference Raman signals (e.g., shoulder of the prominent fused silica boson peak (~130 cm⁻¹); distinct sapphire ball-lens peaks (380, 417, 646, and 751 cm⁻¹)) from the ball-lens fiber-optic Raman probe for quantitative analysis of fiber-optic Raman spectroscopy. To evaluate the analytical value of this novel multivariate reference technique, a rapid Raman spectroscopy system coupled with a ball-lens fiber-optic Raman probe is used for in vivo oral tissue Raman measurements (n = 25 subjects) under 785 nm laser excitation powers ranging from 5 to 65 mW. An accurate linear relationship (R² = 0.981) with a root-mean-square error of cross validation (RMSECV) of 2.5 mW can be obtained for predicting the laser excitation power changes based on a leave-one-subject-out cross-validation, which is superior to the normal univariate reference method (RMSE = 6.2 mW). A root-mean-square error of prediction (RMSEP) of 2.4 mW (R² = 0.985) can also be achieved for laser power prediction in real time when we applied the multivariate method independently on the five new subjects (n = 166 spectra). We further apply the multivariate reference technique for quantitative analysis of gelatin tissue phantoms that gives rise to an RMSEP of ~2.0% (R² = 0.998) independent of laser excitation power variations. This work demonstrates that the multivariate reference technique can be advantageously used to monitor and correct the variations of laser excitation power and fiber coupling efficiency in situ for standardizing the tissue Raman intensity to realize quantitative analysis of tissue Raman measurements in vivo, which is particularly appealing in
DEFF Research Database (Denmark)
Hansen, Michael Adsetts Edberg
Interest in statistical methodology is increasing so rapidly in the astronomical community that accessible introductory material in this area is long overdue. This book fills the gap by providing a presentation of the most useful techniques in multivariate statistics. A wide-ranging annotated set...
Bakraji, E. H.; Rihawy, M. S.; Castel, C.; Abboud, R.
2015-03-01
The Particle Induced X-ray Emission (PIXE) technique has been utilised to study 48 ancient Syrian pottery fragments taken from excavations at the Tell Al-Rawda site. Eighteen elements (Mg, Al, Si, P, S, K, Ca, Ti, Mn, Fe, Ni, Zn, As, Br, Rb, Sr, Y, and Pb) were determined. The element concentrations were processed using two multivariate statistical methods to classify the pottery; one main group and two smaller groups were defined. In addition, four samples from different places on the site were subjected to optically stimulated luminescence (OSL) dating. The average age obtained using a single aliquot regeneration (SAR) protocol was found to be 4350 ± 240 years.
Directory of Open Access Journals (Sweden)
Đula Borozan
2014-03-01
The paper deals with the application of multivariate analysis of variance and logistic regression in measuring, explaining and evaluating (i) gender differences in expressing migration aspirations, and (ii) a gender effect on the migration motivation of university students in Croatia. The results supported the thesis that migration is a complex gendering process that assumes subjective assessment of the whole set of interrelated motives. According to logistic regression, gender is a significant predictor of migration aspirations among the selected demographic and socio-economic variables. A multivariate analysis of variance showed that the interaction of gender and migration aspirations matters for migration motives, particularly those related to the perceived importance of social networks. Females, and especially those who aspire to migrate, assessed these motives as more important than males did.
Directory of Open Access Journals (Sweden)
Michel J. Anzanello
2014-09-01
A typical application of multivariate techniques in forensic analysis consists of discriminating between authentic and unauthentic samples of seized drugs, in addition to finding similar properties in the unauthentic samples. In this paper, the performance of several methods belonging to two different classes of multivariate techniques, supervised and unsupervised, was compared. The supervised techniques (ST) are the k-Nearest Neighbor (KNN), Support Vector Machine (SVM), Probabilistic Neural Networks (PNN) and Linear Discriminant Analysis (LDA); the unsupervised techniques (UT) are the k-Means CA and the Fuzzy C-Means (FCM). The methods are applied to Fourier Transform Infrared Spectroscopy (FTIR) data from authentic and unauthentic Cialis and Viagra. The FTIR data are also transformed by Principal Components Analysis (PCA) and kernel functions aimed at improving the grouping performance. ST proved to be a more reasonable choice when the analysis is conducted on the original data, while the UT led to better results when applied to transformed data.
Directory of Open Access Journals (Sweden)
Ewelina Dziurkowska
2015-01-01
Multivariate statistical analysis is widely used in medical studies as a profitable tool facilitating diagnosis of some diseases, for instance, cancer, allergy, pneumonia, or Alzheimer’s and psychiatric diseases. Taking this into consideration, the aim of this study was to use two multivariate techniques, hierarchical cluster analysis (HCA) and principal component analysis (PCA), to disclose the relationship between the drugs used in the therapy of major depressive disorder and the salivary cortisol level and the period of hospitalization. The cortisol contents in the saliva of depressed women were quantified daily by HPLC with UV detection during the whole period of hospitalization. A data set with 16 variables (e.g., the patients’ age, multiplicity and period of hospitalization, initial and final cortisol level, highest and lowest hormone level, mean contents, and medians) characterizing 97 subjects was used for HCA and PCA calculations. Multivariate statistical analysis reveals that various groups of antidepressants affect the salivary cortisol level to varying degrees. The SSRIs, SNRIs, and the polypragmasy reduce the hormone secretion most effectively. Thus, both unsupervised pattern recognition methods, HCA and PCA, can be used as complementary tools for interpretation of the results obtained by laboratory diagnostic methods.
Dziurkowska, Ewelina; Wesolowski, Marek
2015-01-01
Multivariate statistical analysis is widely used in medical studies as a profitable tool facilitating diagnosis of some diseases, for instance, cancer, allergy, pneumonia, or Alzheimer's and psychiatric diseases. Taking this into consideration, the aim of this study was to use two multivariate techniques, hierarchical cluster analysis (HCA) and principal component analysis (PCA), to disclose the relationship between the drugs used in the therapy of major depressive disorder and the salivary cortisol level and the period of hospitalization. The cortisol contents in the saliva of depressed women were quantified daily by HPLC with UV detection during the whole period of hospitalization. A data set with 16 variables (e.g., the patients' age, multiplicity and period of hospitalization, initial and final cortisol level, highest and lowest hormone level, mean contents, and medians) characterizing 97 subjects was used for HCA and PCA calculations. Multivariate statistical analysis reveals that various groups of antidepressants affect the salivary cortisol level to varying degrees. The SSRIs, SNRIs, and the polypragmasy reduce the hormone secretion most effectively. Thus, both unsupervised pattern recognition methods, HCA and PCA, can be used as complementary tools for interpretation of the results obtained by laboratory diagnostic methods.
Directory of Open Access Journals (Sweden)
Weili Duan
2016-01-01
Multivariate statistical methods, including cluster analysis (CA), discriminant analysis (DA) and principal component analysis/factor analysis (PCA/FA), were applied to surface water quality datasets comprising 14 parameters at 28 sites of the Eastern Poyang Lake Basin, Jiangxi Province of China, from January 2012 to April 2015, to characterize spatiotemporal variation in pollution and to identify potential pollution sources. Using the hierarchical CA method, the data from the 28 sampling stations were divided into two periods (wet season and dry season) and two regions (low pollution and high pollution), respectively. Four parameters (temperature, pH, ammonia-nitrogen (NH4-N), and total nitrogen (TN)) were identified using DA to distinguish temporal groups with close to 97.86% correct assignations. Again using DA, five parameters (pH, chemical oxygen demand (COD), TN, fluoride (F), and sulphide (S)) led to 93.75% correct assignations for distinguishing spatial groups. Five potential pollution sources, including nutrient pollution, oxygen-consuming organic pollution, fluorine chemical pollution, heavy metal pollution and natural pollution, were identified using PCA/FA techniques for both the low pollution region and the high pollution region. Heavy metals (copper (Cu), chromium (Cr) and zinc (Zn)), fluoride and sulfide are of particular concern in the study region because of the many open-pit copper mines, such as the Dexing Copper Mine. Results obtained from this study offer a reasonable classification scheme for low-cost monitoring networks. The results also inform understanding of spatio-temporal variation in water quality as these topics relate to water resources management.
Keita, Souleymane; Zhonghua, Tang
2017-10-01
Sustainable management of groundwater resources is a major issue for developing countries, especially in Mali. The multiple uses of groundwater have led countries to promote sound management policies for sustainable use of groundwater resources. For this reason, each country needs data enabling it to monitor and predict changes in the resources. Also, given the importance of groundwater quality changes, often marked by the recurrence of droughts, the potential impacts of the regional and geological setting of groundwater resources require careful study. Unfortunately, recent decades have seen a considerable reduction of national capacities to ensure hydrogeological monitoring and the production of quality data for decision making. The purpose of this work is to translate groundwater data into useful information that can improve water resources management capacity in Mali. In this paper, we used groundwater analytical data from accredited laboratories in Mali to carry out a national scale assessment of the groundwater types and their distribution. We adapted multivariate statistical methods to classify 2035 groundwater samples into seven main groundwater types and built a national scale map from the results. We used a two-level K-means clustering technique to examine the hydro-geochemical records as percentages of the total concentrations of major ions, namely sodium (Na), magnesium (Mg), calcium (Ca), chloride (Cl), bicarbonate (HCO3), and sulphate (SO4). The first step of clustering formed 20 groups, and these groups were then re-clustered to produce the final seven groundwater types. The results were verified and confirmed using Principal Component Analysis (PCA) and RockWare (Aq.QA) software. We found that HCO3 was the most dominant anion throughout the country and that Cl and SO4 were only important in some local zones. The dominant cations were Na and Mg. Also, major ion ratios changed with geographical location and geological, and climatic
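The two-level clustering described in this abstract (samples expressed as ion percentages, clustered into 20 groups, then the group centroids re-clustered into seven types) can be sketched in plain Python. This is an illustrative reconstruction under stated assumptions, not the authors' code: it uses a basic Lloyd's k-means with a deterministic, evenly spaced initialisation chosen only for reproducibility, and every function name is hypothetical.

```python
def to_percent(concentrations):
    """Express each ion as a percentage of the total concentration."""
    total = sum(concentrations)
    return [100.0 * c / total for c in concentrations]


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def kmeans(points, k, max_iter=100):
    """Basic Lloyd's algorithm; returns (labels, centers)."""
    # Assumption: initial centers are k points spaced evenly through the input.
    step = len(points) / k
    centers = [list(points[int(i * step)]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def two_level_kmeans(points, k_first, k_final):
    """Cluster points into k_first groups, then re-cluster the group centers."""
    first_labels, first_centers = kmeans(points, k_first)
    center_labels, _ = kmeans(first_centers, k_final)
    # Each sample inherits the final type of its first-level group.
    return [center_labels[lab] for lab in first_labels]


if __name__ == "__main__":
    # Hypothetical 2-D toy data; real input would be six ion percentages per sample.
    toy = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
    print(two_level_kmeans(toy, k_first=4, k_final=2))
```

With the values reported in the abstract, the call would use `k_first=20` and `k_final=7` on rows produced by `to_percent`. A production analysis would more likely rely on an established library implementation with multiple random restarts.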
The Importance of Introductory Statistics Students Understanding Appropriate Sampling Techniques
Menil, Violeta C.
2005-01-01
In this paper the author discusses the meaning of sampling, the reasons for sampling, the Central Limit Theorem, and the different techniques of sampling. Practical and relevant examples are given to make the appropriate sampling techniques understandable to students of Introductory Statistics courses. With a thorough knowledge of sampling…
Fuchs, Julia; Cermak, Jan; Andersen, Hendrik
2017-04-01
This study aims at untangling the impacts of external dynamics and local conditions on cloud properties in the Southeast Atlantic (SEA) by combining satellite and reanalysis data using multivariate statistics. The understanding of clouds and their determinants at different scales is important for constraining the Earth's radiative budget, and thus prominent in climate-system research. In this study, SEA stratocumulus cloud properties are observed not only as the result of local environmental conditions but also as affected by external dynamics and spatial origins of air masses entering the study area. In order to assess to what extent cloud properties are impacted by aerosol concentration, air mass history, and meteorology, a multivariate approach is conducted using satellite observations of aerosol and cloud properties (MODIS, SEVIRI), information on aerosol species composition (MACC) and meteorological context (ERA-Interim reanalysis). To account for the often-neglected but important role of air mass origin, information on air mass history based on HYSPLIT modeling is included in the statistical model. This multivariate approach is intended to lead to a better understanding of the physical processes behind observed stratocumulus cloud properties in the SEA.
Energy Technology Data Exchange (ETDEWEB)
Clegg, Samuel M [Los Alamos National Laboratory; Barefield, James E [Los Alamos National Laboratory; Wiens, Roger C [Los Alamos National Laboratory; Sklute, Elizabeth [MT HOLYOKE COLLEGE; Dyare, Melinda D [MT HOLYOKE COLLEGE
2008-01-01
Quantitative analysis with LIBS traditionally employs calibration curves that are complicated by chemical matrix effects. These chemical matrix effects influence the LIBS plasma and the ratio of elemental composition to elemental emission line intensity. Consequently, LIBS calibration typically requires a priori knowledge of the unknown, in order for a series of calibration standards similar to the unknown to be employed. In this paper, three new Multivariate Analysis (MVA) techniques are employed to analyze the LIBS spectra of 18 disparate igneous and highly-metamorphosed rock samples. Partial Least Squares (PLS) analysis is used to generate a calibration model from which unknown samples can be analyzed. Principal Components Analysis (PCA) and Soft Independent Modeling of Class Analogy (SIMCA) are employed to generate a model and predict the rock type of the samples. These MVA techniques appear to exploit the matrix effects associated with the chemistries of these 18 samples.
Li, Jinling; He, Ming; Han, Wei; Gu, Yifan
2009-05-30
An investigation of the sources of heavy metals (Cu, Zn, Ni, Pb, Cr, and Cd) in the coastal soils of Shanghai, China, was conducted using multivariate statistical methods (principal component analysis, clustering analysis, and correlation analysis). All the results of the multivariate analysis showed that: (i) Cu, Ni, Pb, and Cd had anthropogenic sources (e.g., overuse of chemical fertilizers and pesticides, industrial and municipal discharges, animal wastes, and sewage irrigation); (ii) Zn and Cr were associated with parent materials and therefore had natural sources (e.g., the weathering of parent materials and subsequent pedogenesis due to the alluvial deposits). The effect of heavy metals in the soils was strongly influenced by soil formation, atmospheric deposition, and human activities. These findings provide essential information on the possible sources of heavy metals, which would contribute to the monitoring and assessment of agricultural soils worldwide.
Genetic divergence of rubber tree estimated by multivariate techniques and microsatellite markers
Directory of Open Access Journals (Sweden)
Lígia Regina Lima Gouvêa
2010-01-01
Genetic diversity of 60 Hevea genotypes, consisting of Asiatic, Amazonian, African and IAC clones, and pertaining to the genetic breeding program of the Agronomic Institute (IAC, Brazil), was estimated. Analyses were based on phenotypic multivariate parameters and microsatellites. Five agronomic descriptors were employed in multivariate procedures, such as Standard Euclidean Distance, Tocher clustering and principal component analysis. Genetic variability among the genotypes was estimated with 68 selected polymorphic SSRs, by way of Modified Rogers Genetic Distance and UPGMA clustering. Structure software in a Bayesian approach was used in discriminating among groups. Genetic diversity was estimated through Nei's statistics. The genotypes were clustered into 12 groups according to the Tocher method, while the molecular analysis identified six groups. In the phenotypic and microsatellite analyses, the Amazonian and IAC genotypes were distributed in several groups, whereas the Asiatic were in only a few. Observed heterozygosity ranged from 0.05 to 0.96. Both high total diversity (HT' = 0.58) and high gene differentiation (Gst' = 0.61) were observed, and indicated high genetic variation among the 60 genotypes, which may be useful for breeding programs. The analyzed agronomic parameters and SSR markers were effective in assessing genetic diversity among Hevea genotypes, besides proving to be useful for characterizing genetic variability.
Publishing nutrition research: a review of multivariate techniques--part 3: data reduction methods.
Gleason, Philip M; Boushey, Carol J; Harris, Jeffrey E; Zoellner, Jamie
2015-07-01
This is the ninth in a series of monographs on research design and analysis, and the third in a set of these monographs devoted to multivariate methods. The purpose of this article is to provide an overview of data reduction methods, including principal components analysis, factor analysis, reduced rank regression, and cluster analysis. In the field of nutrition, data reduction methods can be used for three general purposes: for descriptive analysis in which large sets of variables are efficiently summarized, to create variables to be used in subsequent analysis and hypothesis testing, and in questionnaire development. The article describes the situations in which these data reduction methods can be most useful, briefly describes how the underlying statistical analyses are performed, and summarizes how the results of these data reduction methods should be interpreted.
Valle, Denis; Baiser, Benjamin; Woodall, Christopher W; Chazdon, Robin
2014-12-01
We propose a novel multivariate method to analyse biodiversity data based on the Latent Dirichlet Allocation (LDA) model. LDA, a probabilistic model, reduces assemblages to sets of distinct component communities. It produces easily interpretable results, can represent abrupt and gradual changes in composition, accommodates missing data and allows for coherent estimates of uncertainty. We illustrate our method using tree data for the eastern United States and from a tropical successional chronosequence. The model is able to detect pervasive declines in the oak community in Minnesota and Indiana, potentially due to fire suppression, increased growing season precipitation and herbivory. The chronosequence analysis is able to delineate clear successional trends in species composition, while also revealing that site-specific factors significantly impact these successional trajectories. The proposed method provides a means to decompose and track the dynamics of species assemblages along temporal and spatial gradients, including effects of global change and forest disturbances.
Multivariate statistical analysis of stream-sediment geochemistry in the Grazer Paläozoikum, Austria
Weber, L.; Davis, J.C.
1990-01-01
The Austrian reconnaissance study of stream-sediment composition — more than 30000 clay-fraction samples collected over an area of 40000 km2 — is summarized in an atlas of regional maps that show the distributions of 35 elements. These maps, rich in information, reveal complicated patterns of element abundance that are difficult to compare on more than a small number of maps at one time. In such a study, multivariate procedures such as simultaneous R-Q mode components analysis may be helpful. They can compress a large number of variables into a much smaller number of independent linear combinations. These composite variables may be mapped and relationships sought between them and geological properties. As an example, R-Q mode components analysis is applied here to the Grazer Paläozoikum, a tectonic unit northeast of the city of Graz, which is composed of diverse lithologies and contains many mineral deposits.
Xu, Qinzeng; Gao, Fei; Xu, Qiang; Yang, Hongsheng
2014-11-01
Fatty acids (FAs) provide energy and can also be used to trace trophic relationships among organisms. The sea cucumber Apostichopus japonicus goes into a state of aestivation during warm summer months. We examined fatty acid profiles in aestivated and non-aestivated A. japonicus using multivariate analyses (PERMANOVA, MDS, ANOSIM, and SIMPER). The results indicate that the fatty acid profiles of aestivated and non-aestivated sea cucumbers differed significantly. The FAs that were produced by bacteria and brown kelp contributed the most to the differences in the fatty acid composition of aestivated and non-aestivated sea cucumbers. Aestivated sea cucumbers may synthesize FAs from heterotrophic bacteria during early aestivation, and long chain FAs such as eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA), which are produced from intestinal degradation, are digested during deep aestivation. Specific changes in the fatty acid composition of A. japonicus during aestivation need more detailed study in the future.
Montecarlo Techniques as a tool for teaching statistics
Bueno, FM Alexander
2014-01-01
Probability Theory and Statistics are two of the most useful mathematical fields, and also two of the most difficult to learn. In other science fields, such as Physics, experimentation is a useful tool to develop students' intuition, but the application of this tool to Statistics is much more difficult. In this paper we show how Monte Carlo techniques can be used to perform numerical experiments, by the use of pseudorandom numbers, and how these experiments can aid the understanding of Statistics and Physics. Monte Carlo techniques are broadly used in scientific research, but they are usually learnt in very specific courses of higher education. By the use of computer simulation these techniques can also be taught at elementary school, and they can help students understand and visualise concepts such as variance, mean or a probability distribution function. Finally, the use of new technologies, such as JavaScript and HTML, is discussed.
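A minimal sketch of the kind of numerical experiment this abstract describes is given here. It is an illustration only: the paper discusses JavaScript and HTML, Python is used here for brevity, and the function name `simulate_dice` is hypothetical. The sketch rolls a fair die with seeded pseudorandom numbers and compares the sample mean and variance with the theoretical values 3.5 and 35/12 as the number of rolls grows.

```python
import random
import statistics

def simulate_dice(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times with a seeded pseudorandom generator."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n_rolls)]

# Theoretical values for a fair die.
TRUE_MEAN = 3.5
TRUE_VAR = 35 / 12

for n in (10, 1_000, 100_000):
    rolls = simulate_dice(n)
    mean = statistics.fmean(rolls)
    var = statistics.pvariance(rolls)
    # The estimates should drift toward TRUE_MEAN and TRUE_VAR as n grows.
    print(f"n={n:>6}  mean={mean:.3f} (true {TRUE_MEAN})  variance={var:.3f} (true {TRUE_VAR:.3f})")
```

Printing the estimates for increasing sample sizes lets a student see the law of large numbers at work without any formal derivation.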
Thriumani, Reena; Zakaria, Ammar; Hashim, Yumi Zuhanis Has-Yun; Helmy, Khaled Mohamed; Omar, Mohammad Iqbal; Jeffree, Amanina; Adom, Abdul Hamid; Shakaff, Ali Yeon Md; Kamarudin, Latifah Munirah
2017-03-01
In this experiment, three different cell cultures (A549, WI38VA13 and MCF7) and blank medium (without cells) as a control were used. The electronic nose (E-Nose) was used to sniff the headspace of cultured cells and the data were recorded. After data pre-processing, two different features were extracted, taking into consideration both the steady-state and the transient information. The extracted data were then processed by a multivariate analysis technique, Linear Discriminant Analysis (LDA), to provide visualization of the clustering vector information in multi-sensor space. The Probabilistic Neural Network (PNN) classifier was used to test the performance of the E-Nose in determining the volatile organic compounds (VOCs) of the lung cancer cell line. The LDA data projection was able to differentiate effectively between the lung cancer cell samples and other samples (breast cancer, normal cell and blank medium). The features extracted from the steady-state response reached a 100% classification rate, while the transient response with the aid of LDA dimension reduction produced 100% classification performance using the PNN classifier with a spread value of 0.1. The results also show that the E-Nose, aided by multivariate analysis, is a promising technique to be applied to real patients in further work and could become an alternative to the current lung cancer diagnostic methods.
Rogachov, A; Cheng, J C; DeSouza, D D
2015-11-01
Overlapping functional magnetic resonance imaging (fMRI) activity elicited by physical pain and social rejection has been taken to suggest a common neural representation of the two experiences. However, Woo and colleagues (Nat Commun 5: 5380, 2014) recently used multivariate statistics to challenge the "shared representation" theory of pain. This study has implications for the way results from fMRI studies are interpreted and has the potential to broaden our understanding of different pain states and the future development of personalized medicine. Copyright © 2015 the American Physiological Society.
Directory of Open Access Journals (Sweden)
Yeon-Jee Yoo
2013-08-01
Objectives: The aim of this study was to compare the cleaning efficacy of different final irrigation regimens in canal and isthmus of mandibular molars, and to evaluate the influence of related variables on cleaning efficacy of the irrigation systems. Materials and Methods: Mesial root canals from 60 mandibular molars were prepared and divided into 4 experimental groups according to the final irrigation technique: Group C, syringe irrigation; Group U, ultrasonic activation; Group SC, VPro StreamClean irrigation; Group EV, EndoVac irrigation. Cross-sections at 1, 3 and 5 mm levels from the apex were examined to calculate remaining debris area in the canal and isthmus spaces. Statistical analysis was completed by using the Kruskal-Wallis test and Mann-Whitney U test for comparison among groups, and multivariate linear analysis to identify the significant variables (regular replenishment of irrigant, vapor lock management, and ultrasonic activation of irrigant) affecting the cleaning efficacy of the experimental groups. Results: Groups SC and EV showed significantly higher canal cleanliness values than groups C and U at the 1 mm level (p < 0.05), and higher isthmus cleanliness values than group U at 3 mm and all levels of group C (p < 0.05). Multivariate linear regression analysis demonstrated that all variables had a statistically significant independent positive correlation at the 1 mm level of the canal and at all levels of the isthmus. Conclusions: Both the VPro StreamClean and EndoVac systems showed favorable results as final irrigation regimens for cleaning debris in the complicated root canal system having curved canal and/or isthmus. The debridement of the isthmi, rather than that of the canals, depends significantly on these variables.
Statistical principles and techniques in scientific and social research
Krzanowski, Wojtek J
2007-01-01
This text provides a clear discussion of the basic statistical concepts and methods frequently encountered in statistical research. Assuming only a basic level of Mathematics, and with numerous examples and illustrations, this text is a valuable resource for students and researchers in the Sciences and Social Sciences. This graduate-level text provides a survey of the logic and reasoning underpinning statistical analysis, as well as giving a broad-brush overview of the various statistical techniques that play a major role in scientific and social investigations. Arranged in rough historical order, the text starts with the ideas of probability that underpin statistical methods and progresses through the developments of the nineteenth and twentieth centuries to modern concerns and solutions. Assuming only a basic level of Mathematics and with numerous examples and illustrations, this text presents a valuable resource not only to the experienced researcher but also to the student, by complementing courses in ...
Detecting seasonal cycle shift on streamflow over Turkey by using multivariate statistical methods
Yildiz, Dogan; Gunes, Mehmet Samil; Gokalp Yavuz, Fulya; Yildiz, Dursun
2017-08-01
Climate change analysis includes the study of several types of variables such as temperature, precipitation, carbon emission, and streamflow. In this study, we focus on basin hydrology and, in particular, on streamflow values. They are geographic and climatologic indicators utilized in the study of basins. We analyze these values to better understand monthly and seasonal change over a 40-year period for all basins in Turkey. Our study differs from others by applying multivariate analysis to the streamflow data rather than trend, frequency, and/or distribution-based analysis. The characteristics of basins and climate change effects are visualized and examined with monthly data by using cluster analysis, multidimensional scaling, and gCLUTO (graphical Clustering Toolkit). As a result, we classify months as low-flow and high-flow periods. Multidimensional scaling shows a clockwise movement of months from one decade to the next, which indicates a seasonal shift. Finally, the gCLUTO tool is utilized in a novel way in the hydrology field by revealing the seasonal change and visualizing the current changing structure of streamflow.
Advanced Statistical Signal Processing Techniques for Landmine Detection Using GPR
2014-07-12
Report documentation page for Advanced Statistical Signal Processing Techniques for Landmine Detection Using GPR. The views, opinions and/or findings contained in this report are those of the author(s) and should not... Agency name and address: U.S. Army Research Office, P.O. Box 12211, Research Triangle Park, NC 27709-2211. Terms listed: landmine detection, signal... Address listed: 310 Jesse Hall, Columbia, MO 65211-1230.
Nnane, Daniel Ekane
2011-11-15
Contamination of surface waters is a pervasive threat to human health, hence the need to better understand the sources and spatio-temporal variations of contaminants within river catchments. River catchment managers are required to sustainably monitor and manage the quality of surface waters. Catchment managers therefore need cost-effective, long-term sustainable water quality monitoring and management designs to proactively protect public health and aquatic ecosystems. Multivariate and phage-lysis techniques were used to investigate spatio-temporal variations of water quality, main polluting chemophysical and microbial parameters, and faecal micro-organism sources, and to establish 'sentry' sampling sites in the Ouse River catchment, southeast England, UK. 350 river water samples were analysed for fourteen chemophysical and microbial water quality parameters in conjunction with the novel human-specific phages of Bacteroides GB-124. Annual, autumn, spring, summer, and winter principal components (PCs) explained approximately 54%, 75%, 62%, 48%, and 60%, respectively, of the total variance present in the datasets. Significant loadings of Escherichia coli, intestinal enterococci, turbidity, and human-specific Bacteroides GB-124 were observed in all datasets. Cluster analysis successfully grouped sampling sites into five clusters. Importantly, multivariate and phage-lysis techniques were useful in determining the sources and spatial extent of water contamination in the catchment. Though human faecal contamination was significant during dry periods, the main source of contamination was non-human. Bacteroides GB-124 could potentially be used for routine microbial water quality monitoring in the catchment. For a cost-effective, long-term sustainable water quality monitoring design, E. coli or intestinal enterococci, turbidity, and Bacteroides GB-124 should be monitored all year round in this river catchment. Copyright © 2011 Elsevier B.V. All
Hamaker, Ellen L.; Dolan, Conor V.; Molenaar, Peter C. M.
2005-01-01
Results obtained with interindividual techniques in a representative sample of a population are not necessarily generalizable to the individual members of this population. In this article the specific condition is presented that must be satisfied to generalize from the interindividual level to the intraindividual level. A way to investigate…
A method of using cluster analysis to study statistical dependence in multivariate data
Borucki, W. J.; Card, D. H.; Lyle, G. C.
1975-01-01
A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.
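The general idea of pairing a clustering step with a Monte Carlo significance test can be sketched as follows. This is not the authors' procedure: the 2-means clustering, the within/total sum-of-squares statistic, the uniform null model over the bounding box, and all function names are assumptions chosen for illustration.

```python
import random

def two_means_ratio(points, n_iter=50):
    """Within-cluster / total sum of squares for a simple 2-means clustering.

    Small values mean the two clusters are compact relative to the overall spread.
    """
    centers = [min(points), max(points)]  # deterministic start: lexicographic extremes
    for _ in range(n_iter):
        groups = ([], [])
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d[1] < d[0]].append(p)  # index 1 if the second center is closer
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    within = sum(
        sum((a - b) ** 2 for a, b in zip(p, c))
        for g, c in zip(groups, centers) for p in g
    )
    grand = tuple(sum(col) / len(points) for col in zip(*points))
    total = sum(sum((a - b) ** 2 for a, b in zip(p, grand)) for p in points)
    return within / total

def monte_carlo_p_value(points, n_sim=200, seed=0):
    """Fraction of structureless (uniform) data sets that cluster as well as the data."""
    rng = random.Random(seed)
    observed = two_means_ratio(points)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    count = 0
    for _ in range(n_sim):
        fake = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
        if two_means_ratio(fake) <= observed:
            count += 1
    return (count + 1) / (n_sim + 1)

# Hypothetical test data: two well-separated bivariate normal blobs.
rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(30)]
blob_b = [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(30)]
p = monte_carlo_p_value(blob_a + blob_b)
print(f"Monte Carlo p-value: {p:.3f}")
```

A small p-value indicates that data without structure rarely produce clusters as compact as those observed, so the clustering is unlikely to be an artifact of noise.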
Dhat, Shalaka; Pund, Swati; Kokare, Chandrakant; Sharma, Pankaj; Shrivastava, Birendra
2017-01-01
Rapidly evolving technical and regulatory landscapes of pharmaceutical product development necessitate risk management with application of multivariate analysis using Process Analytical Technology (PAT) and Quality by Design (QbD). A poorly soluble, high-dose drug, Satranidazole, was optimally nanoprecipitated (SAT-NP) employing principles of Formulation by Design (FbD). The potential risk factors influencing the critical quality attributes (CQA) of SAT-NP were identified using an Ishikawa diagram. A Plackett-Burman screening design was adopted to screen the eight critical formulation and process parameters influencing the mean particle size, zeta potential and dissolution efficiency at 30 min in pH 7.4 dissolution medium. Pareto charts (individual and cumulative) revealed the three most critical factors influencing the CQA of SAT-NP, viz. aqueous stabilizer (polyvinyl alcohol), release modifier (Eudragit® S 100) and volume of aqueous phase. The levels of these three critical formulation attributes were optimized by FbD within the established design space to minimize mean particle size and polydispersity index, and to maximize encapsulation efficiency of SAT-NP. Lenth's and Bayesian analysis along with mathematical modeling of results allowed identification and quantification of critical formulation attributes significantly active on the selected CQAs. The optimized SAT-NP exhibited a mean particle size of 216 nm, a polydispersity index of 0.250, a zeta potential of -3.75 mV and an encapsulation efficiency of 78.3%. The product was lyophilized using mannitol to form a readily redispersible powder. X-ray diffraction analysis confirmed the conversion of crystalline SAT to amorphous form. In vitro release of SAT-NP in gradually pH changing media showed 95%) in pH 7.4 in next 3 h, indicative of burst release after a lag time. This investigation demonstrated effective application of risk management and QbD tools in developing site-specific release SAT-NP by nanoprecipitation.
Mfumu Kihumba, Antoine; Vanclooster, Marnik
2013-04-01
Drinking water in Kinshasa, the capital of the Democratic Republic of Congo, is provided by extracting groundwater from the local aquifer, particularly in peripheral areas. The exploited groundwater body is mainly unconfined and located within a continuous detrital aquifer, primarily composed of sedimentary formations. However, the aquifer is subjected to an increasing threat of anthropogenic pollution pressure. Understanding the detailed origin of this pollution pressure is important for sustainable drinking water management in Kinshasa. The present study aims to explain the observed nitrate pollution problem, nitrate being considered as a good tracer for other pollution threats. The analysis is made in terms of physical attributes that are readily available using a statistical modelling approach. For the nitrate data, use was made of a historical groundwater quality assessment study, for which the data were re-analysed. The physical attributes are related to the topography, land use, geology and hydrogeology of the region. Prior to the statistical modelling, intrinsic and specific vulnerability for nitrate pollution was assessed. This vulnerability assessment showed that the alluvium area in the northern part of the region is the most vulnerable area. This area consists of urban land use with poor sanitation. Re-analysis of the nitrate pollution data demonstrated that the spatial variability of nitrate concentrations in the groundwater body is high, and coherent with the fragmented land use of the region and the intrinsic and specific vulnerability maps. For the statistical modelling, use was made of multiple regression and regression tree analysis. The results demonstrated the significant impact of land use variables on the Kinshasa groundwater nitrate pollution and the need for a detailed delineation of groundwater capture zones around the monitoring stations. Key words: Groundwater, Isotopic, Kinshasa, Modelling, Pollution, Physico-chemical.
Bevacqua, Emanuele; Maraun, Douglas; Hobæk Haff, Ingrid; Widmann, Martin; Vrac, Mathieu
2017-06-01
Compound events (CEs) are multivariate extreme events in which the individual contributing variables may not be extreme themselves, but their joint - dependent - occurrence causes an extreme impact. Conventional univariate statistical analysis cannot give accurate information regarding the multivariate nature of these events. We develop a conceptual model, implemented via pair-copula constructions, which allows for the quantification of the risk associated with compound events in present-day and future climate, as well as the uncertainty estimates around such risk. The model includes predictors, which could represent for instance meteorological processes that provide insight into both the involved physical mechanisms and the temporal variability of compound events. Moreover, this model enables multivariate statistical downscaling of compound events. Downscaling is required to extend the compound events' risk assessment to the past or future climate, where climate models either do not simulate realistic values of the local variables driving the events or do not simulate them at all. Based on the developed model, we study compound floods, i.e. joint storm surge and high river runoff, in Ravenna (Italy). To explicitly quantify the risk, we define the impact of compound floods as a function of sea and river levels. We use meteorological predictors to extend the analysis to the past, and get a more robust risk analysis. We quantify the uncertainties of the risk analysis, observing that they are very large due to the shortness of the available data, though this may also be the case in other studies where they have not been estimated. Ignoring the dependence between sea and river levels would result in an underestimation of risk; in particular, the expected return period of the highest compound flood observed increases from about 20 to 32 years when switching from the dependent to the independent case.
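The last point of this abstract, that ignoring dependence understates risk, can be illustrated with a toy simulation. It is a sketch under stated assumptions and not the pair-copula model of the paper: the two variables are given a Gaussian dependence with a hypothetical correlation of 0.5, both thresholds sit at the 0.9 marginal quantile, and one observation per year is assumed so that the return period is simply 1/p.

```python
import math
import random
from statistics import NormalDist

def joint_exceedance(rho, quantile=0.9, n=200_000, seed=0):
    """Estimate P(sea > s and river > r) when both thresholds sit at the
    given marginal quantile and the two variables have correlation rho."""
    rng = random.Random(seed)
    z_thr = NormalDist().inv_cdf(quantile)
    hits = 0
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        # Second variable shares a fraction rho of the first one's variation.
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        if z1 > z_thr and z2 > z_thr:
            hits += 1
    return hits / n

p_dep = joint_exceedance(rho=0.5)   # hypothetical dependence strength
p_ind = (1 - 0.9) ** 2              # exact joint probability under independence
print(f"dependent:   p={p_dep:.4f}, return period ~ {1 / p_dep:.0f} years")
print(f"independent: p={p_ind:.4f}, return period ~ {1 / p_ind:.0f} years")
```

With positive dependence the joint exceedance is more likely than the product of the marginal probabilities, so the independent calculation yields a longer return period, which is the direction of the effect reported for Ravenna.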
Validation of Models : Statistical Techniques and Data Availability
Kleijnen, J.P.C.
1999-01-01
This paper shows which statistical techniques can be used to validate simulation models, depending on which real-life data are available. Concerning this availability three situations are distinguished (i) no data, (ii) only output data, and (iii) both input and output data. In case (i) - no real
Statistical Techniques for Efficient Indexing and Retrieval of Document Images
Bhardwaj, Anurag
2010-01-01
We have developed statistical techniques to improve the performance of document image search systems where the intermediate step of OCR based transcription is not used. Previous research in this area has largely focused on challenges pertaining to generation of small lexicons for processing handwritten documents and enhancement of poor quality…
Institute of Scientific and Technical Information of China (English)
Nur Hazirah Adnan; Mohamad Pauzi Zakaria; Hafizan Juahir; Masni Mohd Ali
2012-01-01
The Langat River in Malaysia has been experiencing anthropogenic input from urban, rural and industrial activities for many years. Sewage contamination, possibly originating from the more than three million inhabitants of the Langat River Basin, was examined. Sediment samples from 22 stations (SL01-SL22) along the Langat River were collected, extracted and analysed by GC-MS. Six different sterols were identified and quantified. The highest sterol concentration was found at station SL02 (618.29 ng/g dry weight), which is situated in the Balak River, whereas the other sediment samples ranged between 11.60 and 446.52 ng/g dry weight. Sterol ratios were used to identify sources, occurrence and partitioning of faecal matter in sediments, and the majority of the ratios clearly demonstrated that sewage contamination was occurring at most stations in the Langat River. A multivariate statistical analysis was used in conjunction with a combination of biomarkers to better understand the data that clearly separated the compounds. Most sediments of the Langat River were found to contain low to mid-range sewage contamination, with some containing 'significant' levels of contamination. This is the first report on sewage pollution in the Langat River based on a combination of biomarker and multivariate statistical approaches that will establish a new standard for sewage detection using faecal sterols.
Reidy, Lorlyn; Bu, Kaixuan; Godfrey, Murrell; Cizdziel, James V
2013-12-10
Students in an instrumental analysis course with a forensic emphasis were presented with a mock scenario in which soil was collected from a murder suspect's car mat, from the crime scene, from adjacent areas, and from more distant locations. Students were then asked to conduct a comparative analysis using the soil's elemental distribution fingerprints. The soil was collected from Lafayette County, Mississippi, USA and categorized as sandy loam. Eight student groups determined twenty-two elements (Li, Be, Mg, Al, K, Ca, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Rb, Sr, Cs, Ba, Pb, U) in seven samples of soil and one sample of sediment by microwave-assisted acid digestion and inductively coupled plasma-mass spectrometry (ICP-MS). Data were combined and evaluated using multivariate statistical analyses. All eight student groups correctly classified their unknown among the different locations. Students learn, however, that whereas their results suggest that the elemental fingerprinting approach can be used to distinguish soils from different land-use areas and geographic locations, applying the methodology in forensic investigations is more complicated and has potential pitfalls. Overall, the inquiry-based pedagogy enthused the students and provided learning opportunities in analytical chemistry, including sample preparation, ICP-MS, figures-of-merit, and multivariate statistics. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Donges, Jonathan F; Loew, Alexander; Marwan, Norbert; Kurths, Jürgen
2013-01-01
Eigen techniques such as empirical orthogonal function (EOF) or coupled pattern (CP) analysis have been frequently used for detecting patterns in multivariate climatological data sets. Recently, statistical methods originating from the theory of complex networks have been employed for the very same purpose of spatio-temporal analysis. This climate network analysis is usually based on the same set of similarity matrices as is used in classical EOF or CP analysis, e.g., the correlation matrix of a single climatological field or the cross-correlation matrix between two distinct climatological fields. In this study, formal relationships between the eigen and network approaches are derived and illustrated using exemplary data sets. These results show that climate network analysis can complement classical eigen techniques and provides substantial additional information on the higher-order structure of statistical interrelationships in climatological data sets. Hence, climate networks are a valuable su...
Lapuyade-Lahorgue, Jerome; Xue, Jing-Hao; Ruan, Su
2017-03-21
Multi-source image acquisition is attracting increasing interest in many fields, such as multi-modal medical image segmentation. Such acquisition aims at exploiting complementary information to perform image segmentation, since the same scene is observed through various types of images. However, strong dependency often exists between multi-source images. This dependency should be taken into account when we try to extract joint information for making a precise decision. In order to statistically model this dependency between multiple sources, we propose a novel multi-source fusion method based on the Gaussian copula. The proposed fusion model is integrated in a statistical framework with hidden Markov field inference in order to delineate a target volume from multi-source images. Estimation of the model parameters and segmentation of the images are jointly performed by an iterative algorithm based on Gibbs sampling. Experiments are performed on multi-sequence MRI to segment tumors. The results show that the proposed method based on the Gaussian copula is effective for multi-source image segmentation.
Willard, Melissa A Bodnar; McGuffin, Victoria L; Smith, Ruth Waddell
2012-01-01
Salvia divinorum is a hallucinogenic herb that is internationally regulated. In this study, salvinorin A, the active compound in S. divinorum, was extracted from S. divinorum plant leaves using a 5-min extraction with dichloromethane. Four additional Salvia species (Salvia officinalis, Salvia guaranitica, Salvia splendens, and Salvia nemorosa) were extracted using this procedure, and all extracts were analyzed by gas chromatography-mass spectrometry. Differentiation of S. divinorum from other Salvia species was successful based on visual assessment of the resulting chromatograms. To provide a more objective comparison, the total ion chromatograms (TICs) were subjected to principal components analysis (PCA). Prior to PCA, the TICs were subjected to a series of data pretreatment procedures to minimize non-chemical sources of variance in the data set. Successful discrimination of S. divinorum from the other four Salvia species was possible based on visual assessment of the PCA scores plot. To provide a numerical assessment of the discrimination, a series of statistical procedures such as Euclidean distance measurement, hierarchical cluster analysis, Student's t tests, Wilcoxon rank-sum tests, and Pearson product moment correlation were also applied to the PCA scores. The statistical procedures were then compared to determine the advantages and disadvantages for forensic applications.
Meskaldji, Djalel Eddine; Hagmann, Patric; Meuli, Reto; Thiran, Jean Philippe; Morgenthaler, Stephan
2010-01-01
In neuroimaging, a large number of correlated tests are routinely performed to detect active voxels in single-subject experiments or to detect regions that differ between individuals belonging to different groups. In order to bound the probability of a false discovery of pair-wise differences, a Bonferroni or other correction for multiplicity is necessary. These corrections greatly reduce the power of the comparisons, which means that small signals (differences) remain hidden; such corrections have therefore been more or less successful depending on the application. We introduce a method that improves the power of a family of correlated statistical tests by reducing their number in an orderly fashion using our a-priori understanding of the problem. The tests are grouped by blocks that respect the data structure and only one or a few tests per group are performed. For each block we construct an appropriate summary statistic that characterizes a meaningful feature of the block. The comparisons are based on these summary stat...
Statistically tuned Gaussian background subtraction technique for UAV videos
Indian Academy of Sciences (India)
R Athi Lingam; K Senthil Kumar
2014-08-01
Background subtraction is an efficient technique for segmenting targets from the non-informative background of a video. The traditional background subtraction technique suits videos with a static background, whereas video obtained from an unmanned aerial vehicle has a dynamic background. Here, we propose an algorithm with a tuning factor and Gaussian update for surveillance videos that is effective for aerial videos. The tuning factor is optimized by extracting the statistical features of the input frames. With the optimized tuning factor and Gaussian update, an adaptive Gaussian-based background subtraction technique is proposed. The algorithm involves modelling, update and subtraction phases. This running Gaussian average based background subtraction technique applies updates at both the model generation phase and the subtraction phase. The resultant video extracts the moving objects from the dynamic background. Sample videos with various properties such as cluttered background, small objects, moving background and multiple objects are considered for evaluation. The technique is statistically compared with the frame differencing technique, the temporal median method and the mixture of Gaussians model, and a performance evaluation is done to check the effectiveness of the proposed technique after optimization for both static and dynamic videos.
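The running Gaussian average idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the learning rate `alpha`, the threshold `k` (standing in for a tuning factor) and the variance floor `min_var` are hypothetical values chosen for illustration, and frames are represented as flat lists of pixel intensities.

```python
def update_background(mean, var, frame, alpha=0.05, k=2.5, min_var=1e-6):
    """One step of a per-pixel running Gaussian average background model.

    mean, var, frame: equal-length lists of floats (one entry per pixel).
    alpha: learning rate (hypothetical default).
    k: foreground threshold in standard deviations (hypothetical default).
    Returns (new_mean, new_var, mask); mask[i] is True for foreground pixels.
    """
    new_mean, new_var, mask = [], [], []
    for m, v, x in zip(mean, var, frame):
        d = x - m
        # A pixel is foreground if it lies more than k standard deviations
        # from the background mean (compared in squared form).
        is_foreground = d * d > k * k * v
        mask.append(is_foreground)
        if is_foreground:
            # Selective update: leave the model unchanged under moving objects.
            new_mean.append(m)
            new_var.append(v)
        else:
            # Exponentially weighted update of mean and variance.
            new_mean.append(m + alpha * d)
            new_var.append(max((1 - alpha) * v + alpha * d * d, min_var))
    return new_mean, new_var, mask


# Three pixels: two close to the background, one clearly different.
bg_mean, bg_var, fg_mask = update_background(
    [10.0, 10.0, 10.0], [4.0, 4.0, 4.0], [11.0, 10.0, 30.0]
)
```

In this sketch only background pixels update the model, which is one common design choice; updating every pixel regardless of the mask is another.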
Fernández-González, Daniel; Martín-Duarte, Ramón; Ruiz-Bustinza, Íñigo; Mochón, Javier; González-Gasca, Carmen; Verdeja, Luis Felipe
2016-08-01
Blast furnace operators expect to get sinter with homogenous and regular properties (chemical and mechanical), necessary to ensure regular blast furnace operation. Blends for sintering also include several iron by-products and other wastes that are obtained in different processes inside the steelworks. Due to their source, the availability of such materials is not always consistent, but their total production should be consumed in the sintering process, to both save money and recycle wastes. The main scope of this paper is to obtain the least expensive iron ore blend for the sintering process, which will provide suitable chemical and mechanical features for the homogeneous and regular operation of the blast furnace. The systematic use of statistical tools was employed to analyze historical data, including linear and partial correlations applied to the data and fuzzy clustering based on the Sugeno Fuzzy Inference System to establish relationships among the available variables.
Bootstrap-based confidence estimation in PCA and multivariate statistical process control
DEFF Research Database (Denmark)
Babamoradi, Hamid
Traditional/Asymptotic confidence estimation has limited applicability since it needs statistical theories to estimate the confidences, which are not available for all indicators/parameters. Furthermore, in case the theories are available for a specific indicator/parameter, the theories are based...... on assumptions that do not always hold in practice. The aim of this thesis was to illustrate the concept of bootstrap-based confidence estimation in PCA and MSPC. It particularly shows how to build bootstrap-based confidence limits in these areas to be used as an alternative to the traditional/asymptotic limits....... The goal was to improve process monitoring by improving the quality of MSPC charts and contribution plots. A bootstrapping algorithm to build confidence limits was illustrated in a case study format (Paper I). The main steps in the algorithm were discussed where a set of sensible choices (plus...
Marengo, Emilio; Manfredi, Marcello; Zerbinati, Orfeo; Robotti, Elisa; Mazzucco, Eleonora; Gosetti, Fabio; Bearman, Greg; France, Fenella; Shor, Pnina
2011-09-01
The aim of this project is the development of a noninvasive technique based on LED multispectral imaging (MSI) for monitoring the conservation state of the Dead Sea Scrolls (DSS) collection. It is well-known that changes in the parchment reflectance drive the transition of the scrolls from legible to illegible. Capitalizing on this fact, we will use spectral imaging to detect changes in the reflectance before they become visible to the human eye. The technique uses multivariate analysis and statistical process control theory. The present study was carried out on a "sample" parchment of calfskin. The monitoring of the surface of a commercial modern parchment aged consecutively for 2 h and 6 h at 80 °C and 50% relative humidity (ASTM) was performed at the Imaging Lab of the Library of Congress (Washington, DC, U.S.A.). MSI is here carried out in the vis-NIR range limited to 1 μm, with a number of bands of 13 and bandwidths that range from about 10 nm in UV to 40 nm in IR. Results showed that we could detect and locate changing pixels, on the basis of reflectance changes, after only a few "hours" of aging.
Air Quality Forecasting through Different Statistical and Artificial Intelligence Techniques
Mishra, D.; Goyal, P.
2014-12-01
Urban air pollution forecasting has emerged as an acute problem in recent years because of severe environmental degradation due to the increase in harmful air pollutants in the ambient atmosphere. In this study, different types of statistical as well as artificial intelligence techniques are used for forecasting and analysis of air pollution over the Delhi urban area. These techniques are principal component analysis (PCA), multiple linear regression (MLR) and artificial neural network (ANN), and the forecasts are in good agreement with the concentrations observed by the Central Pollution Control Board (CPCB) at different locations in Delhi. However, such methods suffer from disadvantages: they provide limited accuracy because they are unable to predict the extreme points, i.e. the pollution maximum and minimum cut-offs cannot be determined using such an approach. Such methods are also inefficient for improving forecast output. With the advancement in technology and research, an alternative to the above traditional methods has been proposed, i.e. the coupling of statistical techniques with artificial intelligence (AI) can be used for forecasting purposes. The coupling of PCA, ANN and fuzzy logic is used for forecasting air pollutants over the Delhi urban area. The statistical measures, e.g., correlation coefficient (R), normalized mean square error (NMSE), fractional bias (FB) and index of agreement (IOA), indicate better agreement for the proposed model than for all the other models. Hence, the coupling of statistical and artificial intelligence techniques can be used for forecasting air pollutants over urban areas.
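The four evaluation measures named in this abstract can be computed as in the sketch below. The abstract does not give formulas, so the definitions used here are commonly used ones and are an assumption; in particular, the sign convention of the fractional bias varies between authors. The function name `forecast_scores` is hypothetical.

```python
import math


def forecast_scores(obs, pred):
    """Return R, NMSE, FB and IOA for paired observed/predicted values.

    Assumed definitions (not taken from the abstract):
      R    = Pearson correlation coefficient
      NMSE = mean((O - P)^2) / (mean(O) * mean(P))
      FB   = 2 * (mean(O) - mean(P)) / (mean(O) + mean(P))
      IOA  = 1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2)
    """
    n = len(obs)
    mo = sum(obs) / n
    mp = sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sq_err = sum((o - p) ** 2 for o, p in zip(obs, pred))
    potential = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return {
        "R": cov / (so * sp),
        "NMSE": (sq_err / n) / (mo * mp),
        "FB": 2 * (mo - mp) / (mo + mp),
        "IOA": 1 - sq_err / potential,
    }


# A forecast that is perfectly correlated but biased high by one unit.
scores = forecast_scores([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The example shows why several measures are reported together: the correlation is perfect even though the forecast is biased, and only NMSE, FB and IOA reveal the offset.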
Directory of Open Access Journals (Sweden)
Xiangyu Mu
2014-09-01
Natural factors and anthropogenic activities both contribute dissolved chemical loads to lakes and streams. Mineral solubility, geomorphology of the drainage basin, source strengths and climate all contribute to concentrations and their variability. Urbanization and agricultural wastewater particularly lead to aquatic environmental degradation. Major contaminant sources and controls on water quality can be assessed by analyzing the variability in proportions of major and minor solutes in water coupled to multivariate statistical methods. The demand for freshwater needed for increasing crop production, population and industrialization occurs almost everywhere in China, and these conflicting needs have led to widespread water contamination. Because of heavy nutrient loadings from all of these sources, Lake Taihu (eastern China) notably suffers periodic hyper-eutrophication and drinking water deterioration, which has led to shortages of freshwater for the City of Wuxi and other nearby cities. This lake, the third largest freshwater body in China, has historically been considered a cultural treasure of China, and has supported long-term fisheries. There is increasing pressure to remediate the present contamination, which compromises both aquaculture and the prior economic base centered on tourism. However, remediation cannot be effectively done without first characterizing the broad nature of the non-point source pollution. To this end, we investigated the hydrochemical setting of Lake Taihu to determine how different land use types influence the variability of surface water chemistry in different water sources to the lake. We found that waters broadly show wide variability ranging from calcium-magnesium-bicarbonate hydrochemical facies type to mixed sodium-sulfate-chloride type. Principal components analysis produced three principal components that explained 78% of the variance in the water quality and reflect three major types of water
Lightweight and Statistical Techniques for Petascale Debugging
Energy Technology Data Exchange (ETDEWEB)
Miller, Barton
2014-06-30
This project investigated novel techniques for debugging scientific applications on petascale architectures. In particular, we developed lightweight tools that narrow the problem space when bugs are encountered. We also developed techniques that either limit the number of tasks and the code regions to which a developer must apply a traditional debugger or that apply statistical techniques to provide direct suggestions of the location and type of error. We extended previous work on the Stack Trace Analysis Tool (STAT), which has already demonstrated scalability to over one hundred thousand MPI tasks. We also extended statistical techniques developed to isolate programming errors in widely used sequential or threaded applications in the Cooperative Bug Isolation (CBI) project to large scale parallel applications. Overall, our research substantially improved productivity on petascale platforms through a tool set for debugging that complements existing commercial tools. Previously, Office of Science application developers relied either on primitive manual debugging techniques based on printf or on tools, such as TotalView, that do not scale beyond a few thousand processors. However, bugs often arise at scale, and substantial effort and computation cycles are wasted either in reproducing the problem in a smaller run that can be analyzed with the traditional tools or in repeated runs at scale that use the primitive techniques. New techniques that work at scale and automate the process of identifying the root cause of errors were needed. These techniques significantly reduced the time spent debugging petascale applications, thus leading to a greater overall amount of time for application scientists to pursue the scientific objectives for which the systems are purchased. We developed a new paradigm for debugging at scale: techniques that reduced the debugging scenario to a scale suitable for traditional debuggers, e.g., by narrowing the search for the root-cause analysis
Directory of Open Access Journals (Sweden)
Tao Gao
2014-01-01
Extreme precipitation is likely to be one of the most severe meteorological disasters in China; however, studies on the physical factors affecting precipitation extremes and corresponding prediction models are lacking. From a new point of view, the sensible heat flux (SHF) and latent heat flux (LHF), which have significant impacts on summer extreme rainfall in the Yangtze River basin (YRB), have been quantified, and impact factors have then been selected. Firstly, a regional extreme precipitation index was applied to determine Regions of Significant Correlation (RSC) by analyzing the spatial distribution of correlation coefficients between this index and SHF, LHF, and sea surface temperature (SST) on the global ocean scale; then the time series of SHF, LHF, and SST in RSCs during 1967–2010 were selected. Furthermore, other factors that significantly affect variations in precipitation extremes over the YRB were also selected. Multiple stepwise regression and leave-one-out cross-validation (LOOCV) were utilized to analyze and test the influencing factors and the statistical prediction model. The correlation coefficient between the observed regional extreme index and the model simulation result is 0.85, significant at the 99% level. This suggests that the forecast skill was acceptable, although many aspects of the prediction model should be improved.
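Leave-one-out cross-validation, as used in this abstract to test the prediction model, can be sketched as follows. This is a deliberately simplified illustration: it uses a single-predictor least-squares line rather than the multiple stepwise regression of the study, and the data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def loocv_predictions(xs, ys):
    """Predict each y from a model fitted on all the other samples."""
    preds = []
    for i in range(len(xs)):
        # Hold out sample i, fit on the remaining n - 1 samples.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    return preds


# Hypothetical noise-free data on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
held_out = loocv_predictions(xs, ys)
```

The held-out predictions can then be correlated with the observations to obtain a cross-validated skill score, which is generally more conservative than the in-sample fit.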
Categorical and nonparametric data analysis: choosing the best statistical technique
Nussbaum, E Michael
2014-01-01
Featuring in-depth coverage of categorical and nonparametric statistics, this book provides a conceptual framework for choosing the most appropriate type of test in various research scenarios. Class tested at the University of Nevada, the book's clear explanations of the underlying assumptions, computer simulations, and Exploring the Concept boxes help reduce reader anxiety. Problems inspired by actual studies provide meaningful illustrations of the techniques. The underlying assumptions of each test and the factors that impact validity and statistical power are reviewed so readers can explain
Statistical and Economic Techniques for Site-specific Nematode Management.
Liu, Zheng; Griffin, Terry; Kirkpatrick, Terrence L
2014-03-01
Recent advances in precision agriculture technologies and spatial statistics allow realistic, site-specific estimation of nematode damage to field crops and provide a platform for the site-specific delivery of nematicides within individual fields. This paper reviews the spatial statistical techniques that model correlations among neighboring observations and develops a spatial economic analysis to determine the potential of site-specific nematicide application. The spatial econometric methodology applied in the context of site-specific crop yield response contributes to closing the gap between data analysis and realistic site-specific nematicide recommendations and helps to provide a practical method of site-specifically controlling nematodes.
Sarhadi, Ali; Burn, Donald H.; Johnson, Fiona; Mehrotra, Raj; Sharma, Ashish
2016-05-01
Accurate projection of global warming on the probabilistic behavior of hydro-climate variables is one of the main challenges in climate change impact assessment studies. Due to the complexity of climate-associated processes, different sources of uncertainty influence the projected behavior of hydro-climate variables in regression-based statistical downscaling procedures. The current study presents a comprehensive methodology to improve the predictive power of the procedure to provide improved projections. It does this by minimizing the uncertainty sources arising from the high-dimensionality of atmospheric predictors, the complex and nonlinear relationships between hydro-climate predictands and atmospheric predictors, as well as the biases that exist in climate model simulations. To address the impact of the high dimensional feature spaces, a supervised nonlinear dimensionality reduction algorithm is presented that is able to capture the nonlinear variability among projectors through extracting a sequence of principal components that have maximal dependency with the target hydro-climate variables. Two soft-computing nonlinear machine-learning methods, Support Vector Regression (SVR) and Relevance Vector Machine (RVM), are engaged to capture the nonlinear relationships between predictand and atmospheric predictors. To correct the spatial and temporal biases over multiple time scales in the GCM predictands, the Multivariate Recursive Nesting Bias Correction (MRNBC) approach is used. The results demonstrate that this combined approach significantly improves the downscaling procedure in terms of precipitation projection.
Brandmeier, M.; Wörner, G.
2016-10-01
Multivariate statistical and geospatial analyses based on a compilation of 890 geochemical and 1200 geochronological data for 194 mapped ignimbrites from the Central Andes document the compositional and temporal patterns of large-volume ignimbrites (so-called "ignimbrite flare-ups") during Neogene times. Rapid advances in computational science during the past decade led to a growing pool of algorithms for multivariate statistics for large datasets with many predictor variables. This study applies cluster analysis (CA) and linear discriminant analysis (LDA) on log-ratio transformed data with the aim of (1) testing a tool for ignimbrite correlation and (2) distinguishing compositional groups that reflect different processes and sources of ignimbrite magmatism during the geodynamic evolution of the Central Andes. CA on major and trace elements allows grouping of ignimbrites according to their geochemical characteristics into rhyolitic and dacitic "end-members" and to differentiate characteristic trace element signatures with respect to Eu anomaly, depletions in middle and heavy rare earth elements (REE) and variable enrichments in light REE. To highlight these distinct compositional signatures, we applied LDA to selected ignimbrites for which comprehensive datasets were available. In comparison to traditional geochemical parameters we found that the advantage of multivariate statistics is their capability of dealing with large datasets and many variables (elements) and to take advantage of this n-dimensional space to detect subtle compositional differences contained in the data. The most important predictors for discriminating ignimbrites are La, Yb, Eu, Al2O3, K2O, P2O5, MgO, FeOt, and TiO2. However, other REE such as Gd, Pr, Tm, Sm, Dy and Er also contribute to the discrimination functions. Significant compositional differences were found between (1) the older (> 13 Ma) large-volume plateau-forming ignimbrites in northernmost Chile and southern Peru and (2) the
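The log-ratio transformation applied before cluster and discriminant analysis can be illustrated with the centred log-ratio (clr) transform. The abstract does not state which log-ratio variant was used, so clr is an assumption here, and the example compositions are hypothetical.

```python
import math


def clr(composition):
    """Centred log-ratio transform of a strictly positive composition.

    Each part is divided by the geometric mean of all parts before taking
    the logarithm, which is equivalent to centring the log values.
    """
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# The same hypothetical composition in two different units (e.g. wt% vs. ppm).
in_percent = clr([70.0, 15.0, 5.0])
rescaled = clr([700.0, 150.0, 50.0])
```

The transformed values sum to zero and do not depend on the overall scale of the composition, which is why log-ratio data are better suited to distance-based methods than raw concentrations that are constrained to a constant sum.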
Deng, Linhua
2015-07-01
Three nonlinear analysis techniques, including cross-recurrence plot, line of synchronization, and cross-wavelet transform, are proposed to estimate the coherent phase vibrations of nonlinear and non-stationary time series. The case study utilizes the monthly averages of sunspot areas during the time interval from May 1874 to August 2014. The following prominent results are found: (1) the phase-leading hemisphere of long-term sunspot areas has changed twice in the past 140 years, indicating that the hemispheric imbalances and apparent phase differences on both hemispheres are a prevalent behavior and are not anomalous; (2) the alternating regularity of hemispheric asynchronism exhibits a cyclical pattern of 4.5+3.5 cycles, and the magnetic flux excess in a certain hemisphere during the ascending branch of a cycle can be taken as an indication of the phase-leading hemisphere in this cycle. We firmly believe that powerful nonlinear approaches are more advanced than classical linear methods when they are combined to determine the dynamic complexity of nonlinear physical systems.
Source Apportionment of Heavy Metals in Soils Using Multivariate Statistics and Geostatistics
Institute of Scientific and Technical Information of China (English)
QU Ming-Kai; LI Wei-Dong; ZHANG Chuan-Rong; WANG Shan-Qin; YANG Yong; HE Li-Yuan
2013-01-01
The main objectives of this study were to introduce an integrated method for effectively identifying soil heavy metal pollution sources and apportioning their contributions, and apply it to a case study. The method combines the principal component analysis/absolute principal component scores (PCA/APCS) receptor model and geostatistics. The case study was conducted in an area of 31 km² in the urban-rural transition zone of Wuhan, a metropolis of central China. 124 topsoil samples were collected for measuring the concentrations of eight heavy metal elements (Mn, Cu, Zn, Pb, Cd, Cr, Ni and Co). PCA results revealed that three major factors were responsible for soil heavy metal pollution, which were initially identified as "steel production", "agronomic input" and "coal consumption". The APCS technique, combined with multiple linear regression analysis, was then applied for source apportionment. Steel production appeared to be the main source for Ni, Co, Cd, Zn and Mn, agronomic input for Cu, and coal consumption for Pb and Cr. Geostatistical interpolation using ordinary kriging was finally used to map the spatial distributions of the contributions of pollution sources and further confirm the result interpretations. The introduced method appears to be an effective tool in soil pollution source apportionment and identification, and might provide valuable reference information for pollution control and environmental management.
Multivariate statistical analysis treatment of DSC thermal properties for animal fat adulteration.
Dahimi, Omar; Rahim, Alina Abdul; Abdulkarim, S M; Hassan, Mohd Sukri; Hashari, Shazamawati B T Zam; Mashitoh, A Siti; Saadi, Sami
2014-09-01
The adulteration of edible fats is a kind of fraud that impairs the physical and chemical features of the original lipid materials. It has been detected in various food, pharmaceutical and cosmeceutical products. Differential scanning calorimetry (DSC) is a robust thermo-analytical technique that makes it possible to fingerprint the primary crystallisation of triacylglycerol (TAG) molecules and their transition behaviours. The aim of this study was to assess the cross-contamination caused by lard (LD) concentrations of 0.5-5% in mixture systems containing beef tallow (BT) and chicken fat (CF) separately. TAG species of pure and adulterated lipids in relation to their crystallisation and melting parameters were studied using principal components analysis (PCA). The results showed that, using the heating profiles, the discrimination of LD from BT and CF was very clear even at a low dose of less than 1%. The same observation was made from the crystallisation profiles of BT adulterated with LD doses ranging from 0.1% to 1% and from 2% to 5%, respectively. However, CF adulterated with LD did not exhibit clear changes in its crystallisation profiles. Consequently, DSC coupled with PCA is one of the techniques that might be used to monitor and differentiate the minimum adulteration levels caused by LD in different animal fats.
Silveira, Landulfo; Borges, Rita de Cássia Fernandes; Navarro, Ricardo Scarparo; Giana, Hector Enrique; Zângaro, Renato Amaro; Pacheco, Marcos Tadeu Tavares; Fernandes, Adriana Barrinha
2017-05-01
Raman spectroscopy has been employed in the quantitative analysis of biochemical components in human serum. This study aimed to develop a spectral model to estimate the concentration of glucose and lipid fractions in human serum, thus evaluating the feasibility of Raman spectroscopy technique for diagnostic purposes. A total of 44 samples of blood serum were collected from volunteers submitted to routine blood biochemical assay analysis. The biochemical concentrations of glucose, triglycerides, cholesterol, and high-density and low-density lipoproteins (HDL and LDL) were obtained by colorimetric method. Serum samples (200 μL) were submitted to Raman spectroscopy (830 nm, 250 mW, 50-s accumulation). The spectra of sera present peaks related to the main constituents, particularly proteins and lipids. A quantitative model based on partial least squares (PLS) regression has been developed to estimate the concentration of these compounds, taking the biochemical concentrations assayed by the colorimetric method as sample's actual concentrations. The PLS model based on leave-one-out cross-validation approach estimated the concentration of triglycerides and cholesterol with r = 0.98 and 0.96, and root mean square error of 35.4 and 15.9 mg/dL, respectively. For the other biochemicals, the r was ranging from 0.75 to 0.86. These results evidenced the possibility of performing biochemical assay in blood serum samples by Raman spectroscopy and PLS regression and may be employed as a means of diagnosis in routine clinical analysis.
Wallace, Jack; Champagne, Pascale; Hall, Geof
2016-06-01
The wastewater stabilization ponds (WSPs) at a wastewater treatment facility in eastern Ontario, Canada, have experienced excessive algae growth and high pH levels in the summer months. A full range of parameters were sampled from the system and the chemical dynamics in the three WSPs were assessed through multivariate statistical analysis. The study presents a novel approach for exploratory analysis of a comprehensive water chemistry dataset, incorporating principal components analysis (PCA) and principal components (PC) and partial least squares (PLS) regressions. The analyses showed strong correlations between chl-a and sunlight, temperature, organic matter, and nutrients, and weak and negative correlations between chl-a and pH and chl-a and DO. PCA reduced the data from 19 to 8 variables, with a good fit to the original data matrix (similarity measure of 0.73). Multivariate regressions to model system pH in terms of these key parameters were performed on the reduced variable set and the PCs generated, for which strong fits (R² > 0.79 with all data) were observed. The methodologies presented in this study are applicable to a wide range of natural and engineered systems where a large number of water chemistry parameters are monitored resulting in the generation of large data sets.
A Survey on Statistical Based Single Channel Speech Enhancement Techniques
Directory of Open Access Journals (Sweden)
Sunnydayal. V
2014-11-01
Speech enhancement is a long-standing problem with various applications such as hearing aids, automatic recognition and coding of speech signals. Single-channel speech enhancement is used to enhance speech degraded by additive background noise. Background noise can have an adverse impact on our ability to converse smoothly and without hindrance in very noisy environments, such as busy streets, a car, or the cockpit of an airplane. Such noise can affect the quality and intelligibility of speech. This survey paper aims to provide an overview of speech enhancement algorithms that enhance noisy speech signals corrupted by additive noise. The algorithms are mainly based on statistical approaches. Different estimators are compared. Challenges and opportunities of speech enhancement are also discussed. This paper helps in choosing the best statistical technique for speech enhancement.
Lifshits, A M
1979-01-01
General characteristics of multivariate statistical analysis (MSA) are given. Methodical premises and criteria for the selection of an adequate MSA method applicable to pathoanatomic investigations of the epidemiology of multicausal diseases are presented. The experience of using MSA with computers and standard computing programs in studies of coronary artery atherosclerosis on the materials of 2060 autopsies is described. The combined use of 4 MSA methods (sequential, correlational, regressional, and discriminant) made it possible to quantify the contribution of each of the 8 examined risk factors to the development of atherosclerosis. The most important factors were found to be age, arterial hypertension, and heredity. Occupational hypodynamia and increased fatness were more important in men, whereas diabetes mellitus was more important in women. The registration of this combination of risk factors by MSA methods provides for a more reliable prognosis of the likelihood of coronary heart disease with a fatal outcome than prognosis of the degree of coronary atherosclerosis.
1981-09-01
Statistics, Carnegie-Mellon University. **At present with the Air Force Institute of Technology, Wright-Patterson Air Force Base, Ohio. ...recurrence relation (2.7) [equation illegible in the source scan]. We use the following notations as defined in James (1964). The complex multivariate gamma functions $\tilde{\Gamma}_p(a)$ and $\tilde{\Gamma}_p(a, \kappa)$ are given by $\tilde{\Gamma}_p(a) = \pi^{p(p-1)/2} \prod_{i=1}^{p} \Gamma(a-i+1)$ (2.8) and $\tilde{\Gamma}_p(a, \kappa) = \pi^{p(p-1)/2} \prod_{i=1}^{p} \Gamma(a-i+1+k_i)$ (2.9), where
Kamal, Ghulam Mustafa; Wang, Xiaohua; Bin Yuan; Wang, Jie; Sun, Peng; Zhang, Xu; Liu, Maili
2016-09-01
Soy sauce, a well-known seasoning all over the world, especially in Asia, is available on the global market in a wide range of types based on its purpose and processing methods. Its composition varies with respect to the fermentation processes and the addition of additives, preservatives and flavor enhancers. A comprehensive (1)H NMR based study regarding the metabonomic variations of soy sauce to differentiate among different types of soy sauce available on the global market has been limited due to the complexity of the mixture. In the present study, (13)C NMR spectroscopy coupled with multivariate statistical data analysis, such as principal component analysis (PCA) and orthogonal partial least squares-discriminant analysis (OPLS-DA), was applied to investigate metabonomic variations among different types of soy sauce, namely super light, super dark, red cooking and mushroom soy sauce. The main additives in soy sauce, such as glutamate, sucrose and glucose, were easily distinguished and quantified using (13)C NMR spectroscopy; they were otherwise difficult to assign and quantify due to serious signal overlaps in (1)H NMR spectra. The significantly higher concentration of sucrose in dark, red cooking and mushroom flavored soy sauce can be directly linked to the addition of caramel in soy sauce. Similarly, the significantly higher level of glutamate in super light as compared to super dark and mushroom flavored soy sauce may come from the addition of monosodium glutamate. The study highlights the potential of (13)C NMR based metabonomics coupled with multivariate statistical data analysis in differentiating between the types of soy sauce on the basis of the level of additives, raw materials and fermentation procedures.
Zanon, Cristina; Stocchero, Matteo; Albiero, Elena; Castegnaro, Silvia; Chieregato, Katia; Madeo, Domenico; Rodeghiero, Francesco; Astori, Giuseppe
2014-07-01
Cytokine-induced killer (CIK) cells, obtained after mononucleated cell stimulation with interferon-γ, interleukin-2, and anti-CD3 antibody, consist of CD3(+) CD56(+) (CIK) cells and a minority of natural killer (NK; CD3(-) CD56(+) ) cells and T-lymphocytes (CD3(+) CD56(-) ) with antitumor effect against hematological malignancies, thus representing a promising immunotherapy strategy. To ensure in vivo antitumor activity it is mandatory to maximize the percentage of CD3(+) CD56(+) effector cells, which is highly variable depending on the starting sample and the harvesting day. Based on cytofluorimetric data, we retrospectively applied multivariate statistical data analysis (MVDA) to 30 expansions, building mathematical models able to predict the expansion fate and the optimal CIK harvesting day. Cell phenotype was monitored during culture; multivariate batch statistical process control was applied to monitor cell expansion, and orthogonal projections to latent structures were used to predict CIK percentage. Ten expansions had CD3(+) CD56(+) cells ≥ 40% (good batches) and 20 had CD3(+) CD56(+) cells ≤ 40%. In 36.7% of expansions, CD3(+) CD56(+) cells reached the highest concentration at day 17, and in the others at day 21. We built a highly predictive regression model for estimating CD3(+) CD56(+) cells during culture. Three variables proved highly informative: NK % at day 0, cytotoxic T-lymphocytes % (CTLs, CD3(+) CD8(+) ) at day 4, and CIK % at day 7. "Good batches" are characterized by a high percentage of CTLs and CD3(+) CD56(+) cells at day 4 and day 7, respectively. By applying MVDA it is possible to optimize CIK expansion by deciding the optimal cell harvesting day. A predictive role for CTL and CIK was demonstrated. © 2013 Clinical Cytometry Society.
Konukoglu, Ender; Coutu, Jean-Philippe; Salat, David H; Fischl, Bruce
2016-07-01
Diffusion magnetic resonance imaging (dMRI) is a unique technology that allows the noninvasive quantification of microstructural tissue properties of the human brain in healthy subjects as well as the probing of disease-induced variations. Population studies of dMRI data have been essential in identifying pathological structural changes in various conditions, such as Alzheimer's and Huntington's diseases (Salat et al., 2010; Rosas et al., 2006). The most common form of dMRI involves fitting a tensor to the underlying imaging data (known as diffusion tensor imaging, or DTI), then deriving parametric maps, each quantifying a different aspect of the underlying microstructure, e.g. fractional anisotropy and mean diffusivity. To date, the statistical methods utilized in most DTI population studies either analyzed only one such map or analyzed several of them, each in isolation. However, it is most likely that variations in the microstructure due to pathology or normal variability would affect several parameters simultaneously, with differing variations modulating the various parameters to differing degrees. Therefore, joint analysis of the available diffusion maps can be more powerful in characterizing histopathology and distinguishing between conditions than the widely used univariate analysis. In this article, we propose a multivariate approach for statistical analysis of diffusion parameters that uses partial least squares correlation (PLSC) analysis and permutation testing as building blocks in a voxel-wise fashion. Stemming from the common formulation, we present three different multivariate procedures for group analysis, regressing-out nuisance parameters and comparing effects of different conditions. We used the proposed procedures to study the effects of non-demented aging, Alzheimer's disease and mild cognitive impairment on the white matter. Here, we present results demonstrating that the proposed PLSC-based approach can differentiate between effects of
Hussein, Mohammed Tahir
Hydrochemical evaluation of groundwater systems can be carried out using conventional and multivariate techniques, namely cluster and factor analyses, and others such as correspondence analysis. The main objective of this study is to investigate the groundwater quality in the Blue Nile basin of eastern Sudan, and to work out a hydrochemical evaluation for the aquifer system. Conventional methods and multivariate techniques were applied to achieve these goals. Two water-bearing layers exist in the study area: the Nubian Sandstone Formation and the Al-Atshan Formation. The Nubian aquifer is recharged mainly from the Blue Nile and Dinder Rivers through lateral subsurface flow and through direct rainfall in outcrop areas. The Al-Atshan aquifer receives water through underground flow from River Rahad and from rainfall infiltration. The prevailing hydrochemical processes are simple dissolution, mixing, partial ion exchange and ion exchange. Limited reverse ion exchange has been observed in the Nubian aquifer. Three factors control the overall mineralization and water quality of the Blue Nile Basin. The first factor includes high values of total dissolved solids, electrical conductivity, sodium, potassium, chloride, bicarbonate, sulphate and magnesium. The second factor includes calcium and pH. The third factor is due to fluoride concentration in the groundwater. The study highlights the descriptive capabilities of conventional and multivariate techniques as effective tools in groundwater quality evaluation.
Comparison of three Statistical Classification Techniques for Maser Identification
Manning, Ellen M; Ellingsen, Simon P; Breen, Shari L; Chen, Xi; Humphries, Melissa
2016-01-01
We applied three statistical classification techniques - linear discriminant analysis (LDA), logistic regression and random forests - to three astronomical datasets associated with searches for interstellar masers. We compared the performance of these methods in identifying whether specific mid-infrared or millimetre continuum sources are likely to have associated interstellar masers. We also discuss the ease, or otherwise, with which the results of each classification technique can be interpreted. Non-parametric methods have the potential to make accurate predictions when there are complex relationships between critical parameters. We found that for the small datasets the parametric methods, logistic regression and LDA, performed best, whereas for the largest dataset the non-parametric random forests method achieved accuracy comparable to the parametric techniques rather than any significant improvement. This suggests that at least for the specific examples investigated here accuracy of the predictions obtained ...
Kalegowda, Yogesh; Harmer, Sarah L
2012-03-20
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.
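The k-NN classification step named in this abstract can be illustrated with a minimal sketch. It assumes each spectrum has already been reduced to a short feature vector (for example, scores on two principal components); the data points, the labels, and the function name `knn_predict` are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    # Pair every training vector's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest neighbours and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature vectors standing in for reduced TOF-SIMS spectra.
train_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (1.0, 1.1), (0.9, 1.0), (1.1, 0.9)]
train_y = ["chalcopyrite"] * 3 + ["pyrite"] * 3

print(knn_predict(train_X, train_y, (0.15, 0.1)))
print(knn_predict(train_X, train_y, (0.95, 1.0)))
```

With an odd k and two classes a tie cannot occur; with more classes a tie-breaking rule would have to be chosen.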
Combining heuristic and statistical techniques in landslide hazard assessments
Cepeda, Jose; Schwendtner, Barbara; Quan, Byron; Nadim, Farrokh; Diaz, Manuel; Molina, Giovanni
2014-05-01
As a contribution to the Global Assessment Report 2013 - GAR2013, coordinated by the United Nations International Strategy for Disaster Reduction - UNISDR, a drill-down exercise for landslide hazard assessment was carried out by entering the results of both heuristic and statistical techniques into a new but simple combination rule. The data available for this evaluation included landslide inventories, both historical and event-based. In addition to the application of a heuristic method used in the previous editions of GAR, the availability of inventories motivated the use of statistical methods. The heuristic technique is largely based on the Mora & Vahrson method, which estimates hazard as the product of susceptibility and triggering factors, where classes are weighted based on expert judgment and experience. Two statistical methods were also applied: the landslide index method, which estimates weights of the classes for the susceptibility and triggering factors based on the evidence provided by the density of landslides in each class of the factors; and the weights of evidence method, which extends the previous technique to include both positive and negative evidence of landslide occurrence in the estimation of weights for the classes. One key aspect during the hazard evaluation was the decision on the methodology to be chosen for the final assessment. Instead of opting for a single methodology, it was decided to combine the results of the three implemented techniques using a combination rule based on a normalization of the results of each method. The hazard evaluation was performed for both earthquake- and rainfall-induced landslides. The country chosen for the drill-down exercise was El Salvador. The results indicate that highest hazard levels are concentrated along the central volcanic chain and at the centre of the northern mountains.
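The landslide index weighting and the normalization-based combination described in this abstract can be sketched as follows. The sketch assumes the commonly used form of the landslide index, in which a class weight is the natural logarithm of the landslide density in the class divided by the overall density. The abstract does not spell out the combination rule, so min-max normalization followed by averaging is an assumption here, and all class names and numbers are hypothetical.

```python
import math


def landslide_index_weights(landslide_area, class_area):
    """Weight per class: ln(landslide density in the class / overall density)."""
    overall = sum(landslide_area.values()) / sum(class_area.values())
    # A class without any landslides would need special handling (log of zero).
    return {
        c: math.log((landslide_area[c] / class_area[c]) / overall)
        for c in class_area
    }


def min_max(values):
    """Rescale scores to [0, 1]; assumes the scores are not all identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def combine(*method_scores):
    """Average the min-max normalised scores of several methods, cell by cell."""
    normalised = [min_max(scores) for scores in method_scores]
    return [sum(cell) / len(cell) for cell in zip(*normalised)]


# Hypothetical slope classes: landslide area and total area per class (km^2).
landslides = {"gentle": 1.0, "moderate": 4.0, "steep": 15.0}
areas = {"gentle": 50.0, "moderate": 30.0, "steep": 20.0}
weights = landslide_index_weights(landslides, areas)
print(weights)

# Hypothetical hazard scores from two methods for the same three map cells.
print(combine([1, 2, 3], [10, 30, 20]))
```

A positive weight marks a class with more landslides than the map average, a negative weight a class with fewer.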
Statistical optimisation techniques in fatigue signal editing problem
Energy Technology Data Exchange (ETDEWEB)
Nopiah, Z. M.; Osman, M. H. [Fundamental Engineering Studies Unit Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, 43600 UKM (Malaysia); Baharin, N.; Abdullah, S. [Department of Mechanical and Materials Engineering Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, 43600 UKM (Malaysia)
2015-02-03
Success in fatigue signal editing is determined by the level of length reduction achieved without compromising statistical constraints. A great reduction rate can be achieved by removing small amplitude cycles from the recorded signal. The length of the recorded signal sometimes renders the cycle-to-cycle editing process daunting. This has encouraged researchers to focus on the segment-based approach. This paper discusses the joint application of the Running Damage Extraction (RDE) technique and a single constrained Genetic Algorithm (GA) in fatigue signal editing optimisation. In the first section, the RDE technique is used to restructure and summarise the fatigue strain. This technique combines the overlapping window and fatigue strain-life models. It is designed to identify and isolate the fatigue events that exist in the variable amplitude strain data into different segments, whereby the retention of statistical parameters and the vibration energy are considered. In the second section, the fatigue data editing problem is formulated as a constrained single optimisation problem that can be solved using the GA method. The GA produces the shortest edited fatigue signal by selecting appropriate segments from a pool of labelling segments. Challenges arise due to constraints on the segment selection by deviation level over three signal properties, namely cumulative fatigue damage, root mean square and kurtosis values. Experimental results over several case studies show that the idea of solving fatigue signal editing within a framework of optimisation is effective and automatic, and that the GA is robust for constrained segment selection.
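Two of the three constrained signal properties, root mean square and kurtosis, can be computed directly, as the following minimal sketch shows. Cumulative fatigue damage is left out because it depends on a strain-life model that the abstract does not specify. The 10% tolerance, the function names, and the sample signals are assumptions made for illustration only.

```python
import math


def rms(signal):
    """Root mean square of the signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def kurtosis(signal):
    """Fourth central moment divided by the squared variance (non-excess)."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    return m4 / (m2 * m2)


def relative_deviation(original, edited):
    """Absolute deviation of the edited value, relative to the original value."""
    return abs(edited - original) / abs(original)


def satisfies_constraints(original, edited, tol=0.10):
    """True if RMS and kurtosis of the edited signal stay within the tolerance."""
    return all(
        relative_deviation(stat(original), stat(edited)) <= tol
        for stat in (rms, kurtosis)
    )


# Hypothetical strain history and an edit that drops the zero-amplitude samples.
original = [0, 5, -5, 0, 5, -5, 0, 0]
edited = [5, -5, 5, -5]
print(rms(original), rms(edited))
print(satisfies_constraints(original, edited))
```

In this toy case the edit shortens the signal but raises the RMS by far more than the tolerance, so an optimiser working under such a constraint would have to reject it.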
Jacobson, Dan; Monforte, Ana Rita; Silva Ferreira, António César
2013-03-13
Chromatography separates the different components of complex mixtures and generates a fingerprint representing the chemical composition of the sample. The resulting data structure depends on the characteristics of the detector used: univariate for devices such as a flame ionization detector (FID) or multivariate for mass spectrometry (MS). This study addresses the potential use of a univariate signal for a nontargeted approach to (i) classify samples according to a given process or perturbation, (ii) evaluate the feasibility of developing a screening procedure to select candidates related to the process, and (iii) provide insight into the chemical mechanisms that are affected by the perturbation. To achieve this, it was necessary to use and develop methods for data preprocessing and visualization tools to assist an analytical chemist to view and interpret complex multidimensional data sets. Dichloromethane extracts of Port wine were analyzed using GC-FID; the chromatograms were then aligned with correlation optimized warping (COW) and subsequently analyzed with multivariate statistics (MVA) by principal component analysis (PCA) and partial least-squares regression (PLS-R). Furthermore, wavelets were used for peak calling and alignment refinement, and the resulting matrix was used to perform kinetic network reconstruction via correlation networks and maximum spanning trees. Network-target correlation projections were used to screen for potential chromatographic regions/peaks related to aging mechanisms. Results from PLS between aligned chromatograms and target molecules showed high X to Y correlations of 0.91, 0.92, and 0.89 with 5-hydroxymethylfurfural (HMF) (Maillard), acetaldehyde (oxidation), and 4,5-dimethyl-(5H)-3-hydroxy-2-furanone, respectively. The context of the correlation (and therefore likely kinetic) relationships among compounds detected by GC-FID and the relationships between target compounds within different regions of the network can be clearly seen.
Jiang, Miaomiao; Jiao, Yujiao; Wang, Yuefei; Xu, Lei; Wang, Meng; Zhao, Buchang; Jia, Lifu; Pan, Hao; Zhu, Yan; Gao, Xiumei
2014-01-01
Botanical primary metabolites are widespread in herbal medicine injections (HMIs) but are often overlooked in quality control. Because the routinely applied reversed-phase chromatographic fingerprint technology is poorly suited to hydrophilic substances, strongly polar primary metabolites, such as saccharides, amino acids and organic acids, are usually difficult to detect. In this study, a proton nuclear magnetic resonance (1H NMR) profiling method was developed for efficient identification and quantification of small polar molecules, mostly primary metabolites, in HMIs. A commonly used medicine, Danhong injection (DHI), was employed as a model. With the developed method, 23 primary metabolites together with 7 polyphenolic acids were simultaneously identified, of which 13 metabolites with fully separated proton signals were quantified and employed for a further multivariate quality control assay. The quantitative 1H NMR method was validated with good linearity, precision, repeatability, stability and accuracy. Based on independence principal component analysis (IPCA), the contents of the 13 metabolites were characterized and dimensionally reduced into the first two independence principal components (IPCs). IPC1 and IPC2 were then used to calculate the upper control limits (with 99% confidence ellipsoids) of χ2 and Hotelling T2 control charts. Using the constructed upper control limits, the proposed method was successfully applied to 36 batches of DHI to detect out-of-control samples with perturbed levels of succinate, malonate, glucose, fructose, salvianic acid and protocatechuic aldehyde. The integrated strategy provides a reliable approach to identify and quantify multiple polar metabolites of DHI in one fingerprinting spectrum, and it has also assisted in the establishment of IPCA models for the multivariate statistical evaluation of HMIs.
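The control-chart step on two components can be sketched in a few lines. The sketch computes the T2 statistic of a batch from the mean and covariance of reference batches and compares it with a 99% χ2 limit, which for two degrees of freedom has the closed form -2 ln(1 - 0.99). A finite-sample Hotelling T2 limit based on the F distribution is not computed here. The reference scores and function names are hypothetical and do not come from the study.

```python
import math


def mean_and_cov(points):
    """Sample mean and covariance (sxx, sxy, syy) of two-component scores."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def t_squared(point, mean, cov):
    """Quadratic form (x - m)' S^-1 (x - m) using the explicit 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def chi2_upper_limit_2df(confidence=0.99):
    """Chi-square quantile for 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - confidence)


# Hypothetical component scores of in-control reference batches.
reference = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, -1)]
mean, cov = mean_and_cov(reference)
limit = chi2_upper_limit_2df(0.99)

for batch in [(0.5, 0.5), (3.0, -3.0)]:
    t2 = t_squared(batch, mean, cov)
    print(batch, round(t2, 3), "out of control" if t2 > limit else "in control")
```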
Seasonal drought predictability in Portugal using statistical-dynamical techniques
Ribeiro, A. F. S.; Pires, C. A. L.
2016-08-01
Atmospheric forecasting and predictability are important to promote adaptation and mitigation measures in order to minimize drought impacts. This study estimates hybrid (statistical-dynamical) long-range forecasts of the regional drought index SPI (3-months) over homogeneous regions of mainland Portugal, based on forecasts from the UKMO operational forecasting system, with lead-times up to 6 months. ERA-Interim reanalysis data are used to build a set of SPI predictors integrating recent past information prior to the forecast launch. Then, the advantage of combining predictors with both dynamical and statistical background in the prediction of drought conditions at different lags is evaluated. A two-step hybridization procedure is performed, in which both forecasted and observed 500 hPa geopotential height fields are subjected to a PCA in order to use forecasted PCs and persistent PCs as predictors. The second hybridization step consists of a statistical/hybrid downscaling to the regional SPI, based on regression techniques, after the pre-selection of the statistically significant predictors. The SPI forecasts and the added value of combining dynamical and statistical methods are evaluated in cross-validation mode, using the R2 and binary event scores. Results are obtained for the four seasons, and it was found that winter is the most predictable season and that most of the predictive power lies in the large-scale fields from past observations. The hybridization improves the downscaling based on the forecasted PCs, since they provide complementary information (though modest) beyond that of persistent PCs. These findings provide clues about the predictability of the SPI, particularly in Portugal, and may contribute to the predictability of crop yields and provide some guidance for the decision-making process of users such as farmers.
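Evaluation "in cross-validation mode" can be illustrated with a minimal sketch of leave-one-out cross-validation for a one-predictor regression. It assumes R2 is computed as one minus the ratio of residual to total sum of squares, which is one common definition; the abstract does not say which definition was used. The predictor and SPI values are hypothetical, and the binary event scores are not reproduced.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_predictions(xs, ys):
    """Predict each case from a model fitted to all the other cases."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    return preds


def r_squared(observed, predicted):
    """1 - SS_res / SS_tot; can be negative for poor out-of-sample forecasts."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical predictor (e.g. a principal component score) and observed SPI.
pc_score = [-1.5, -0.8, -0.2, 0.1, 0.7, 1.2, 1.9]
spi = [-1.2, -0.9, 0.1, -0.1, 0.6, 0.8, 1.6]
print(round(r_squared(spi, loo_predictions(pc_score, spi)), 3))
```

Because each prediction is made without the case being predicted, the cross-validated score is a more honest measure of forecast skill than the in-sample fit.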
Comparison of Three Statistical Classification Techniques for Maser Identification
Manning, Ellen M.; Holland, Barbara R.; Ellingsen, Simon P.; Breen, Shari L.; Chen, Xi; Humphries, Melissa
2016-04-01
We applied three statistical classification techniques - linear discriminant analysis (LDA), logistic regression, and random forests - to three astronomical datasets associated with searches for interstellar masers. We compared the performance of these methods in identifying whether specific mid-infrared or millimetre continuum sources are likely to have associated interstellar masers. We also discuss the interpretability of the results of each classification technique. Non-parametric methods have the potential to make accurate predictions when there are complex relationships between critical parameters. We found that for the small datasets the parametric methods, logistic regression and LDA, performed best, whereas for the largest dataset the non-parametric random forests method achieved accuracy comparable to the parametric techniques rather than any significant improvement. This suggests that, at least for the specific examples investigated here, the accuracy of the predictions obtained is not being limited by the use of parametric models. We also found that for LDA, transformation of the data to match a normal distribution led to a significant improvement in accuracy. The different classification techniques had significant overlap in their predictions; further astronomical observations will enable the accuracy of these predictions to be tested.
Vergeynst, Leendert; Van Langenhove, Herman; Demeestere, Kristof
2015-02-17
Modern high-resolution mass spectrometry (HRMS) enables full-spectrum trace level analysis of emerging environmental organic contaminants. This raises the opportunity for post-acquisition suspect screening when no reference standards are a priori available. When setting up a conventional screening identification train based on successively different identification criteria, including mass error and isotope fit, the false negative rate typically accumulates upon advancing through the decision tree. The challenge is thus to elaborate a well-balanced screening, in which the different criteria are equally stringent, leading to a controllable number of false negatives. Presented is a novel suspect screening approach using liquid-chromatography Orbitrap HRMS. Based on a multivariate statistical model, the screening takes into account the accurate mass error of the monoisotopic ion and up to three isotopes, isotope ratios, and a peak/noise filter. As such, for the first time, the overall false negative rate of the screening algorithm is controlled to a desired level (5% in this study). Simultaneously, a well-balanced identification decision is guaranteed by taking the different identification criteria as a whole in a holistic statistical approach. Taking into account 1, 2, and 3 isotopes decreases the false positive rate from 22% to 2.8% to <0.3%, but the cost of increasing the median limits of identification from 200 to 2000 to 2062 ng L(-1), respectively, should also be considered. As proof of concept, 7 biologically treated wastewaters were screened for 77 suspect pharmaceuticals, resulting in the indicative identification of 25 suspects. Subsequently obtained reference standards allowed confirmation of 19 of these 25 pharmaceutical contaminants.
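The accumulation of false negatives along a sequential decision tree can be made concrete with a short calculation. The sketch below assumes the criteria reject true suspects independently of one another, which is a simplification introduced here and not a claim from the study; the function names and the choice of four criteria are hypothetical.

```python
def overall_false_negative_rate(per_criterion_rates):
    """Chance that a true suspect is rejected by at least one criterion."""
    passed = 1.0
    for rate in per_criterion_rates:
        passed *= 1.0 - rate  # probability of surviving this criterion
    return 1.0 - passed


def balanced_per_criterion_rate(overall_rate, n_criteria):
    """Equal per-criterion rate that yields the requested overall rate."""
    return 1.0 - (1.0 - overall_rate) ** (1.0 / n_criteria)


# Four criteria that each reject 5% of true suspects accumulate to about 18.5%.
print(round(overall_false_negative_rate([0.05] * 4), 4))

# To keep the overall rate at 5%, each of the four criteria must be stricter.
print(round(balanced_per_criterion_rate(0.05, 4), 4))
```

Under this independence assumption, holding the overall rate at 5% with four equally stringent criteria leaves each criterion a false negative budget of roughly 1.3%.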
Venkatapathi, Murugesan; Rajwa, Bartek; Ragheb, Kathy; Banada, Padmapriya P.; Lary, Todd; Robinson, J. Paul; Hirleman, E. Daniel
2008-02-01
We describe a model-based instrument design combined with a statistical classification approach for the development and realization of high-speed cell classification systems based on light scatter. In our work, angular light scatter from cells of four bacterial species of interest, Bacillus subtilis, Escherichia coli, Listeria innocua, and Enterococcus faecalis, was modeled using the discrete dipole approximation. We then optimized a scattering detector array design subject to some hardware constraints, configured the instrument, and gathered experimental data from the relevant bacterial cells. Using these models and experiments, it is shown that optimization using a nominal bacteria model (i.e., using a representative size and refractive index) is insufficient for classification of most bacteria in realistic applications. Hence the computational predictions were constituted in the form of scattering-data-vector distributions that accounted for expected variability in the physical properties between individual bacteria within the four species. After the detectors were optimized using the numerical results, they were used to measure scatter from both the known control samples and unknown bacterial cells. A multivariate statistical method based on a support vector machine (SVM) was used to classify the bacteria species based on light scatter signatures. In our final instrument, we realized correct classification of B. subtilis in the presence of E. coli, L. innocua, and E. faecalis using SVM at 99.1%, 99.6%, and 98.5%, respectively, in the optimal detector array configuration. For comparison, the corresponding values for another set of angles were only 69.9%, 71.7%, and 70.2% using SVM, and more importantly, this improved performance is consistent with classification predictions.
Institute of Scientific and Technical Information of China (English)
LI Lian-fang; LI Guo-xue; LIAO Xiao-yong
2004-01-01
This paper presents the characteristics of nitrogen and phosphorus pollution in Beijing surface water during the survey. A significant difference was found in the concentration distribution of the various nitrogen and phosphorus parameters. Except for several segments of Chaobaihe and Yongdinghe, most water bodies in the five water systems were polluted by total nitrogen, with contents of up to 120 mg/L, exceeding the fifth class limit of the national surface water quality standard GB3838-2002. Ammonia and phosphorus showed a similar distribution, with higher contents in the Daqinghe, Beiyunhe and Jiyunhe water systems and relatively low concentrations in the Chaobaihe and Yongdinghe water systems. Meanwhile, nitrate was found at comparatively low content (mostly less than 10 mg/L) and met the corresponding water quality requirements. Overall, the water of the Daqinghe, Jiyunhe and Beiyunhe river systems, as well as the lower reaches of Yongdinghe and Chaobaihe, was seriously contaminated, with high contents of total nitrogen and phosphorus. Multivariate statistical approaches showed that total nitrogen, ammonia and total phosphorus were highly correlated with chemical oxygen demand, biochemical oxygen demand, dissolved oxygen and electrical conductivity, which points to a common pollution source from anthropogenic activities.
Energy Technology Data Exchange (ETDEWEB)
Carrasquilla, Abel [Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Macae, RJ (Brazil). Lab. de Engenharia e Exploracao de Petroleo]. E-mail: abel@lenep.uenf.br; Silva, Jadir da [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil). Dept. de Geologia; Flexa, Roosevelt [Baker Hughes do Brasil Ltda, Macae, RJ (Brazil)
2008-07-01
In this article, we present a new approach to the automatic identification of lithologies using only well log data, which combines fuzzy logic, neural networks and multivariate statistical methods. First, we chose well logs that represent lithological types, such as gamma ray (GR) and density (RHOB), and then applied a fuzzy logic algorithm to determine the optimal number of clusters. In the following step, a competitive neural network is developed, based on Kohonen's learning rule, in which the input layer is composed of two neurons, one for each of the logs used. The competitive layer, in turn, is composed of several neurons, as many as the number of clusters determined by the fuzzy logic algorithm. Finally, some elements of the lithological data bank are selected at random to be the discriminant variables, which are the input data of the multigroup discriminant analysis program. With this methodology, the lithological types were automatically identified throughout a well of the Namorado Oil Field, Campos Basin, although the results presented some difficulty, mainly because of the geological complexity of this field. (author)
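The competitive layer trained with Kohonen's learning rule can be sketched as a winner-take-all update: for each sample, only the closest neuron moves a fraction of the way toward that sample. The two inputs stand in for normalised GR and RHOB readings; the sample values, the initial weights, the learning rate, and the function names are hypothetical and only illustrate the mechanism.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_competitive(samples, weights, learning_rate=0.5, epochs=20):
    """Winner-take-all training: only the closest neuron is updated."""
    weights = [list(w) for w in weights]
    for _ in range(epochs):
        for x in samples:
            # The winner is the neuron whose weight vector is closest to x.
            winner = min(
                range(len(weights)),
                key=lambda j: squared_distance(x, weights[j]),
            )
            # Kohonen rule: move the winner a fraction of the way toward x.
            weights[winner] = [
                w + learning_rate * (xi - w) for xi, w in zip(x, weights[winner])
            ]
    return weights


def assign_cluster(x, weights):
    """Index of the neuron that wins for sample x."""
    return min(range(len(weights)), key=lambda j: squared_distance(x, weights[j]))


# Hypothetical two-log samples forming two groups, and two initial neurons.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 9), (9, 10)]
trained = train_competitive(samples, weights=[(1, 1), (9, 9)])
print(trained)
print([assign_cluster(s, trained) for s in samples])
```

The number of neurons is fixed in advance, which is why the abstract determines the number of clusters first and sizes the competitive layer accordingly.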
Niaki, Seyed Taghi Akhavan; Javad Ershadi, Mohammad
2012-12-01
In this research, the main parameters of the multivariate cumulative sum (CUSUM) control chart (the reference value k, the control limit H, the sample size n and the sampling interval h) are determined by minimising the Lorenzen-Vance cost function [Lorenzen, T.J., and Vance, L.C. (1986), 'The Economic Design of Control Charts: A Unified Approach', Technometrics, 28, 3-10], to which the external costs of employing the chart are added. In addition, the model is statistically constrained to achieve desired in-control and out-of-control average run lengths. The Taguchi loss approach is used to model the problem, and a genetic algorithm, whose main parameters are tuned using response surface methodology (RSM), is proposed to solve it. Finally, sensitivity analyses on the main parameters of the cost function are presented and practical conclusions are drawn. The results show that RSM significantly improves the performance of the proposed algorithm and that the external costs of applying the chart, which are due to real-world constraints, do not greatly increase the average total loss.
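The roles of the reference value k and the control limit H can be illustrated with a minimal sketch of a multivariate CUSUM recursion. The sketch follows the scheme usually attributed to Crosier, which may differ from the chart used in the study, and it assumes an identity covariance matrix so that the statistical distance reduces to a Euclidean norm. The economic design itself (cost function, sample size n, sampling interval h) is not reproduced, and the data are hypothetical.

```python
import math


def norm(vector):
    """Euclidean norm; equals the statistical distance under identity covariance."""
    return math.sqrt(sum(v * v for v in vector))


def mcusum_first_signal(observations, target, k, control_limit):
    """Return the 1-based index of the first out-of-control signal, or None."""
    s = [0.0] * len(target)
    for t, x in enumerate(observations, start=1):
        # Accumulate the deviation of the new observation from the target.
        d = [si + xi - mi for si, xi, mi in zip(s, x, target)]
        c = norm(d)
        if c <= k:
            s = [0.0] * len(target)  # small deviations reset the sum
        else:
            s = [v * (1.0 - k / c) for v in d]  # shrink the sum by k
        if norm(s) > control_limit:
            return t
    return None


target = (0.0, 0.0)
in_control = [(0.0, 0.0)] * 10
shifted = [(1.0, 1.0)] * 10  # hypothetical sustained shift in both variables

print(mcusum_first_signal(in_control, target, k=0.5, control_limit=3.0))
print(mcusum_first_signal(shifted, target, k=0.5, control_limit=3.0))
```

A larger k makes the chart ignore small shifts, and a larger control limit delays the signal, which is the trade-off that the average run length constraints in the abstract control.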
Chen, Jiabo; Li, Fayun; Fan, Zhiping; Wang, Yanjie
2016-01-01
Source apportionment of river water pollution is critical in water resource management and aquatic conservation. Comprehensive application of various GIS-based multivariate statistical methods was performed to analyze datasets (2009–2011) on water quality in the Liao River system (China). Cluster analysis (CA) classified the 12 months of the year into three groups (May–October, February–April and November–January) and the 66 sampling sites into three groups (groups A, B and C) based on similarities in water quality characteristics. Discriminant analysis (DA) determined that temperature, dissolved oxygen (DO), pH, chemical oxygen demand (CODMn), 5-day biochemical oxygen demand (BOD5), NH4+–N, total phosphorus (TP) and volatile phenols were significant variables affecting temporal variations, with 81.2% correct assignments. Principal component analysis (PCA) and positive matrix factorization (PMF) identified eight potential pollution factors for each part of the data structure, explaining more than 61% of the total variance. Oxygen-consuming organics from cropland and woodland runoff were the main latent pollution factor for group A. For group B, the main pollutants were oxygen-consuming organics, oil, nutrients and fecal matter. For group C, the evaluated pollutants primarily included oxygen-consuming organics, oil and toxic organics. PMID:27775679
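A figure such as "explaining more than 61% of the total variance" comes from the cumulative explained-variance ratio of the leading principal components. The sketch below shows that calculation on a small invented matrix (not the Liao River data), using standardized columns and a singular value decomposition. The function name and the 0.61 cut-off in the example are illustrative assumptions.

```python
import numpy as np

def explained_variance_ratio(data):
    """PCA on standardized columns; return the fraction of variance per component."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    singular_values = np.linalg.svd(z, compute_uv=False)   # sorted in descending order
    variances = singular_values**2 / (len(z) - 1)
    return variances / variances.sum()

# Invented example: two perfectly correlated columns plus one uncorrelated column.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
data = np.column_stack([x, 2.0 * x, [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]])

ratios = explained_variance_ratio(data)
cumulative = np.cumsum(ratios)
# Number of leading components needed to reach 61 % of the total variance.
n_needed = int(np.searchsorted(cumulative, 0.61) + 1)
print(ratios, n_needed)
```

Here the two correlated columns collapse onto a single component carrying two thirds of the variance, so one component already passes the threshold.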
Cao, Yingjie; Tang, Changyuan; Song, Xianfang; Liu, Changming; Zhang, Yinghua
2016-06-01
Two multivariate statistical techniques, factor analysis (FA) and discriminant analysis (DA), are applied to study river and groundwater hydrochemistry and its controlling processes in the Sanjiang Plain of northeast China. Factor analysis identifies five factors which account for 79.65 % of the total variance in the dataset. Four factors bearing specific meanings as the river and groundwater hydrochemistry controlling processes are divided into two groups, the "natural hydrochemistry evolution" group and the "pollution" group. The "natural hydrochemistry evolution" group includes the salinity factor (factor 1) caused by rock weathering and the residence time factor (factor 2) reflecting the groundwater traveling time. The "pollution" group represents the groundwater quality deterioration due to geogenic pollution caused by elevated Fe and Mn (factor 3) and elevated nitrate (NO3-) introduced by human activities such as agricultural exploitation (factor 5). The hydrochemical differences and hydraulic connections among rivers (surface water, SW), shallow groundwater (SG) and deep groundwater (DG) are evaluated by the factor scores obtained from FA and DA (Fisher's method). The results show that the river water is characterized by low salinity and slight pollution, whereas the shallow groundwater has the highest salinity and severe pollution. The SW is well separated from SG and DG by Fisher's discriminant function, but SG and DG cannot be well separated, which shows their hydrochemical similarity and emphasizes the hydraulic connection between them.
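As an illustration of how a Fisher discriminant function separates two groups of samples, the following sketch computes the two-group Fisher direction from the pooled within-group scatter and classifies by the midpoint of the projected means. The synthetic "surface" and "ground" score arrays are hypothetical stand-ins, not the Sanjiang Plain factor scores, and the study itself deals with three water groups.

```python
import numpy as np

def fisher_discriminant(group_a, group_b):
    """Return the Fisher direction w and the midpoint threshold between projected means."""
    mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)
    # Pooled within-group scatter matrix.
    scatter = (np.cov(group_a, rowvar=False) * (len(group_a) - 1)
               + np.cov(group_b, rowvar=False) * (len(group_b) - 1))
    w = np.linalg.solve(scatter, mean_a - mean_b)
    threshold = 0.5 * (mean_a + mean_b) @ w
    return w, threshold

rng = np.random.default_rng(0)
# Hypothetical two-dimensional scores for two water groups (assumed, for illustration).
surface = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
ground = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(100, 2))

w, threshold = fisher_discriminant(surface, ground)
# Samples projecting above the threshold are assigned to the first group.
accuracy = 0.5 * ((surface @ w > threshold).mean() + (ground @ w <= threshold).mean())
print(round(accuracy, 3))
```

Groups whose means lie close together relative to their scatter, as reported for SG and DG, would give an accuracy much nearer to 0.5 under the same procedure.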
Dong, Jian-Jun; Li, Qing-Liang; Yin, Hua; Zhong, Cheng; Hao, Jun-Guang; Yang, Pan-Fei; Tian, Yu-Hong; Jia, Shi-Ru
2014-10-15
Sensory evaluation is regarded as a necessary procedure to ensure reproducible beer quality. Meanwhile, high-throughput analytical methods provide a powerful tool to analyse various flavour compounds, such as higher alcohols and esters. In this study, the relationship between flavour compounds and sensory evaluation was established by non-linear models such as partial least squares (PLS), genetic algorithm back-propagation neural network (GA-BP) and support vector machine (SVM). SVM with a radial basis function (RBF) kernel showed better prediction accuracy for both the calibration set (94.3%) and the validation set (96.2%) than the other models. Relatively lower prediction abilities were observed for GA-BP (52.1%) and PLS (31.7%). In addition, the kernel function of SVM played an essential role in model training, as the prediction accuracy of SVM with a polynomial kernel function was only 32.9%. As a powerful multivariate statistics method, SVM holds great potential to assess beer quality.
Energy Technology Data Exchange (ETDEWEB)
Freitas, Renato [Instituto Federal de Educacao, Ciencia e Tecnologia do Rio de Janeiro (CPAR/IFRJ), RJ (Brazil). Curso de Licenciatura em Matematica; Calza, Cristiane Ferreira; Lopes, Ricardo Tadeu [Coordenacao dos Programas de Pos-Graduacao de Engenharia (COPPE/UFRJ), RJ (Brazil); Rabello, Angela; Lima, Tania [Museu Nacional (MN/UFRJ), Rio de Janeiro, RJ (Brazil)
2011-07-01
In this work, the elemental composition of 102 fragments of Marajoara pubic covers, belonging to the National Museum collection, was characterized using EDXRF and multivariate statistical analysis. The objective was to identify possible groups of samples that presented similar characteristics. This information will be useful in the development of a systematic classification of these artifacts. Provenance studies of ancient ceramics are based on the assumption that pottery produced from a specific clay will present a similar chemical composition, which will distinguish it from pottery produced from a different clay. In this way, the pottery is assigned to particular production groups, which are then correlated with their respective origins. EDXRF measurements were carried out with a portable system, developed in the Nuclear Instrumentation Laboratory, consisting of an Oxford TF3005 X-ray tube with a tungsten (W) anode, operating at 25 kV and 100 µA, and a Si-PIN XR-100CR detector from Amptek. In each of the 102 fragments, six points were analyzed (three on the front and three on the reverse) with an acquisition time of 600 s and a beam collimation of 2 mm. The spectra were processed and analyzed using the QXAS-AXIL software from the IAEA. PCA was applied to the XRF results, revealing a clear cluster separation of the samples. (author)
Statistical analysis of management data
Gatignon, Hubert
2013-01-01
This book offers a comprehensive approach to multivariate statistical analyses. It provides theoretical knowledge of the concepts underlying the most important multivariate techniques and an overview of actual applications.
DEFF Research Database (Denmark)
Kallevik, H.; Hansen, Susanne Brunsgaard; Sæther, Ø.
2000-01-01
Water-in-oil emulsions are investigated by means of multivariate analysis of near infrared (NIR) spectroscopic profiles in the range 1100 - 2250 nm. The oil phase is a paraffin-diluted crude oil from the Norwegian Continental Shelf. The influence of water absorption and light scattering of the wa...
Nosrati, Kazem
2013-04-01
Soil degradation associated with soil erosion and land use is a critical problem in Iran, and there is little scientific information for assessing soil quality indicators. In this study, factor analysis (FA) and discriminant analysis (DA) were used to identify the most sensitive indicators of soil quality for evaluating land use and soil erosion within the Hiv catchment in Iran, and the results were subsequently compared with a soil quality assessment using expert opinion based on the soil surface factors (SSF) form of the Bureau of Land Management (BLM) method. To this end, 19 soil physical, chemical, and biochemical properties were measured at 56 sampling sites covering three land use/soil erosion categories (rangeland/surface erosion, orchard/surface erosion, and rangeland/stream bank erosion). FA identified four factors that explained 82 % of the variation in soil properties. Three factors showed significant differences among the three land use/soil erosion categories. The results indicated that, based on backward-mode DA, dehydrogenase, silt, and manganese allowed more than 80 % of the samples to be correctly assigned to their land use and erosional status. Canonical scores of the discriminant functions were significantly correlated with the six soil surface indices derived from the BLM method. Stepwise linear regression revealed that the soil surface indices soil movement, surface litter, pedestalling, and sum of SSF were also positively related to dehydrogenase and silt. This suggests that dehydrogenase and silt are the most sensitive to land use and soil erosion.
Noshadi, Masoud; Ghafourian, Amir
2016-07-01
This research investigated the quality of groundwater from 298 wells over 10 years in Fars province, southern Iran, to survey the spatial variation of groundwater quality and the major sources of hydro-chemical components for drinking and agricultural uses. To classify the sampling stations in each year, hierarchical cluster analysis using Euclidean distances and Ward's method was used. According to the results of cluster analysis, there were three quality groups in groundwater of the research area: first group of 170 wells with type of Ca-HCO3, second group of 98 wells with type of Ca-HCO3, and third group of 30 wells with type of Na-Cl. Hydro-chemical parameters increased from the first to the third group, and on the basis of Schoeller and USSL diagrams, the water of wells of the third group was considered unsuitable for irrigation and drinking. Principal component (PC) analysis and factor analysis reduced the complex and voluminous data matrix into three main components, accounting for more than 80 % of the total variance. The first PC contained TDS, EC, TH, Na(+), Cl(-), Mg(2+), SO4 (2-), Ca(2+), and SAR parameters. Therefore, the first dominant factor was salinity. In PC2, HCO3 and pH were the dominant parameters, which may indicate weathering of silicate minerals. The PC3 contained high loadings for NO2 (2-) and NO3 (-). This factor indicates anthropogenic contaminants that may be caused by improper disposal of domestic wastes or by the use of chemical fertilizers in agriculture and their leaching.
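The clustering step described above (hierarchical cluster analysis with Euclidean distances and Ward linkage, cut into three groups) can be sketched with SciPy as follows. The synthetic "wells" matrix, the group centres and the number of parameters are invented for illustration and do not reproduce the Fars data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
# Three hypothetical water types, 20 wells each, 4 hydro-chemical parameters (assumed values).
centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 0, 8, 8]], dtype=float)
wells = np.vstack([c + 0.3 * rng.normal(size=(20, 4)) for c in centers])

# Standardize each parameter so that no single unit dominates the Euclidean distance.
z = (wells - wells.mean(axis=0)) / wells.std(axis=0)

# Ward linkage on Euclidean distances, then cut the tree into three clusters.
tree = linkage(z, method="ward", metric="euclidean")
labels = fcluster(tree, t=3, criterion="maxclust")
print(np.bincount(labels)[1:])
```

Cutting the tree with `criterion="maxclust"` fixes the number of groups in advance. Reading the cut height off a dendrogram is the usual alternative when the number of groups is not known.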
Generating Virtual Patients by Multivariate and Discrete Re-Sampling Techniques.
Teutonico, D; Musuamba, F; Maas, H J; Facius, A; Yang, S; Danhof, M; Della Pasqua, O
2015-10-01
Clinical Trial Simulations (CTS) are a valuable tool for decision-making during drug development. However, to obtain realistic simulation scenarios, the patients included in the CTS must be representative of the target population. This is particularly important when covariate effects exist that may affect the outcome of a trial. The objective of our investigation was to evaluate and compare CTS results using re-sampling from a population pool and multivariate distributions to simulate patient covariates. COPD was selected as the paradigm disease for the purposes of our analysis, FEV1 was used as the response measure, and the effects of a hypothetical intervention were evaluated in different populations in order to assess the predictive performance of the two methods. Our results show that the multivariate distribution method produces realistic covariate correlations, comparable to the real population. Moreover, it allows simulation of patient characteristics beyond the limits of inclusion and exclusion criteria in historical protocols. Both methods, discrete resampling and multivariate distribution, generate realistic pools of virtual patients. However, the use of a multivariate distribution enables more flexible simulation scenarios, since it is not necessarily bound to the existing covariate combinations in the available clinical data sets.
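The difference between the two ways of generating virtual patients can be shown in a few lines. In this sketch the "observed" pool, the covariates (age and baseline FEV1) and the multivariate normal form are all illustrative assumptions; the abstract does not state which distribution family the authors fitted.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical observed pool: age (years) and baseline FEV1 (L), negatively correlated.
observed = rng.multivariate_normal([65.0, 1.5], [[64.0, -2.0], [-2.0, 0.25]], size=500)

# Method 1, discrete re-sampling: draw whole patients with replacement from the pool,
# so every virtual patient repeats a covariate combination that was actually observed.
resampled = observed[rng.integers(0, len(observed), size=1000)]

# Method 2, multivariate distribution: fit mean and covariance, then draw new patients,
# which keeps the correlation but allows combinations not present in the pool.
mean = observed.mean(axis=0)
cov = np.cov(observed, rowvar=False)
simulated = rng.multivariate_normal(mean, cov, size=1000)

corr_observed = np.corrcoef(observed, rowvar=False)[0, 1]
corr_simulated = np.corrcoef(simulated, rowvar=False)[0, 1]
print(round(corr_observed, 2), round(corr_simulated, 2))
```

A fitted distribution can also be shifted or truncated to explore populations outside historical inclusion criteria, which re-sampling cannot do by construction.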
Malm, Christer B.; Khoo, Nelson S.; Granlund, Irene; Lindstedt, Emilia; Hult, Andreas
2016-01-01
achieved in two separate trials. In conclusion, autologous re-infusion of RBCs increased VO2max and performance as hypothesized, but hematological profiling by multivariate statistics could not reach the WADA stipulated false positive ratio of <0.001% at any time point investigated. A majority of samples remained within limits of normal individual variation at all times. PMID:27284981
Flach, Milan; Gans, Fabian; Brenning, Alexander; Denzler, Joachim; Reichstein, Markus; Rodner, Erik; Bathiany, Sebastian; Bodesheim, Paul; Guanche, Yanira; Sippel, Sebastian; Mahecha, Miguel D.
2017-08-01
Today, many processes at the Earth's surface are constantly monitored by multiple data streams. These observations have become central to advancing our understanding of vegetation dynamics in response to climate or land use change. Another set of important applications is monitoring effects of extreme climatic events, other disturbances such as fires, or abrupt land transitions. One important methodological question is how to reliably detect anomalies in an automated and generic way within multivariate data streams, which typically vary seasonally and are interconnected across variables. Although many algorithms have been proposed for detecting anomalies in multivariate data, only a few have been investigated in the context of Earth system science applications. In this study, we systematically combine and compare feature extraction and anomaly detection algorithms for detecting anomalous events. Our aim is to identify suitable workflows for automatically detecting anomalous patterns in multivariate Earth system data streams. We rely on artificial data that mimic typical properties and anomalies in multivariate spatiotemporal Earth observations like sudden changes in basic characteristics of time series such as the sample mean, the variance, changes in the cycle amplitude, and trends. This artificial experiment is needed as there is no gold standard for the identification of anomalies in real Earth observations. Our results show that a well-chosen feature extraction step (e.g., subtracting seasonal cycles, or dimensionality reduction) is more important than the choice of a particular anomaly detection algorithm. Nevertheless, we identify three detection algorithms (k-nearest neighbors mean distance, kernel density estimation, a recurrence approach) and their combinations (ensembles) that outperform other multivariate approaches as well as univariate extreme-event detection methods. Our results therefore provide an effective workflow to automatically detect anomalies
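A minimal version of the kind of workflow described here, a feature extraction step followed by a detection algorithm, might look like the sketch below. It subtracts the mean seasonal cycle and scores each time step by its mean distance to its k nearest neighbours. The synthetic two-variable stream, the injected anomaly and the parameter values are assumptions for illustration, not the authors' benchmark.

```python
import numpy as np

def remove_seasonal_cycle(series, period):
    """Subtract the mean seasonal cycle from each variable (rows are time steps)."""
    series = np.asarray(series, dtype=float)
    phase = np.arange(len(series)) % period
    cycle = np.vstack([series[phase == p].mean(axis=0) for p in range(period)])
    return series - cycle[phase]

def knn_mean_distance(points, k):
    """Anomaly score: mean Euclidean distance to the k nearest other points."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbour
    return np.sort(dist, axis=1)[:, :k].mean(axis=1)

rng = np.random.default_rng(3)
period, n_years = 12, 20
t = np.arange(period * n_years)
# Two hypothetical seasonal data streams with small noise.
season = np.column_stack([np.sin(2 * np.pi * t / period), np.cos(2 * np.pi * t / period)])
streams = season + 0.1 * rng.normal(size=season.shape)
streams[100] += 3.0                          # injected anomaly (assumed, for illustration)

scores = knn_mean_distance(remove_seasonal_cycle(streams, period), k=5)
most_anomalous = int(np.argmax(scores))
print(most_anomalous)
```

Without the deseasonalizing step, distances between time steps mostly reflect the time of year rather than departures from it, which is one way to read the finding that feature extraction matters more than the choice of detector.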
Statistical technique for analysing functional connectivity of multiple spike trains.
Masud, Mohammad Shahed; Borisyuk, Roman
2011-03-15
A new statistical technique, the Cox method, used for analysing functional connectivity of simultaneously recorded multiple spike trains is presented. This method is based on the theory of modulated renewal processes and it estimates a vector of influence strengths from multiple spike trains (called reference trains) to the selected (target) spike train. Selecting another target spike train and repeating the calculation of the influence strengths from the reference spike trains enables researchers to find all functional connections among multiple spike trains. In order to study functional connectivity an "influence function" is identified. This function recognises the specificity of neuronal interactions and reflects the dynamics of postsynaptic potential. In comparison to existing techniques, the Cox method has the following advantages: it does not use bins (binless method); it is applicable to cases where the sample size is small; it is sufficiently sensitive such that it estimates weak influences; it supports the simultaneous analysis of multiple influences; it is able to identify a correct connectivity scheme in difficult cases of "common source" or "indirect" connectivity. The Cox method has been thoroughly tested using multiple sets of data generated by the neural network model of the leaky integrate and fire neurons with a prescribed architecture of connections. The results suggest that this method is highly successful for analysing functional connectivity of simultaneously recorded multiple spike trains.
Dexter, Alex; Race, Alan M; Styles, Iain B; Bunch, Josephine
2016-11-15
Spatial clustering is a powerful tool in mass spectrometry imaging (MSI) and has been demonstrated to be capable of differentiating tumor types, visualizing intratumor heterogeneity, and segmenting anatomical structures. Several clustering methods have been applied to mass spectrometry imaging data, but a principled comparison and evaluation of different clustering techniques presents a significant challenge. We propose that testing whether the data has a multivariate normal distribution within clusters can be used to evaluate the performance when using algorithms that assume normality in the data, such as k-means clustering. In cases where clustering has been performed using the cosine distance, conversion of the data to polar coordinates prior to normality testing should be performed to ensure normality is tested in the correct coordinate system. In addition to these evaluations of internal consistency, we demonstrate that the multivariate normal distribution can then be used as a basis for statistical modeling of MSI data. This allows the generation of synthetic MSI data sets with known ground truth, providing a means of external clustering evaluation. To demonstrate this, reference data from seven anatomical regions of an MSI image of a coronal section of mouse brain were modeled. From this, a set of synthetic data based on this model was generated. Results of r(2) fitting of the chi-squared quantile-quantile plots on the seven anatomical regions confirmed that the data acquired from each spatial region was found to be closer to normally distributed in polar space than in Euclidean. Finally, principal component analysis was applied to a single data set that included synthetic and real data. No significant differences were found between the two data types, indicating the suitability of these methods for generating realistic synthetic data.
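The chi-squared quantile-quantile check mentioned above can be sketched as follows: squared Mahalanobis distances of a cluster's points are compared with chi-squared quantiles, and the squared correlation of the Q-Q plot serves as a goodness-of-fit score. The function name, the plotting positions and the synthetic clusters are assumptions; the conversion to polar coordinates described in the abstract is not shown.

```python
import numpy as np
from scipy.stats import chi2

def chi2_qq_r2(data):
    """r^2 between sorted squared Mahalanobis distances and chi-squared quantiles."""
    n, p = data.shape
    centered = data - data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    d2 = np.sort(np.einsum("ij,jk,ik->i", centered, cov_inv, centered))
    # Theoretical quantiles at plotting positions (i - 0.5) / n (one common convention).
    theoretical = chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)
    return np.corrcoef(theoretical, d2)[0, 1] ** 2

rng = np.random.default_rng(7)
# Hypothetical clusters: one multivariate normal, one strongly skewed.
normal_cluster = rng.multivariate_normal(np.zeros(3), np.eye(3), size=2000)
skewed_cluster = np.exp(normal_cluster)

r2_normal = chi2_qq_r2(normal_cluster)
r2_skewed = chi2_qq_r2(skewed_cluster)
print(r2_normal > r2_skewed)
```

An r^2 close to 1 indicates that the cluster is compatible with a multivariate normal distribution; comparing the score across coordinate systems is how "closer to normally distributed" can be quantified.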
Beketov, Mikhail A; Kattwinkel, Mira; Liess, Matthias
2013-12-01
The identification of the effects of toxicants on biological communities is hampered by the complexity and variability of communities. To overcome these challenges, the trait-based SPEAR approach has been developed. This approach is based on (i) identifying the vulnerable taxa using traits and (ii) aggregating these taxa into a group to reduce the between-replicate differences and scattered low-abundance distribution, both of which are typical for biological communities. This approach allows for reduction of the noise and determination of the effects of toxicants at low concentrations in both field and mesocosm studies. However, there is a need to quantitatively investigate its potential for mesocosm data evaluations and application in the ecological risk assessment of toxicants. In the present study, we analysed how the aggregation of the sensitive taxa can facilitate the identification of the effects. We used empirical data from a long-term mesocosm experiment with stream invertebrates and an insecticide as well as a series of simulated datasets characterised by different degrees of data matrix saturation (corresponding to different sampling efforts), numbers of replicates, and between-replicate differences. The analyses of both the empirical and simulated data sets revealed that the taxa aggregation approach allows for the detection of effects at a lower saturation of the data matrices, smaller number of replicates, and higher between-replicate differences when compared to the multivariate statistical method redundancy analysis. These improvements lead to a higher sensitivity of the analysed systems, as long-term effects were detected at lower concentrations (up to 1,000 times). These outcomes suggest that methods based on taxa aggregation have a strong potential for use in mesocosm data evaluations because mesocosm studies are usually poorly replicated, have high between-replicate variability, and cannot be exhaustively sampled due to technical and financial
Rakotondrabe, Felaniaina; Ndam Ngoupayou, Jules Remy; Mfonka, Zakari; Rasolomanana, Eddy Harilala; Nyangono Abolo, Alexis Jacob; Ako Ako, Andrew
2018-01-01
The influence of gold mining activities on the water quality in the Mari catchment in Bétaré-Oya (East Cameroon) was assessed in this study. Sampling was performed within the period of one hydrological year (2015 to 2016), with 22 sampling sites consisting of groundwater (06) and surface water (16). In addition to measuring the physicochemical parameters, such as pH, electrical conductivity, alkalinity, turbidity, suspended solids and CN(-), eleven major elements (Na(+), K(+), Ca(2+), Mg(2+), NH4(+), Cl(-), NO3(-), HCO3(-), SO4(2-), PO4(3-) and F(-)) and eight heavy metals (Pb, Zn, Cd, Fe, Cu, As, Mn and Cr) were also analyzed using conventional hydrochemical methods, Multivariate Statistical Analysis and the Heavy metal Pollution Index (HPI). The results showed that the water from Mari catchment and Lom River was acidic to basic (5.40water quality, except for nitrates in some wells, which was found at a concentration >50mg NO3(-)/L. This water was found as two main types: calcium magnesium bicarbonate (CaMg-HCO3), which was the most represented, and sodium bicarbonate potassium (NaK-HCO3). As for trace elements in surface water, the contents of Pb, Cd, Mn, Cr and Fe were higher than recommended by the WHO guidelines, and therefore, the surface water was unsuitable for human consumption. Three phenomena were responsible for controlling the quality of the water in the study area: hydrolysis of silicate minerals of plutono-metamorphic rocks, which constitute the geological basement of this area; vegetation and soil leaching; and mining activities. The high concentrations of TSS and trace elements found in this basin were mainly due to gold mining activities (exploration and exploitation) as well as digging of rivers beds, excavation and gold amalgamation. Copyright © 2017 Elsevier B.V. All rights reserved.
Raji, M A; Frycák, P; Temiyasathit, C; Kim, S B; Mavromaras, G; Ahn, J-M; Schug, K A
2009-07-01
Response factors were determined for twelve GXG peptides (where G stands for glycine and X is any of alanine [A], arginine [R], asparagine [N], aspartic acid [D], glycine [G], histidine [H], leucine [L], lysine [K], phenylalanine [F], serine [S], tyrosine [Y], valine [V]) by electrospray ionization mass spectrometry (ESI-MS). The response factors were measured using a novel flow injection method. This new method is based on the Gaussian distribution of analyte concentration resulting from band-broadening dispersion experienced by the analyte upon passage through an extended volume of PEEK tubing. This method removes the need for preparing a discrete series of standard solutions to assess concentration-dependent response. Relative response factors were calculated for each peptide with reference to GGG. The observed trends in the relative response factors were correlated with several analyte physicochemical parameters, chosen based on current understanding of ion release from charged droplets during the ESI process. These include analyte properties: nonpolar surface area; polar surface area; gas-phase basicity; proton affinity; and Log D. Multivariate statistical analysis using multiple linear regression, decision tree, and support vector regression models were investigated to assess their potential for predicting ESI response based on the analyte properties. The support vector regression model was more versatile and produced the least predictive error following 12-fold cross-validation. The effect of variation in solution pH on the relative response factors is highlighted, as evidenced by the different predictive models obtained for peptide response at two pH values (pH = 6.0 and 9.0). The relationship between physicochemical parameters and associated ionization efficiencies for GXG tripeptides is discussed based on the equilibrium partitioning model. Copyright 2009 John Wiley & Sons, Ltd.
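If each of the twelve folds holds out a single peptide, 12-fold cross-validation is the same as leave-one-out validation. The sketch below shows that procedure for the simplest of the three model families, multiple linear regression, using NumPy only. The descriptors and responses are made up, and the decision tree and support vector regression models are not reproduced.

```python
import numpy as np

def leave_one_out_rmse(features, response):
    """Fit ordinary least squares n times, each time predicting the one held-out sample."""
    n = len(response)
    design = np.column_stack([np.ones(n), features])    # intercept plus descriptors
    errors = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(design[train], response[train], rcond=None)
        errors[i] = response[i] - design[i] @ coef
    return float(np.sqrt(np.mean(errors**2)))

# Twelve hypothetical analytes with two made-up descriptors each.
x1 = np.arange(12, dtype=float)
x2 = (x1**2) % 5
features = np.column_stack([x1, x2])
linear_response = 1.0 + 2.0 * x1 - 0.5 * x2
curved_response = linear_response + 0.3 * np.sin(x1)   # part the linear model cannot capture

rmse_linear = leave_one_out_rmse(features, linear_response)
rmse_curved = leave_one_out_rmse(features, curved_response)
print(rmse_linear < rmse_curved)
```

Comparing such held-out errors across model families is the basis on which one model can be said to give the least predictive error.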
Heidema, A.G.; Thissen, U.; Boer, J.M.; Bouwman, F.G.; Feskens, E.J.M.; Mariman, E.C.
2009-01-01
In this study, we applied the multivariate statistical tool Partial Least Squares (PLS) to analyze the relative importance of 83 plasma proteins in relation to coronary heart disease (CHD) mortality and the intermediate end points body mass index, HDL-cholesterol and total cholesterol. From a Dutch
Almeida, Tiago P.; Chu, Gavin S.; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H.; Stafford, Peter J.; Schlindwein, Fernando S.; Ng, G. André
2017-01-01
Purpose: Complex fractionated atrial electrograms (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI respectively, from corresponding LA regions in 18 persAF patients. Twelve attributes were measured from the AEGs, before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) have been used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%) and; (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination (P PVI, while others are affected by it. Although traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information. PMID:28883795
Almeida, Tiago P; Chu, Gavin S; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H; Stafford, Peter J; Schlindwein, Fernando S; Ng, G André
2017-01-01
Purpose: Complex fractionated atrial electrograms (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI respectively, from corresponding LA regions in 18 persAF patients. Twelve attributes were measured from the AEGs, before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) have been used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P multivariate statistical models were effective in their discrimination (P multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information.
Statistical Techniques Complement UML When Developing Domain Models of Complex Dynamical Biosystems
Timmis, Jon; Qwarnstrom, Eva E.
2016-01-01
Computational modelling and simulation is increasingly being used to complement traditional wet-lab techniques when investigating the mechanistic behaviours of complex biological systems. In order to ensure computational models are fit for purpose, it is essential that the abstracted view of biology captured in the computational model, is clearly and unambiguously defined within a conceptual model of the biological domain (a domain model), that acts to accurately represent the biological system and to document the functional requirements for the resultant computational model. We present a domain model of the IL-1 stimulated NF-κB signalling pathway, which unambiguously defines the spatial, temporal and stochastic requirements for our future computational model. Through the development of this model, we observe that, in isolation, UML is not sufficient for the purpose of creating a domain model, and that a number of descriptive and multivariate statistical techniques provide complementary perspectives, in particular when modelling the heterogeneity of dynamics at the single-cell level. We believe this approach of using UML to define the structure and interactions within a complex system, along with statistics to define the stochastic and dynamic nature of complex systems, is crucial for ensuring that conceptual models of complex dynamical biosystems, which are developed using UML, are fit for purpose, and unambiguously define the functional requirements for the resultant computational model. PMID:27571414
Statistical Techniques Complement UML When Developing Domain Models of Complex Dynamical Biosystems.
Williams, Richard A; Timmis, Jon; Qwarnstrom, Eva E
2016-01-01
Computational modelling and simulation is increasingly being used to complement traditional wet-lab techniques when investigating the mechanistic behaviours of complex biological systems. In order to ensure computational models are fit for purpose, it is essential that the abstracted view of biology captured in the computational model, is clearly and unambiguously defined within a conceptual model of the biological domain (a domain model), that acts to accurately represent the biological system and to document the functional requirements for the resultant computational model. We present a domain model of the IL-1 stimulated NF-κB signalling pathway, which unambiguously defines the spatial, temporal and stochastic requirements for our future computational model. Through the development of this model, we observe that, in isolation, UML is not sufficient for the purpose of creating a domain model, and that a number of descriptive and multivariate statistical techniques provide complementary perspectives, in particular when modelling the heterogeneity of dynamics at the single-cell level. We believe this approach of using UML to define the structure and interactions within a complex system, along with statistics to define the stochastic and dynamic nature of complex systems, is crucial for ensuring that conceptual models of complex dynamical biosystems, which are developed using UML, are fit for purpose, and unambiguously define the functional requirements for the resultant computational model.
Hague, D. S.; Vanderberg, J. D.; Woodbury, N. W.
1974-01-01
A method for rapidly examining the probable applicability of weight estimating formulae to a specific aerospace vehicle design is presented. The Multivariate Analysis Retrieval and Storage System (MARS) comprises three computer programs which sequentially operate on the weight and geometry characteristics of past aerospace vehicle designs. Weight and geometric characteristics are stored in a set of data bases which are fully computerized. Additional data bases are readily added to the MARS system, and the existing data bases may be easily expanded to include additional vehicles or vehicle characteristics.
Directory of Open Access Journals (Sweden)
Armin Saed-Moucheshi
2013-01-01
Multivariate statistical techniques were used to compare the relationship between yield and its related traits in wheat cultivars noninoculated and inoculated with the mycorrhizal fungus (Glomus intraradices); each condition consisted of three wheat cultivars and four water regimes. Results showed that, under inoculated conditions, spike weight per plant and total chlorophyll content of the flag leaf were the most important variables contributing to wheat grain yield variation, while, under noninoculated conditions, in addition to these two traits, grain weight per spike and leaf area were also important variables accounting for wheat grain yield variation. Therefore, spike weight per plant and chlorophyll content of the flag leaf can be used as selection criteria in breeding programs for both inoculated and noninoculated wheat cultivars under different water regimes, and grain weight per spike and leaf area can also be considered for the noninoculated condition. Furthermore, inoculated wheat cultivars showed higher values in most of the measured traits, and the results indicated that the inoculation treatment could change the relationship among morphological traits of wheat cultivars under drought stress. It also seems that stepwise regression as a selection method, together with principal component and factor analysis, is a stronger approach for screening important traits in breeding programs.
Marković, Snežana; Kerč, Janez; Horvat, Matej
2017-03-01
We present a new approach for identifying sources of variability within a manufacturing process by NIR measurements of samples of intermediate material after each consecutive unit operation (interprocess NIR sampling technique). In addition, we summarize the development of a multivariate statistical process control (MSPC) model for the production of an enteric-coated pellet product of the proton-pump inhibitor class. By developing provisional NIR calibration models, the identification of critical process points yields results comparable to the established MSPC modeling procedure. Both approaches are shown to lead to the same conclusion, identifying the parameters of extrusion/spheronization and the characteristics of lactose that have the greatest influence on the end-product's enteric coating performance. The proposed approach enables quicker and easier identification of variability sources during the manufacturing process, especially in cases when historical process data are not readily available. In the presented case, changes in lactose characteristics influence the performance of the extrusion/spheronization process step. The pellet cores produced using one lactose source (considered less suitable) were on average larger and more fragile, leading to breakage of the cores during subsequent fluid bed operations. These results were confirmed by additional experimental analyses illuminating the underlying mechanism of fracture of oblong pellets during the pellet coating process, which leads to compromised film coating.
Rathi, Monika; Ahrenkiel, S P; Carapella, J J; Wanlass, M W
2013-02-01
Given an unknown multicomponent alloy, and a set of standard compounds or alloys of known composition, can one improve upon popular standards-based methods for energy dispersive X-ray (EDX) spectrometry to quantify the elemental composition of the unknown specimen? A method is presented here for determining elemental composition of alloys using transmission electron microscopy-based EDX with appropriate standards. The method begins with a discrete set of related reference standards of known composition, applies multivariate statistical analysis to those spectra, and evaluates the compositions with a linear matrix algebra method to relate the spectra to elemental composition. By using associated standards, only limited assumptions about the physical origins of the EDX spectra are needed. Spectral absorption corrections can be performed by providing an estimate of the foil thickness of one or more reference standards. The technique was applied to III-V multicomponent alloy thin films: composition and foil thickness were determined for various III-V alloys. The results were then validated by comparing with X-ray diffraction and photoluminescence analysis, demonstrating accuracy of approximately 1% in atomic fraction.
Energy Technology Data Exchange (ETDEWEB)
Bakraji, E.H., E-mail: cscientificl@aec.org.sy [Archaeometry Laboratory, Chemistry Department, Atomic Energy Commission of Syria, P. O. Box 6091, Damascus (Syrian Arab Republic); Rihawy, M.S. [Archaeometry Laboratory, Chemistry Department, Atomic Energy Commission of Syria, P. O. Box 6091, Damascus (Syrian Arab Republic); Castel, C. [CNRS – Maison de l’Orient et de la Méditerranée, Laboratoire “Archéorient”, CNRS/Université Lumière-Lyon 2 (France); Abboud, R. [Archaeometry Laboratory, Chemistry Department, Atomic Energy Commission of Syria, P. O. Box 6091, Damascus (Syrian Arab Republic)
2015-03-15
Highlights: •PIXE and OSL methods were used to classify and date pottery from the Tell Al-Rawda site. •Three groups were classified using PIXE, suggesting different sources of the clay. •OSL was used for dating the site and the date found was consistent with typology. -- Abstract: The Particle Induced X-ray Emission (PIXE) technique has been utilised to study 48 ancient Syrian pottery fragments taken from excavations at the Tell Al-Rawda site. Eighteen elements (Mg, Al, Si, P, S, K, Ca, Ti, Mn, Fe, Ni, Zn, As, Br, Rb, Sr, Y, and Pb) were determined. The element concentrations were processed using two multivariate statistical methods to classify the pottery; one main group and two smaller groups were defined. In addition, four samples from different places on the site were subjected to optically stimulated luminescence (OSL) dating. The average age obtained using a single aliquot regeneration (SAR) protocol was found to be 4350 ± 240 years.
Directory of Open Access Journals (Sweden)
M. SureshGandhi
2014-01-01
The distribution of the natural gamma-ray-emitting radionuclides 238U, 232Th and 40K in beach sediments along the north-east coast of Tamilnadu, India, has been determined using a NaI(Tl) gamma ray spectrometric technique. The total average concentrations of the radionuclides 238U, 232Th, and 40K were 35.12, 713.16, and 349.60 Bq kg−1, respectively. Correlations among these radionuclides indicate the existence of secular equilibrium in the investigated sediments. The total average absorbed dose rate in the study areas is found to be 504.75 nGy h−1, whereas the annual effective dose rate has an average value of 0.62 mSv y−1. The mean activity concentrations of the measured radionuclides were compared with other literature values. The ratios between the detected radioisotopes have been calculated to describe the spatial distribution of natural radionuclides in the studied area. The radiological hazard of the natural radionuclide content, the radium equivalent activity and the external hazard index of the sediment samples in the area under consideration were also calculated. Multivariate statistical analyses (Pearson correlation, cluster and factor analysis) were carried out on the parameters obtained from the radioactivity measurements to identify the relations among them.
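The reported annual effective dose is consistent with the usual conversion from an outdoor absorbed dose rate. The sketch below reproduces that arithmetic under stated assumptions: a dose conversion coefficient of 0.7 Sv/Gy and an outdoor occupancy factor of 0.2, which are widely used UNSCEAR values. The abstract does not say which values the authors applied, and the function name is illustrative only.

```python
HOURS_PER_YEAR = 8760


def annual_effective_dose_msv(dose_rate_ngy_per_h, occupancy=0.2, sv_per_gy=0.7):
    """Convert an absorbed dose rate in air (nGy/h) to an annual effective dose (mSv/y).

    occupancy and sv_per_gy are assumed defaults, not values taken from the abstract.
    """
    ngy_per_year = dose_rate_ngy_per_h * HOURS_PER_YEAR * occupancy
    nsv_per_year = ngy_per_year * sv_per_gy
    return nsv_per_year * 1e-6  # nSv -> mSv


# Average absorbed dose rate quoted in the abstract.
dose = annual_effective_dose_msv(504.75)
```

With these assumed factors the result rounds to the 0.62 mSv y−1 quoted in the abstract.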
Metrology Optical Power Budgeting in SIM Using Statistical Analysis Techniques
Kuan, Gary M
2008-01-01
The Space Interferometry Mission (SIM) is a space-based stellar interferometry instrument, consisting of up to three interferometers, which will be capable of micro-arc second resolution. Alignment knowledge of the three interferometer baselines requires a three-dimensional, 14-leg truss with each leg being monitored by an external metrology gauge. In addition, each of the three interferometers requires an internal metrology gauge to monitor the optical path length differences between the two sides. Both external and internal metrology gauges are interferometry based, operating at a wavelength of 1319 nanometers. Each gauge has fiber inputs delivering measurement and local oscillator (LO) power, split into probe-LO and reference-LO beam pairs. These beams experience power loss due to a variety of mechanisms including, but not restricted to, design efficiency, material attenuation, element misalignment, diffraction, and coupling efficiency. Since the attenuation due to these sources may degrade over time, an accounting of the range of expected attenuation is needed so an optical power margin can be book kept. A method of statistical optical power analysis and budgeting, based on a technique developed for deep space RF telecommunications, is described in this paper and provides a numerical confidence level for having sufficient optical power relative to mission metrology performance requirements.
The use of statistical techniques in par-level management.
Klee, W B
1994-02-01
The total quality management movement has allowed the reintroduction of statistics in the materials management workplace. Statistical methods can be applied to the par level management process with significant results.
Energy Technology Data Exchange (ETDEWEB)
Chen, Hao [School of Tourism and Environment, Shaanxi Normal University, Xi' an 710062 (China); Lu, Xinwei, E-mail: luxinwei@snnu.edu.cn [School of Tourism and Environment, Shaanxi Normal University, Xi' an 710062 (China); Li, Loretta Y., E-mail: lli@civil.ubc.ca [Department of Civil Engineering, University of British Columbia, Vancouver V6T 1Z4 (Canada); Gao, Tianning; Chang, Yuyu [School of Tourism and Environment, Shaanxi Normal University, Xi' an 710062 (China)
2014-06-01
The concentrations of As, Ba, Co, Cr, Cu, Mn, Ni, Pb, V and Zn in campus dust from kindergartens, elementary schools, middle schools and universities of Xi'an, China were determined by X-ray fluorescence spectrometry. Correlation coefficient analysis, principal component analysis (PCA) and cluster analysis (CA) were used to analyze the data and to identify possible sources of these metals in the dust. The spatial distributions of metals in urban dust of Xi'an were analyzed based on the metal concentrations in campus dusts using the geostatistics method. The results indicate that dust samples from campuses have elevated metal concentrations, especially for Pb, Zn, Co, Cu, Cr and Ba, with the mean values of 7.1, 5.6, 3.7, 2.9, 2.5 and 1.9 times the background values for Shaanxi soil, respectively. The enrichment factor results indicate that Mn, Ni, V, As and Ba in the campus dust were deficiently to minimally enriched, mainly affected by nature and partly by anthropogenic sources, while Co, Cr, Cu, Pb and Zn in the campus dust and especially Pb and Zn were mostly affected by human activities. As and Cu, Mn and Ni, Ba and V, and Pb and Zn had similar distribution patterns. The southwest high-tech industrial area and south commercial and residential areas have relatively high levels of most metals. Three main sources were identified based on correlation coefficient analysis, PCA, CA, as well as spatial distribution characteristics. As, Ni, Cu, Mn, Pb, Zn and Cr have mixed sources — nature, traffic, as well as fossil fuel combustion and weathering of materials. Ba and V are mainly derived from nature, but partly also from industrial emissions, as well as construction sources, while Co principally originates from construction. - Highlights: • Metal content in dust from schools was determined by XRF. • Spatial distribution of metals in urban dust was focused on campus samples. • Multivariate statistic and spatial distribution were used to identify metal
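The enrichment factor mentioned above is commonly computed by normalising a metal concentration to a reference element in both the sample and the background. The sketch below assumes that standard ratio-of-ratios form and a widely used five-class interpretation scale; the abstract names neither the reference element nor the class boundaries, so the function names, thresholds and concentrations here are illustrative assumptions.

```python
def enrichment_factor(c_metal, c_ref, bg_metal, bg_ref):
    """EF = (metal / reference element) in sample over the same ratio in background."""
    return (c_metal / c_ref) / (bg_metal / bg_ref)


def enrichment_class(ef):
    """Map an EF value to a descriptive class (assumed thresholds: 2, 5, 20, 40)."""
    if ef < 2:
        return "deficiency to minimal enrichment"
    if ef < 5:
        return "moderate enrichment"
    if ef < 20:
        return "significant enrichment"
    if ef < 40:
        return "very high enrichment"
    return "extremely high enrichment"


# Hypothetical concentrations in mg/kg, not data from the study.
ef_example = enrichment_factor(c_metal=150.0, c_ref=500.0, bg_metal=20.0, bg_ref=500.0)
label_example = enrichment_class(ef_example)
```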
Energy Technology Data Exchange (ETDEWEB)
Palit, Mousumi [Department of Electronics and Telecommunication Engineering, Central Calcutta Polytechnic, Kolkata 700014 (India); Tudu, Bipan, E-mail: bt@iee.jusl.ac.in [Department of Instrumentation and Electronics Engineering, Jadavpur University, Kolkata 700098 (India); Bhattacharyya, Nabarun [Centre for Development of Advanced Computing, Kolkata 700091 (India); Dutta, Ankur; Dutta, Pallab Kumar [Department of Instrumentation and Electronics Engineering, Jadavpur University, Kolkata 700098 (India); Jana, Arun [Centre for Development of Advanced Computing, Kolkata 700091 (India); Bandyopadhyay, Rajib [Department of Instrumentation and Electronics Engineering, Jadavpur University, Kolkata 700098 (India); Chatterjee, Anutosh [Department of Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata 700107 (India)
2010-08-18
In an electronic tongue, preprocessing of the raw data precedes pattern analysis, and the choice of an appropriate preprocessing technique is crucial for the performance of the pattern classifier. While attempting to classify different grades of black tea using a voltammetric electronic tongue, different preprocessing techniques have been explored and a comparison of their performance is presented in this paper. The preprocessing techniques are compared first by a quantitative measurement of separability followed by principal component analysis; two different supervised pattern recognition models based on neural networks are then used to evaluate the performance of the preprocessing techniques.
Truu, Jaak; Heinaru, Eeva; Talpsep, Ene; Heinaru, Ain
2002-01-01
The oil-shale industry has created serious pollution problems in northeastern Estonia. Untreated, phenol-rich leachate from semi-coke mounds formed as a by-product of oil-shale processing is discharged into the Baltic Sea via channels and rivers. An exploratory analysis of water chemical and microbiological data sets from the low-flow period was carried out using different multivariate analysis techniques. Principal component analysis allowed us to distinguish different locations in the river system. The riverine microbial community response to water chemical parameters was assessed by co-inertia analysis. Water pH, COD and total nitrogen were negatively related to the number of biodegradative bacteria, while oxygen concentration promoted the abundance of these bacteria. The results demonstrate the utility of multivariate statistical techniques as tools for estimating the magnitude and extent of pollution based on river water chemical and microbiological parameters. An evaluation of river chemical and microbiological data suggests that the ambient natural attenuation mechanisms only partly eliminate pollutants from river water, and that a sufficient reduction of more recalcitrant compounds could be achieved through the reduction of wastewater discharge from the oil-shale chemical industry into the rivers.
Corvucci, Francesca; Nobili, Lara; Melucci, Dora; Grillenzoni, Francesca-Vittoria
2015-02-15
Honey traceability is required by consumers and food control institutions to ensure food quality. Melissopalynologists traditionally use percentages of nectariferous pollens to discriminate the botanical origin, and the entire pollen spectrum (presence/absence, type, quantities and association of some pollen types) to determine the geographical origin of honeys. To improve routine melissopalynological analysis, principal components analysis (PCA) was used. A remarkable and innovative result was that the most significant pollens for the traditional discrimination of the botanical and geographical origin of honeys were the same as those identified by the chemometric model. The reliability of assignments of samples to honey classes was estimated through explained variance (85%). This confirms that the chemometric model properly describes the melissopalynological data. To improve honey discrimination, FT-microRaman spectrography and multivariate analysis were also applied. Well-performing PCA models and good agreement with known classes were achieved. Encouraging results were obtained for botanical discrimination.
Khoshayand, Mohammad Reza; Abdollahi, Hamid; Moeini, Ali; Shamsaie, Ali; Ghaffari, Alireza; Abbasian, Sepideh
2010-09-01
Three multivariate modelling approaches, namely partial least squares regression (PLS), genetic algorithm-partial least squares regression (GA-PLS), and principal components-artificial neural network (PC-ANN) analysis, were investigated for their application to the simultaneous determination of chlordiazepoxide and clidinium levels in pharmaceuticals. A set of synthetic mixtures of the drugs in ethanol and 0.1 M HCl was made, and the prediction abilities of the aforementioned methods were examined using RSE% (relative standard error of prediction). The PLS and PC-ANN methods were found to be comparable, and GA-PLS produced slightly better results. The predictive models that we built were successfully applied to simultaneously determine the levels of chlordiazepoxide and clidinium in coated tablets.
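The RSE% figure of merit used to compare the three models can be illustrated with a short sketch. It assumes one common chemometric definition, 100 × sqrt(Σ(predicted − actual)² / Σ actual²); the abstract does not give the formula, and the concentrations below are hypothetical.

```python
import math


def rse_percent(actual, predicted):
    """Relative standard error of prediction, in percent (assumed definition)."""
    squared_error = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    squared_actual = sum(a ** 2 for a in actual)
    return 100.0 * math.sqrt(squared_error / squared_actual)


# Hypothetical concentrations (arbitrary units) for a small validation set.
actual_conc = [10.0, 20.0, 30.0]
predicted_conc = [11.0, 19.0, 30.0]
rse_example = rse_percent(actual_conc, predicted_conc)
```

Because the error is scaled by the size of the true concentrations, the value can be compared across analytes present at different levels.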
Tan, Zhiyuan; Jamdagni, Aruna; He, Xiangjian; Nanda, Priyadarsi; Liu, Ren Ping; Qing, Sihan; Susilo, Willy; Wang, Guilin; Liu, Dongmei
2011-01-01
The quality of feature has significant impact on the performance of detection techniques used for Denial-of-Service (DoS) attack. The features that fail to provide accurate characterization for network traffic records make the techniques suffer from low accuracy in detection. Although researches hav
Directory of Open Access Journals (Sweden)
Evans Corey J
2006-10-01
Background: Three-dimensional (3D) multivariate Fourier Transform Infrared (FTIR) image maps of tissue sections are presented. A villoglandular adenocarcinoma from a cervical biopsy with a number of interesting anatomical features was used as a model system to demonstrate the efficacy of the technique. Methods: Four FTIR images of adjacent tissue sections, recorded using a focal plane array detector, were stitched together using a MATLAB® routine and placed in a single data matrix for multivariate analysis using Cytospec™. Unsupervised Hierarchical Cluster Analysis (UHCA) was performed simultaneously on all 4 sections and 4 clusters were plotted. The four UHCA maps were then stacked together and interpolated with a box function using SCIRun software. Results: The resultant 3D images can be rotated in three dimensions, sliced and made semi-transparent to view the internal structure of the tissue block. A number of anatomical and histopathological features including connective tissue, red blood cells, inflammatory exudate and glandular cells could be identified in the cluster maps and correlated with Hematoxylin & Eosin stained sections. The mean extracted spectra from individual clusters provide macromolecular information on tissue components. Conclusion: 3D multivariate imaging provides a new avenue to study the shape and penetration of important anatomical and histopathological features based on the underlying macromolecular chemistry and therefore has clear potential in biology and medicine.
New Statistical Techniques in the Measurement of the inclusive Top Pair Production Cross Section
Franc, Jiří; Štěpánek, Michal; Kůs, Václav
2014-01-01
We present several different types of multivariate statistical techniques used in the measurement of the inclusive top pair production cross section in $p\bar{p}$ collisions at $\sqrt{s} = 1.96~\text{TeV}$ employing the full Run II data ($9.7~\text{fb}^{-1}$) collected with the D0 detector at the Fermilab Tevatron Collider. We consider the final state of the top quark pair decays containing one electron or muon and at least two jets. We perform various statistical homogeneity tests, such as the Anderson-Darling, Kolmogorov-Smirnov, and $\varphi$-divergence tests, to determine which variables have good data-MC agreement as well as good separation power. We adapted all tests to use weighted empirical distribution functions. Further, we separate the $t\bar{t}$ signal from the background by applying Generalized Linear Models, Gaussian Mixture Models, and Neural Networks with Switching Units, and compare them with familiar methods from the ROOT TMVA package such as Boosted Decision Trees, and Multi-layer Per...
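A weighted empirical distribution function and the corresponding two-sample Kolmogorov-Smirnov distance can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the abstract does not describe how the tests were adapted, no p-value is computed, and all names below are hypothetical.

```python
def weighted_ecdf(values, weights):
    """Return F(x) = (sum of weights of observations <= x) / (total weight)."""
    pairs = list(zip(values, weights))
    total = sum(weights)

    def cdf(x):
        return sum(w for v, w in pairs if v <= x) / total

    return cdf


def weighted_ks_statistic(x, wx, y, wy):
    """Largest absolute gap between two weighted ECDFs.

    Both functions are right-continuous step functions, so the supremum is
    attained at one of the pooled observation points.
    """
    cdf_x = weighted_ecdf(x, wx)
    cdf_y = weighted_ecdf(y, wy)
    return max(abs(cdf_x(t) - cdf_y(t)) for t in list(x) + list(y))


# Toy "data" and "simulation" samples with per-event weights (hypothetical).
d_unweighted = weighted_ks_statistic([1, 2, 3], [1, 1, 1], [2, 3, 4], [1, 1, 1])
d_weighted = weighted_ks_statistic([1, 2], [3, 1], [1, 2], [1, 1])
```

The second call shows that two samples with identical values can still differ once event weights are taken into account.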
DEFF Research Database (Denmark)
Birch, Thomas; Martinón-Torres, Marcos
2015-01-01
An assemblage of post-medieval iron bars was found with the Princes Channel wreck, salvaged from the Thames Estuary in 2003. They were recorded and studied, with a focus on metallography and slag inclusion analysis. The investigation provided an opportunity to explore the use of multivariate stat...
Directory of Open Access Journals (Sweden)
Jonas eKaplan
2015-03-01
Here we highlight an emerging trend in the use of machine learning classifiers to test for abstraction across patterns of neural activity. When a classifier algorithm is trained on data from one cognitive context and tested on data from another, conclusions can be drawn about the role of a given brain region in representing information that abstracts across those cognitive contexts. We call this kind of analysis Multivariate Cross-Classification (MVCC), and review several domains where it has recently made an impact. MVCC has been important in establishing correspondences among neural patterns across cognitive domains, including motor-perception matching and cross-sensory matching. It has been used to test for similarity between neural patterns evoked by perception and those generated from memory. Other work has used MVCC to investigate the similarity of representations for semantic categories across different kinds of stimulus presentation, and in the presence of different cognitive demands. We use these examples to demonstrate the power of MVCC as a tool for investigating neural abstraction and discuss some important methodological issues related to its application.
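The train-on-one-context, test-on-another logic of MVCC can be shown with a toy example. The sketch below uses a nearest-centroid classifier purely for brevity; the review does not prescribe a classifier, and the two-feature "activity patterns", labels and function names are hypothetical.

```python
def centroid(rows):
    """Mean pattern of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_centroids(patterns, labels):
    """Compute one mean pattern per class from the training context."""
    groups = {}
    for pattern, label in zip(patterns, labels):
        groups.setdefault(label, []).append(pattern)
    return {label: centroid(rows) for label, rows in groups.items()}


def predict(centroids, pattern):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], pattern))


def cross_classification_accuracy(train_x, train_y, test_x, test_y):
    """Train on context A only, then score on context B only."""
    model = fit_centroids(train_x, train_y)
    hits = sum(predict(model, p) == y for p, y in zip(test_x, test_y))
    return hits / len(test_y)


# Context A (e.g. perception) is used for training, context B (e.g. memory) for testing.
context_a = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
labels_a = ["face", "face", "house", "house"]
context_b = [[0.8, 0.3], [0.3, 0.9]]
labels_b = ["face", "house"]
mvcc_accuracy = cross_classification_accuracy(context_a, labels_a, context_b, labels_b)
```

Accuracy above chance on the held-out context is what licenses the inference that the patterns share information across contexts; in practice this would be assessed with a proper significance test.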
Müller, Aline Lima Hermes; Picoloto, Rochele Sogari; Ferrão, Marco Flores; da Silva, Fabiana Ernestina Barcellos; Müller, Edson Irineu; Flores, Erico Marlon de Moraes
2012-06-01
A method for the simultaneous determination of clavulanic acid (CA) and amoxicillin (AMO) in commercial tablets was developed using diffuse reflectance infrared Fourier transform spectroscopy (DRIFTS) and multivariate calibration. Twenty-five samples (10 commercial and 15 synthetic) were used as a calibration set and 15 samples (10 commercial and 5 synthetic) were used as a prediction set. Calibration models were developed using partial least squares (PLS), interval PLS (iPLS), and synergy interval PLS (siPLS) algorithms. The best algorithm for CA determination was the siPLS model with spectra divided into 30 intervals and combinations of 2 intervals. This model showed a root mean square error of prediction (RMSEP) of 5.1 mg g−1. For AMO determination, the best siPLS model was obtained with spectra divided into 10 intervals and combinations of 4 intervals. This model showed an RMSEP of 22.3 mg g−1. The proposed method was considered suitable for the simultaneous determination of CA and AMO in commercial pharmaceutical products.
Energy Technology Data Exchange (ETDEWEB)
Heyen, H. [GKSS-Forschungszentrum Geesthacht GmbH (Germany). Inst. fuer Gewaesserphysik
1998-12-31
A multivariate statistical approach is presented that allows a systematic search for relationships between the interannual variability in climate records and ecological time series. Statistical models are built between climatological predictor fields and the variables of interest. Relationships are sought on different temporal scales and for different seasons and time lags. The possibilities and limitations of this approach are discussed in four case studies dealing with salinity in the German Bight, abundance of zooplankton at Helgoland Roads, macrofauna communities off Norderney and the arrival of migratory birds on Helgoland. (orig.) [German abstract, translated] A multivariate statistical model is presented that allows a systematic search for potential relationships between variability in climate and ecological time series. Using four application examples, the influence of climate on salinity in the German Bight, zooplankton off Helgoland, macrofauna off Norderney, and the arrival of migratory birds on Helgoland is examined. (orig.)
Multivariate Stochastic Orderings of Spacings of Generalized Order Statistics
Institute of Scientific and Technical Information of China (English)
方兆本; 胡太忠; 吴耀华; 庄玮玮
2006-01-01
In this paper, we investigate conditions on the underlying distribution function and the parameters on which the generalized order statistics are based, to obtain stochastic comparisons of spacing vectors of generalized order statistics in the multivariate likelihood ratio and the usual multivariate stochastic orders. Some applications of the main results are also given.
Directory of Open Access Journals (Sweden)
Tiago P. Almeida
2017-08-01
Purpose: Complex fractionated atrial electrogram (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI, respectively, from corresponding left atrial (LA) regions in 18 persAF patients. Twelve attributes were measured from the AEGs, before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) were used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P < 0.0001). Four types of LA regions were identified, based on the AEG characteristics: (i) fractionated before PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%); and (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination (P < 0.0001). Conclusion: Our results show that there are LA regions resistant to PVI, while others are affected by it. Although traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information.
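The four region types reported in the Results account for the overall before/after CFAE percentages. The short check below only re-adds the percentages quoted in the abstract; the dictionary layout and variable names are illustrative.

```python
# Keys are (state before PVI, state after PVI); values are percentages of collected points.
transitions = {
    ("fractionated", "fractionated"): 31,
    ("fractionated", "normal"): 39,
    ("normal", "fractionated"): 9,
    ("normal", "normal"): 21,
}

# Marginal totals: share of points fractionated before and after PVI.
cfae_before = sum(pct for (before, _), pct in transitions.items() if before == "fractionated")
cfae_after = sum(pct for (_, after), pct in transitions.items() if after == "fractionated")
total = sum(transitions.values())
```

The marginals give 70% fractionated before PVI and 40% after, matching the "70 vs. 40%" figure, while the 9% of points that became fractionated only after PVI would be missed by a pre-PVI map alone.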
Institute of Scientific and Technical Information of China (English)
梁军; 钱积新
2003-01-01
Multivariate statistical process monitoring and control (MSPM&C) methods for chemical process monitoring with statistical projection techniques such as principal component analysis (PCA) and partial least squares (PLS) are surveyed in this paper. The four-step procedure of performing MSPM&C for chemical process, modeling of processes, detecting abnormal events or faults, identifying the variable(s) responsible for the faults and diagnosing the source cause for the abnormal behavior, is analyzed. Several main research directions of MSPM&C reported in the literature are discussed, such as multi-way principal component analysis (MPCA) for batch process, statistical monitoring and control for nonlinear process, dynamic PCA and dynamic PLS, and on-line quality control by inferential models. Industrial applications of MSPM&C to several typical chemical processes, such as chemical reactor, distillation column, polymerization process, petroleum refinery units, are summarized. Finally, some concluding remarks and future considerations are made.
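The fault-detection step of PCA-based monitoring can be sketched with a single principal component and the squared prediction error (SPE, also called the Q statistic). This is a deliberately reduced illustration: practical MSPM&C retains several components, also tracks Hotelling's T², and derives control limits from reference distributions. The data, variable meanings and function names below are hypothetical.

```python
import math


def column_means(rows):
    """Per-variable mean of the training data."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, means):
    """Sample covariance matrix of the training data."""
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    p = len(means)
    return [[sum(c[i] * c[j] for c in centered) / (len(rows) - 1)
             for j in range(p)] for i in range(p)]


def first_loading(cov, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spe(sample, means, loading):
    """Squared residual left after projecting the centered sample onto the loading."""
    c = [x - m for x, m in zip(sample, means)]
    score = sum(a * b for a, b in zip(c, loading))
    return sum((a - score * b) ** 2 for a, b in zip(c, loading))


# Normal operating data: two process variables that move together (hypothetical).
normal_data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]]
means = column_means(normal_data)
loading = first_loading(covariance(normal_data, means))

spe_in_control = spe([3.5, 7.0], means, loading)  # follows the usual correlation
spe_fault = spe([3.5, 3.0], means, loading)       # breaks the usual correlation
```

A sample that violates the learned correlation structure produces a large SPE even when each variable is individually within its normal range, which is the main argument for multivariate rather than univariate control charts.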
Ham, L.V. van der; Bakker, D.E.; Geers, L.F.G.; Goetheer, E.L.V.
2014-01-01
The solvent and the dissolved CO2 concentrations are two essential properties of CO2 absorption processes. Currently, they are typically monitored using time-consuming offline analytical techniques. Initial development efforts aiming at a cost-effective and reliable inline monitoring system are desc
New techniques for the scientific visualization of three-dimensional multi-variate and vector fields
Energy Technology Data Exchange (ETDEWEB)
Crawfis, Roger A. [Univ. of California, Davis, CA (United States)
1995-10-01
Volume rendering allows us to represent a density cloud with ideal properties (single scattering, no self-shadowing, etc.). Scientific visualization utilizes this technique by mapping an abstract variable or property in a computer simulation to a synthetic density cloud. This thesis extends volume rendering from its limitation of isotropic density clouds to anisotropic and/or noisy density clouds. Design aspects of these techniques are discussed that aid in the comprehension of scientific information. Anisotropic volume rendering is used to represent vector-based quantities in scientific visualization. Velocity and vorticity in a fluid flow, electric and magnetic waves in an electromagnetic simulation, and blood flow within the body are examples of vector-based information within a computer simulation or gathered from instrumentation. Understanding these fields can be crucial to understanding the overall physics or physiology. Three techniques for representing three-dimensional vector fields are presented: Line Bundles, Textured Splats and Hair Splats. These techniques are aimed at providing a high-level (qualitative) overview of the flows, offering the user a substantial amount of information with a single image or animation. Non-homogeneous volume rendering is used to represent multiple variables. Computer simulations can typically have over thirty variables, which describe properties whose understanding is useful to the scientist. Trying to understand each of these separately can be time consuming. Trying to understand any cause-and-effect relationships between different variables can be impossible. NoiseSplats is introduced to represent two or more properties in a single volume rendering of the data. This technique is also aimed at providing a qualitative overview of the flows.
Indian Academy of Sciences (India)
Işık Yilmaz; Marian Marschalko; Martin Bednarik
2013-04-01
This paper compares and discusses the use of bivariate, multivariate and soft computing techniques for collapse susceptibility modelling. Conditional probability (CP), logistic regression (LR) and artificial neural network (ANN) models, representing the bivariate, multivariate and soft computing techniques respectively, were used in GIS-based collapse susceptibility mapping in an area of the Sivas basin (Turkey). Collapse-related factors, directly or indirectly related to the causes of collapse occurrence, were used in the collapse susceptibility analyses: distance from faults, slope angle and aspect, topographical elevation, distance from drainage, topographic wetness index (TWI), stream power index (SPI), Normalized Difference Vegetation Index (NDVI) as a measure of vegetation cover, and distance from roads and settlements. In the last stage of the analyses, collapse susceptibility maps were produced from the models and then compared by means of their validations. Although the Area Under Curve (AUC) values showed that the map obtained from the soft computing (ANN) model appears more accurate than those from the other models, the accuracies of all three models can be considered relatively similar. The results also showed that conditional probability is an essential method in the preparation of collapse susceptibility maps and is highly compatible with GIS operating features.
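The AUC used to validate the three susceptibility maps can be computed directly from its probabilistic meaning: the chance that a randomly chosen collapse cell receives a higher susceptibility score than a randomly chosen non-collapse cell. The sketch below assumes that pairwise definition with ties counted as one half; the scores and labels are hypothetical and the function name is illustrative.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores for four map cells; 1 = observed collapse.
auc_example = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
auc_perfect = auc([0.9, 0.7, 0.2, 0.1], [1, 1, 0, 0])
```

Computing the value for each model on the same validation cells gives the kind of side-by-side comparison the abstract describes.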
Directory of Open Access Journals (Sweden)
Thomas Lefèvre
Cost containment policies and the need to satisfy patients' health needs and care expectations provide major challenges to healthcare systems. Identification of homogeneous groups in terms of healthcare utilisation could lead to a better understanding of how to adjust healthcare provision to society and patient needs. This study used data from the third wave of the SIRS cohort study, a representative, population-based, socio-epidemiological study set up in 2005 in the Paris metropolitan area, France. The data were analysed using a cross-sectional design. In 2010, 3000 individuals were interviewed in their homes. Non-conventional multivariate clustering techniques were used to determine homogeneous user groups in the data. Multinomial models assessed a wide range of potential associations between user characteristics and their pattern of healthcare utilisation. We identified four distinct patterns of healthcare use. Patterns of consumption and the socio-demographic characteristics of users differed qualitatively and quantitatively between these four profiles. Extensive and intensive use by older, wealthier and unhealthier people contrasted with narrow and parsimonious use by younger, socially deprived people and immigrants. Rare, intermittent use by young healthy men contrasted with regular targeted use by healthy and wealthy women. The use of an original technique of massive multivariate analysis allowed us to characterise different types of healthcare users, both in terms of resource utilisation and socio-demographic variables. This method would merit replication in different populations and healthcare systems.
Gaonkar, Bilwaj; Davatzikos, Christos
2013-01-01
Multivariate pattern analysis (MVPA) methods such as support vector machines (SVMs) have been increasingly applied to fMRI and sMRI analyses, enabling the detection of distinctive imaging patterns. However, identifying brain regions that significantly contribute to the classification/group separation requires computationally expensive permutation testing. In this paper we show that the results of SVM-permutation testing can be analytically approximated. This approximation leads to more than a...
Multivariate Time Series Forecasting of Crude Palm Oil Price Using Machine Learning Techniques
Kanchymalay, Kasturi; Salim, N.; Sukprasert, Anupong; Krishnan, Ramesh; Raba'ah Hashim, Ummi
2017-08-01
The aim of this paper was to study the correlation between crude palm oil (CPO) price, selected vegetable oil prices (soybean oil, coconut oil, olive oil, rapeseed oil and sunflower oil), crude oil price and the monthly exchange rate. Comparative analysis was then performed on CPO price forecasting results using machine learning techniques. Monthly CPO prices, selected vegetable oil prices, crude oil prices and monthly exchange rate data from January 1987 to February 2017 were utilized. Preliminary analysis showed a high positive correlation between the CPO price and the soybean oil price and also between the CPO price and the crude oil price. Experiments were conducted using multi-layer perceptron, support vector regression and Holt-Winters exponential smoothing techniques. The results were assessed using root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and directional accuracy (DA). Among these three techniques, support vector regression (SVR) with the sequential minimal optimization (SMO) algorithm showed relatively better results than the multi-layer perceptron and the Holt-Winters exponential smoothing method.
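The four assessment criteria named in this abstract can be sketched as follows. The abstract does not define directional accuracy, so the definition used here (the share of steps where the forecast moves in the same direction as the actual series, relative to the previous actual value) is an assumption, as are the function name `forecast_errors` and the toy series.

```python
import math


def forecast_errors(actual, predicted):
    """Return RMSE, MAE, MAPE (%) and directional accuracy (%) for two series.

    MAPE assumes no actual value is zero. Directional accuracy follows one
    common definition, which is an assumption here, not the paper's formula.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    hits = 0
    for t in range(1, n):
        actual_move = actual[t] - actual[t - 1]
        predicted_move = predicted[t] - actual[t - 1]
        if actual_move * predicted_move > 0:  # same sign: direction was right
            hits += 1
    da = 100.0 * hits / (n - 1)
    return {"rmse": rmse, "mae": mae, "mape": mape, "da": da}


if __name__ == "__main__":
    # Hypothetical monthly prices, for illustration only.
    print(forecast_errors([100, 110, 105, 120], [100, 108, 107, 118]))
```

RMSE penalises large misses more than MAE does, while DA ignores error size entirely, which is why such studies tend to report them together.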
TECHNIQUE OF THE STATISTICAL ANALYSIS OF INVESTMENT APPEAL OF THE REGION
Directory of Open Access Journals (Sweden)
А. А. Vershinina
2014-01-01
This article presents a technique for the statistical analysis of a region's investment appeal for foreign direct investment. The technique is defined, the stages of the analysis are described, and the mathematical and statistical tools are considered.
Kamtchueng, Brice T; Fantong, Wilson Y; Wirmvem, Mengnjo J; Tiodjio, Rosine E; Takounjou, Alain F; Ndam Ngoupayou, Jules R; Kusakabe, Minoru; Zhang, Jing; Ohba, Takeshi; Tanyileke, Gregory; Hell, Joseph V; Ueda, Akira
2016-09-01
With the use of conventional hydrogeochemical techniques, multivariate statistical analysis, and stable isotope approaches, this paper investigates for the first time surface water and groundwater from the surrounding areas of Lake Monoun (LM), West Cameroon. The results reveal that waters are generally slightly acidic to neutral. The relative abundances of major dissolved species are Ca(2+) > Mg(2+) > Na(+) > K(+) for cations and HCO3(-) ≫ NO3(-) > Cl(-) > SO4(2-) for anions. The main water type is Ca-Mg-HCO3. Observed salinity is related to water-rock interaction, ion exchange processes, and anthropogenic activities. Nitrate and chloride have been identified as the most common pollutants. These pollutants are attributed to the chlorination of wells and leaching from pit latrines and refuse dumps. The stable isotopic compositions in the investigated water sources suggest evidence of evaporation before recharge. Four major groups of waters were identified by salinity and NO3 concentrations using Q-mode hierarchical cluster analysis (HCA). Consistent with the isotopic results, group 1 represents fresh unpolluted water occurring near the recharge zone in the general flow regime; groups 2 and 3 are mixed waters whose composition is controlled by both weathering of rock-forming minerals and anthropogenic activities; group 4 represents water highly vulnerable to anthropogenic pollution. Moreover, the isotopic results and the HCA showed that the CO2-rich bottom water of LM belongs to an isolated hydrological system within the Foumbot plain. Except for some springs, groundwater in the area is inappropriate for drinking and domestic purposes but good to excellent for irrigation.
Statistical tools for the calibration of traffic conflicts techniques.
Oppe, S.
1982-01-01
To compare the results of various conflict techniques from different countries, an international experiment took place in Rouen in 1979. The experiment showed that, in general, with each technique the same conclusions were reached with regard to the problems of safety at two intersections in Rouen.
Sharma, Sandeep; Goodarzi, Mohammad; Ramon, Herman; Saeys, Wouter
2014-04-01
Partial Least Squares (PLS) regression is one of the most used methods for extracting chemical information from Near Infrared (NIR) spectroscopic measurements. The success of a PLS calibration relies largely on the representativeness of the calibration data set. This is not trivial, because not only the expected variation in the analyte of interest, but also the variation of other contributing factors (interferents) should be included in the calibration data. This also implies that changes in interferent concentrations not covered in the calibration step can deteriorate the prediction ability of the calibration model. Several researchers have suggested that PLS models can be robustified against changes in the interferent structure by incorporating expert knowledge in the preprocessing step with the aim to efficiently filter out the spectral influence of the spectral interferents. However, these methods have not yet been compared against each other. Therefore, in the present study, various preprocessing techniques exploiting expert knowledge were compared on two experimental data sets. In both data sets, the calibration and test set were designed to have a different interferent concentration range. The performance of these techniques was compared to that of preprocessing techniques which do not use any expert knowledge. Using expert knowledge was found to improve the prediction performance for both data sets. For data set-1, the prediction error improved nearly 32% when pure component spectra of the analyte and the interferents were used in the Extended Multiplicative Signal Correction framework. Similarly, for data set-2, nearly 63% improvement in the prediction error was observed when the interferent information was utilized in Spectral Interferent Subtraction preprocessing.
Correlation techniques and measurements of wave-height statistics
Guthart, H.; Taylor, W. C.; Graf, K. A.; Douglas, D. G.
1972-01-01
Statistical measurements of wave height fluctuations have been made in a wind wave tank. The power spectral density function of temporal wave height fluctuations evidenced second-harmonic components and an f to the minus 5th power law decay beyond the second harmonic. The observations of second harmonic effects agreed very well with a theoretical prediction. From the wave statistics, surface drift currents were inferred and compared to experimental measurements with satisfactory agreement. Measurements were made of the two dimensional correlation coefficient at 15 deg increments in angle with respect to the wind vector. An estimate of the two-dimensional spatial power spectral density function was also made.
Yuan, Ke-Hai
2008-01-01
In the literature of mean and covariance structure analysis, noncentral chi-square distribution is commonly used to describe the behavior of the likelihood ratio (LR) statistic under alternative hypothesis. Due to the inaccessibility of the rather technical literature for the distribution of the LR statistic, it is widely believed that the…
Statistical techniques for sampling and monitoring natural resources
Hans T. Schreuder; Richard Ernst; Hugo Ramirez-Maldonado
2004-01-01
We present the statistical theory of inventory and monitoring from a probabilistic point of view. We start with the basics and show the interrelationships between designs and estimators illustrating the methods with a small artificial population as well as with a mapped realistic population. For such applications, useful open source software is given in Appendix 4....
Velocity field statistics and tessellation techniques : Unbiased estimators of Omega
Van de Weygaert, R; Bernardeau, F; Muller,; Gottlober, S; Mucket, JP; Wambsganss, J
1998-01-01
We describe two new - stochastic-geometrical - methods to obtain reliable velocity field statistics from N-body simulations and from any general density and velocity fluctuation field sampled at a discrete set of locations. These methods, the Voronoi tessellation method and Delaunay tessellation met
Sensitivity analysis and related analysis : A survey of statistical techniques
Kleijnen, J.P.C.
1995-01-01
This paper reviews the state of the art in five related types of analysis, namely (i) sensitivity or what-if analysis, (ii) uncertainty or risk analysis, (iii) screening, (iv) validation, and (v) optimization. The main question is: when should which type of analysis be applied; which statistical
Gaonkar, Bilwaj; Davatzikos, Christos
2013-09-01
Multivariate pattern analysis (MVPA) methods such as support vector machines (SVMs) have been increasingly applied to fMRI and sMRI analyses, enabling the detection of distinctive imaging patterns. However, identifying brain regions that significantly contribute to the classification/group separation requires computationally expensive permutation testing. In this paper we show that the results of SVM-permutation testing can be analytically approximated. This approximation leads to more than a thousandfold speedup of the permutation testing procedure, thereby rendering it feasible to perform such tests on standard computers. The speedup achieved makes SVM based group difference analysis competitive with standard univariate group difference analysis methods.
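To show why permutation testing is expensive enough to motivate an analytic approximation, here is a minimal sketch of a generic two-sample permutation test on a difference in means. It is not the SVM-weight test or the authors' approximation; the function name `permutation_p_value`, the defaults and the toy data are assumptions for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Every permutation recomputes the statistic from scratch. In the SVM
    setting each recomputation would be a full model fit, which is the
    cost the abstract refers to.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of the pooled samples
        statistic = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if statistic >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    print(permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))
```

The resolution of the p-value is limited by the number of permutations, so very small significance thresholds require very many refits.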
Basics of Multivariate Analysis in Neuroimaging Data
Habeck, Christian Georg
2010-01-01
Multivariate analysis techniques for neuroimaging data have recently received increasing attention as they have many attractive features that cannot be easily realized by the more commonly used univariate, voxel-wise, techniques [1,5,6,7,8,9]. Multivariate approaches evaluate correlation/covariance of activation across brain regions, rather than proceeding on a voxel-by-voxel basis. Thus, their results can be more easily interpreted as a signature of neural networks. Univariate approaches, on the other hand, cannot directly address interregional correlation in the brain. Multivariate approaches can also result in greater statistical power when compared with univariate techniques, which are forced to employ very stringent corrections for voxel-wise multiple comparisons. Further, multivariate techniques also lend themselves much better to prospective application of results from the analysis of one dataset to entirely new datasets. Multivariate techniques are thus well placed to provide information about mean differences and correlations with behavior, similarly to univariate approaches, with potentially greater statistical power and better reproducibility checks. In contrast to these advantages is the high barrier of entry to the use of multivariate approaches, preventing more widespread application in the community. To the neuroscientist becoming familiar with multivariate analysis techniques, an initial survey of the field might present a bewildering variety of approaches that, although algorithmically similar, are presented with different emphases, typically by people with mathematics backgrounds. We believe that multivariate analysis techniques have sufficient potential to warrant better dissemination. Researchers should be able to employ them in an informed and accessible manner. The current article is an attempt at a didactic introduction of multivariate techniques for the novice. A conceptual introduction is followed with a very simple application to a diagnostic
A Space-Filling Visualization Technique for Multivariate Small World Graphs
Energy Technology Data Exchange (ETDEWEB)
Wong, Pak C.; Foote, Harlan P.; Mackey, Patrick S.; Chin, George; Huang, Zhenyu; Thomas, James J.
2012-03-15
We introduce an information visualization technique, known as GreenCurve, for large sparse graphs that exhibit small world properties. Our fractal-based design approach uses spatial cues to approximate the node connections and thus eliminates the links between the nodes in the visualization. The paper describes a sophisticated algorithm that orders the neighboring nodes of a large sparse graph by computing the Fiedler vector of its graph Laplacian and then folds the graph nodes into a space-filling fractal curve based on the Fiedler vector. The result is a highly compact visualization that gives a succinct overview of the graph with guaranteed visibility of every graph node. We show in the paper that the GreenCurve technology is (1) theoretically sustainable by introducing an error estimation metric to measure the fidelity of the new graph representation, (2) empirically rigorous by conducting a usability study to investigate its strengths and weaknesses against the traditional graph layout, and (3) pragmatically feasible by applying it to analyze stressed conditions of the large scale electric power grid on the west coast.
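The ordering step described in this abstract can be sketched with NumPy. The sketch covers only the Fiedler-vector ordering, not the folding onto a space-filling curve, and it uses a dense eigensolver that would not scale to the large sparse graphs the paper targets. The function name `fiedler_order` and the toy path graph are assumptions for illustration.

```python
import numpy as np


def fiedler_order(adjacency):
    """Order the nodes of a connected undirected graph by its Fiedler vector.

    The Fiedler vector is the eigenvector of the graph Laplacian L = D - A
    that belongs to the second-smallest eigenvalue. Sorting nodes by its
    entries tends to place connected nodes next to each other.
    """
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return [int(i) for i in np.argsort(fiedler)]


if __name__ == "__main__":
    # A path graph whose node labels are scrambled: 0 - 3 - 1 - 4 - 2.
    edges = [(0, 3), (3, 1), (1, 4), (4, 2)]
    adjacency = [[0] * 5 for _ in range(5)]
    for i, j in edges:
        adjacency[i][j] = adjacency[j][i] = 1
    print(fiedler_order(adjacency))  # the path order, possibly reversed
```

The sign of an eigenvector is arbitrary, so the ordering may come out reversed; for a layout this makes no difference.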
Directory of Open Access Journals (Sweden)
Amin Hossein Morshedy
2017-07-01
Introduction: Exploration of rare earth element (REE) resources is now considered a strategic priority, given the special position of these elements in advanced and intelligent industries (Castor and Hedrick, 2006). Significant resources of REEs are found in a wide range of geological settings, including primary deposits associated with igneous and hydrothermal processes (e.g. carbonatite, (per)alkaline igneous rocks, iron-oxide breccia complexes, skarns, fluorapatite veins and pegmatites) and secondary deposits concentrated by sedimentary processes and weathering (e.g. heavy-mineral sand deposits, fluviatile sandstones, unconformity-related uranium deposits, and lignites) (Jaireth et al., 2014). Recent studies of various parts of Iran have identified promising potential for these elements in Central Iran, alkaline rocks in the Eslami Peninsula, iron and apatite in the Hormuz Island, the Kahnouj titanium deposit, granitoid bodies and associated dikes in Yazd, Azerbaijan, and Mashhad, and placers related to the Shemshak formation in Marvast, Kharanagh, and Ardekan; they indicate high concentrations of REE in magmatogenic iron-apatite deposits in Central Iran and in placers in the Marvast area of Yazd (Ghorbani, 2013). Materials and methods: In the present study, the geochemical behavior of rare earth elements is modeled using multivariate statistical methods in the eastern part of the Marvast placer. Marvast is located 185 km south of the city of Yazd in central Iran, between Yazd and Mehriz. This area lies within the southeastern part of the Sanandaj-Sirjan Zone (Alipour-Asll et al., 2012). Samples from 53 wells were analyzed for whole-rock trace-element concentrations (including REE) by inductively coupled plasma-mass spectrometry (ICP-MS) (GSI, 2004). Clustering techniques, as multivariate statistical analysis techniques, can be employed to find appropriate groups in data sets. One of the main objectives of data clustering
Adaptive Steganography: A survey of Recent Statistical Aware Steganography Techniques
Directory of Open Access Journals (Sweden)
Manish Mahajan
2012-09-01
Steganography is the science of hiding secret data in a carrier medium, which may be an image, audio, formatted text or video. The main idea is to conceal the very existence of the data. We deal here with image steganography. Many algorithms have been proposed for this purpose in the spatial and frequency domains. In almost all of these algorithms, however, embedding the secret data disturbs certain characteristics or statistics of the image. From these disturbed statistics, steganalysts can infer the existence of secret data, which they can then decode with the help of available steganalytic tools. Steganalysis is the science of attacking hidden data to gain unauthorized access. Although steganalysis is not part of this work, it is occasionally discussed as part of the literature. Within steganography, we are not concerned purely with the spatial or frequency domain; our main emphasis is on adaptive, or model-based, steganography. Adaptive steganography is not an entirely new branch of steganography; rather, it builds on the spatial and frequency domains with an additional layer of mathematical modelling. It takes the important characteristics and statistics of the cover image into account before the secret data are embedded, so that the disturbance of image statistics mentioned above, which invites forgery or unauthorized access, can be minimized. In this survey we analyze the various steganography algorithms that are based on a mathematical model, in other words, algorithms that fall under the category of model-based steganography.
Yuan, Ke-Hai
2008-01-01
In the literature of mean and covariance structure analysis, noncentral chi-square distribution is commonly used to describe the behavior of the likelihood ratio (LR) statistic under alternative hypothesis. Due to the inaccessibility of the rather technical literature for the distribution of the LR statistic, it is widely believed that the noncentral chi-square distribution is justified by statistical theory. Actually, when the null hypothesis is not trivially violated, the noncentral chi-square distribution cannot describe the LR statistic well even when data are normally distributed and the sample size is large. Using the one-dimensional case, this article provides the details showing that the LR statistic asymptotically follows a normal distribution, which also leads to an asymptotically correct confidence interval for the discrepancy between the null hypothesis/model and the population. For each one-dimensional result, the corresponding results in the higher dimensional case are pointed out and references are provided. Examples with real data illustrate the difference between the noncentral chi-square distribution and the normal distribution. Monte Carlo results compare the strength of the normal distribution against that of the noncentral chi-square distribution. The implication to data analysis is discussed whenever relevant. The development is built upon the concepts of basic calculus, linear algebra, and introductory probability and statistics. The aim is to provide the least technical material for quantitative graduate students in social science to understand the condition and limitation of the noncentral chi-square distribution.
Statistical techniques for the characterization of partially observed epidemics.
Energy Technology Data Exchange (ETDEWEB)
Safta, Cosmin; Ray, Jaideep; Crary, David (Applied Research Associates, Inc, Arlington, VA); Cheng, Karen (Applied Research Associates, Inc, Arlington, VA)
2010-11-01
The techniques appear promising for constructing and integrating an automated detect-and-characterize technique for epidemics that works off biosurveillance data and provides information on the particular, ongoing outbreak. Potential uses include crisis management and planning and resource allocation. The parameter estimation capability is ideal for providing input parameters to an agent-based model: index cases, time of infection, and infection rate. Non-communicable diseases are easier to characterize than communicable ones: small anthrax attacks can be characterized well with 7-10 days of post-detection data, plague takes longer, and large attacks are very easy.
Energy Technology Data Exchange (ETDEWEB)
Mecozzi, M.; Amici, M. [Istituto Centrale per la Ricerca Scientifica e Tecnologica Applicata al Mare, Rome (Italy)]; Acquistucci, R. [Istituto Nazionale per la Nutrizione e gli Alimenti, Rome (Italy)]
2003-07-01
We report a procedure for describing the gas chromatographic retention time of polychlorinated biphenyls (PCBs) as a function of simple mono-dimensional molecular descriptors such as the number and position of chlorine atoms on the aromatic rings. The mathematical relationships between relative retention time (RRT) of all 209 possible congeners of PCBs and the mono-dimensional molecular descriptors (MDDs) were obtained by the multivariate techniques principal component regression (PCR) and partial least squares (PLS) used as modelling tools. The good agreement found between experimental and predicted retention times of PCBs shows that a well established mathematical model relating retention time to specific mono-dimensional molecular descriptors can be a useful tool to enhance identification of these pollutants in real samples. (orig.)
Statistical Theory of the Vector Random Decrement Technique
DEFF Research Database (Denmark)
Asmussen, J. C.; Brincker, Rune; Ibrahim, S. R.
1999-01-01
The Vector Random Decrement technique has previously been introduced as an efficient method to transform ambient responses of linear structures into Vector Random Decrement functions which are equivalent to free decays of the current structure. The modal parameters can be extracted from the free d...
Web Usage Statistics: Measurement Issues and Analytical Techniques.
Bertot, John Carlo; McClure, Charles R.; Moen, William E.; Rubin, Jeffrey
1997-01-01
One means of Web use evaluation is through analysis of server-generated log files. Various log file analysis techniques and issues are presented that are related to the interpretation of log file data. Study findings indicate a number of problems; recommendations and areas needing further research are outlined. (AEF)
Hakimzadeh, Neda; Parastar, Hadi; Fattahi, Mohammad
2014-01-24
In this study, multivariate curve resolution (MCR) and multivariate classification methods are proposed to develop a new chemometric strategy for comprehensive analysis of high-performance liquid chromatography-diode array absorbance detection (HPLC-DAD) fingerprints of sixty Salvia reuterana samples from five different geographical regions. The chromatographic problems that occurred during HPLC-DAD analysis of the S. reuterana samples, such as baseline/background contribution and noise, low signal-to-noise ratio (S/N), asymmetric peaks, elution time shifts, and peak overlap, are handled using the proposed strategy. The chromatographic fingerprints of the sixty samples are segmented into ten common chromatographic regions using local rank analysis, and the corresponding segments are then column-wise augmented for subsequent MCR analysis. Extended multivariate curve resolution-alternating least squares (MCR-ALS) is used to obtain pure component profiles in each segment. In total, thirty-one chemical components were resolved using MCR-ALS in the sixty S. reuterana samples, and the lack of fit (LOF) values of the MCR-ALS models were below 10.0% in all cases. Pure spectral profiles are used to identify chemical components by comparing their resolved spectra with standard ones, and twenty-four of the thirty-one components were identified. Additionally, pure elution profiles are used to obtain relative concentrations of chemical components in different samples for multivariate classification analysis by principal component analysis (PCA) and k-nearest neighbors (kNN). Inspection of the PCA score plot (three PCs accounting for 76.1% of the variance) showed that the S. reuterana samples belong to four clusters. The degree of class separation (DCS), which quantifies the distance separating clusters in relation to the scatter within each cluster, was calculated for the four clusters and was in the range of 1.6-5.8. These results are then
Some Bayesian statistical techniques useful in estimating frequency and density
Johnson, D.H.
1977-01-01
This paper presents some elementary applications of Bayesian statistics to problems faced by wildlife biologists. Bayesian confidence limits for frequency of occurrence are shown to be generally superior to classical confidence limits. Population density can be estimated from frequency data if the species is sparsely distributed relative to the size of the sample plot. For other situations, limits are developed based on the normal distribution and prior knowledge that the density is non-negative, which ensures that the lower confidence limit is non-negative. Conditions are described under which Bayesian confidence limits are superior to those calculated with classical methods; examples are also given on how prior knowledge of the density can be used to sharpen inferences drawn from a new sample.
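Bayesian limits for a frequency of occurrence can be illustrated with a small sketch. The abstract does not state the prior or the type of interval, so this example assumes a uniform prior on the frequency (giving a Beta posterior) and equal-tailed limits, evaluated on a numerical grid. The function name `beta_posterior_limits` and the example counts are hypothetical.

```python
def beta_posterior_limits(occurrences, plots, level=0.95, grid=20000):
    """Equal-tailed Bayesian limits for a frequency of occurrence.

    Assumes a uniform prior, so the posterior is Beta(occurrences + 1,
    plots - occurrences + 1). The quantiles are found by accumulating the
    unnormalised density over a fine grid, which is adequate for modest
    sample sizes.
    """
    absent = plots - occurrences
    step = 1.0 / grid
    midpoints = [(i + 0.5) * step for i in range(grid)]
    density = [p ** occurrences * (1.0 - p) ** absent for p in midpoints]
    total = sum(density)
    lower_target = (1.0 - level) / 2.0
    upper_target = 1.0 - lower_target
    cumulative = 0.0
    lower = upper = None
    for p, d in zip(midpoints, density):
        cumulative += d / total
        if lower is None and cumulative >= lower_target:
            lower = p
        if cumulative >= upper_target:
            upper = p
            break
    return lower, upper


if __name__ == "__main__":
    # Species seen on 0 of 10 plots: the limits stay inside [0, 1].
    print(beta_posterior_limits(0, 10))
```

Because the posterior lives on the interval from 0 to 1, the lower limit can never be negative, which matches the kind of advantage the abstract attributes to using prior knowledge.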
Statistical techniques using NURE airborne geophysical data and NURE geochemical data
Campbell, Katherine
Some standard techniques in multivariate analysis are used to describe the relationships among remotely sensed observations (Landsat and airborne geophysical data) and between these variables and hydrogeochemical and stream sediment analyses. Gray-level pictures of such factors make the analytic results more accessible and easier to interpret.
Multivariate meta-analysis: potential and promise.
Jackson, Dan; Riley, Richard; White, Ian R
2011-09-10
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one-day 'Multivariate meta-analysis' event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various viewpoints and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd.
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regression, for binary outcomes, and multiple linear regression, for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have been published to date. Two previously published examples of CART analysis in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
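The "repeated partitioning based on a certain criterion" at the heart of CART can be sketched for a single partitioning step. The example assumes Gini impurity as the criterion for a classification tree and one numeric predictor; the function names `gini` and `best_split` and the toy values are hypothetical and do not come from the paper.

```python
def gini(labels):
    """Gini impurity of a set of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Find the threshold on one predictor that minimises weighted Gini.

    Returns (threshold, weighted_impurity). The threshold is None when no
    split improves on the unsplit node. A full tree would apply this step
    recursively to each resulting subgroup and over every predictor.
    """
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


if __name__ == "__main__":
    # Hypothetical predictor values and outcome classes.
    values = [5, 12, 30, 45, 60, 80]
    labels = ["yes", "yes", "yes", "no", "no", "no"]
    print(best_split(values, labels))
```

The output of such a step reads as a plain rule ("predictor at or below the threshold goes left"), which is the interpretability advantage the abstract emphasises.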
Song, Seung Yeob; Lee, Young Koung; Kim, In-Jung
2016-01-01
A high-throughput screening system for Citrus lines with higher sugar and acid contents was established using Fourier transform infrared (FT-IR) spectroscopy in combination with multivariate analysis. FT-IR spectra confirmed typical spectral differences between the frequency regions of 950-1100 cm(-1), 1300-1500 cm(-1), and 1500-1700 cm(-1). Principal component analysis (PCA) and subsequent partial least square-discriminant analysis (PLS-DA) were able to discriminate five Citrus lines into three separate clusters corresponding to their taxonomic relationships. Quantitative predictive models of the sugar and acid contents of Citrus fruits were established from FT-IR spectra using partial least square regression algorithms. The regression coefficients (R(2)) between predicted values and estimated sugar and acid content values were 0.99. These results demonstrate that, by using FT-IR spectra and applying quantitative prediction modeling to Citrus sugar and acid contents, excellent Citrus lines can be detected early with greater accuracy.
Multivariate analysis in thoracic research.
Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego
2015-03-01
Multivariate analysis is based on the observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. Multivariate methods were developed to analyze large databases and increasingly complex data. Since modeling is the best way to represent our knowledge of reality, multivariate statistical methods should be used. Multivariate methods are designed to analyze data sets simultaneously, i.e., to analyze the different variables for each person or object studied. It should be kept in mind at all times that all variables must be treated so that they accurately reflect the reality of the problem addressed. There are different types of multivariate analysis, and each should be employed according to the type of variables to be analyzed: dependence, interdependence and structural methods. In conclusion, multivariate methods are ideal for analyzing large data sets and for finding cause-and-effect relationships between variables; a wide range of analysis types is available.
Statistical mechanics of sensing and communications: Insights and techniques
Energy Technology Data Exchange (ETDEWEB)
Murayama, T; Davis, P [NTT Communication Science Laboratories, NIPPON TELEGRAPH AND TELEPHONE CORPORATION, 2-4, Hikaridai, Seika-cho, 'Keihanna Science City', Kyoto 619-0237 (Japan)], E-mail: murayama@cslab.kecl.ntt.co.jp, E-mail: davis@cslab.kecl.ntt.co.jp
2008-01-15
In this article we review a basic model for analysis of large sensor networks from the point of view of collective estimation under bandwidth constraints. We compare different sensing aggregation levels as alternative 'strategies' for collective estimation: moderate aggregation from a moderate number of sensors for which communication bandwidth is enough that data encoding can be reversible, and large scale aggregation from very many sensors - in which case communication bandwidth constraints require the use of nonreversible encoding. We show the non-trivial trade-off between sensing quality, which can be increased by increasing the number of sensors, and communication quality under bandwidth constraints, which decreases if the number of sensors is too large. From a practical standpoint, we verify that such a trade-off exists in constructively defined communications schemes. We introduce a probabilistic encoding scheme and define rate distortion models that are suitable for analysis of the large network limit. Our description shows that the methods and ideas from statistical physics can play an important role in formulating effective models for such schemes.
Directory of Open Access Journals (Sweden)
Shashank Vyas
2016-01-01
Integration of solar photovoltaic (PV) generation with power distribution networks leads to many operational challenges and complexities. Unintentional islanding is one of them, and it is of rising concern given the steady increase in grid-connected PV power. This paper builds on an exploratory study of unintentional islanding on a modeled radial feeder with large PV penetration. Dynamic simulations, also run in real time, revealed unique potential causes of accidental islands. The resulting voltage and current data underwent dimensionality reduction using principal component analysis (PCA), which formed the basis for applying Q statistic control charts to detect the anomalous currents that could island the system. To reduce the false alarm rate of anomaly detection, Kullback-Leibler (K-L) divergence was applied to the principal component projections, which led to the conclusion that the Q statistic based approach alone is not reliable for detecting the symptoms liable to cause unintentional islanding. The obtained data were labeled, and a K-nearest neighbor (K-NN) binomial classifier was then trained to identify potential islanding precursors and distinguish them from other power system transients. The three-phase short-circuit fault case was successfully identified as statistically different from islanding symptoms.
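The PCA-based Q statistic mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic two-channel data, the single retained component and the empirical 99th-percentile control limit are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return (mean, loadings) of a PCA model fitted to the rows of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def q_statistic(X, mean, loadings):
    """Squared prediction error of each row after projection onto the PCA subspace."""
    Xc = X - mean
    residual = Xc - Xc @ loadings @ loadings.T
    return np.sum(residual ** 2, axis=1)

# Hypothetical "normal" data: two strongly correlated signals plus small noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
normal = np.column_stack([t, 2.0 * t]) + 0.05 * rng.normal(size=(200, 2))

mean, loadings = fit_pca(normal, n_components=1)
# Empirical control limit (an assumption): 99th percentile of Q on the training data.
limit = np.percentile(q_statistic(normal, mean, loadings), 99)

new = np.array([[1.0, 2.0],    # follows the normal correlation structure
                [1.0, -2.0]])  # breaks it
q_new = q_statistic(new, mean, loadings)
alarms = q_new > limit
```

A sample raises an alarm when its residual outside the retained principal subspace exceeds the limit. As the abstract notes, such a chart alone may produce false alarms, which is why the authors complement it with further analysis.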
Liu, Ya-Juan; André, Silvère; Saint Cristau, Lydia; Lagresle, Sylvain; Hannas, Zahia; Calvosa, Éric; Devos, Olivier; Duponchel, Ludovic
2017-02-01
Multivariate statistical process control (MSPC) is increasingly popular as a response to the challenge posed by large multivariate datasets from analytical instruments such as Raman spectroscopy used to monitor complex cell cultures in the biopharmaceutical industry. However, Raman spectroscopy for in-line monitoring often produces unsynchronized data sets, resulting in time-varying batches. Moreover, unsynchronized data sets are common in cell culture monitoring because spectroscopic measurements are generally recorded in alternation, with more than one optical probe connected in parallel to the same spectrometer. Synchronized batches are a prerequisite for applying multivariate analysis such as multi-way principal component analysis (MPCA) to MSPC monitoring. Correlation optimized warping (COW) is a popular method for data alignment with satisfactory performance; however, it has never before been applied to synchronize the acquisition time of spectroscopic datasets in an MSPC application. In this paper we propose, for the first time, to use COW to synchronize batches of varying duration analyzed with Raman spectroscopy. In a second step, we developed MPCA models at different time intervals based on the normal operation condition (NOC) batches synchronized by COW. New batches are finally projected onto the corresponding MPCA model. We monitored the evolution of the batches using two multivariate control charts based on Hotelling's T(2) and Q. As the results illustrate, the MSPC model was able to identify abnormal operation conditions, including contaminated batches, which is of prime importance in cell culture monitoring. We proved that Raman-based MSPC monitoring can be used to diagnose batches deviating from the normal condition, with higher efficacy than traditional diagnosis, which would save time and money in the biopharmaceutical industry. Copyright © 2016 Elsevier B.V. All rights reserved.
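As a rough illustration of the Hotelling's T(2) chart named in this abstract, the sketch below computes T(2) for new observations from a PCA model of normal-operation data. It assumes ordinary PCA on already-unfolded, already-synchronized data (the MPCA unfolding and COW alignment are not shown), and the synthetic data and function name are hypothetical.

```python
import numpy as np

def hotelling_t2(X_train, X_new, n_components):
    """Hotelling's T^2 of the rows of X_new in the PCA score space of X_train."""
    mean = X_train.mean(axis=0)
    _, s, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    loadings = Vt[:n_components].T
    # Variance of each retained score, estimated from the training data.
    score_var = s[:n_components] ** 2 / (X_train.shape[0] - 1)
    scores = (X_new - mean) @ loadings
    return np.sum(scores ** 2 / score_var, axis=1)

# Hypothetical normal-operation data: 100 observations of 3 variables.
rng = np.random.default_rng(2)
X_noc = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.1])

X_new = np.array([[0.0, 0.0, 0.0],    # close to the centre of the model
                  [15.0, 0.0, 0.0]])  # far out along the dominant direction
t2_new = hotelling_t2(X_noc, X_new, n_components=2)
```

T(2) measures distance within the model plane, scaled by the score variances, whereas Q measures distance from that plane; the two charts are therefore usually read together.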
Design of UWB pulse radio transceiver using statistical correlation technique in frequency domain
Directory of Open Access Journals (Sweden)
M. Anis
2007-06-01
In this paper, we propose a new technique to extract low-power UWB pulse radio signals, near the noise level, using a statistical correlation technique in the frequency domain. The receiver consists of many narrow bandpass filters which extract energy from the transmitted UWB signal, interfering channels or noise. Transmitted UWB data can be eliminated by statistical correlation of multiple bandpass filter outputs. Super-regenerative oscillators, tuned within the UWB spectrum, are designed as bandpass filters. Summers and comparators perform the statistical correlation.
Statistical and Managerial Techniques for Six Sigma Methodology Theory and Application
Barone, Stefano
2012-01-01
Statistical and Managerial Techniques for Six Sigma Methodology examines the methodology by illustrating the most widespread tools and techniques involved in Six Sigma application. Both managerial and statistical aspects of Six Sigma are analyzed, allowing the reader to apply these tools in the field. This book offers insight into variation and risk management, and focuses on the structure and organizational aspects of Six Sigma projects. It covers Six Sigma methodology, basic managerial techniques, basic statistical techniques, methods for variation and risk management and advanc
Energy Technology Data Exchange (ETDEWEB)
Papachristodoulou, Christina [Nuclear Physics Laboratory, Department of Physics, University of Ioannina, 451 10 Ioannina (Greece); Oikonomou, Artemios [Composite Materials Laboratory, Department of Materials' Science and Engineering, University of Ioannina, 451 10 Ioannina (Greece); Ioannides, Kostas [Nuclear Physics Laboratory, Department of Physics, University of Ioannina, 451 10 Ioannina (Greece); Gravani, Konstantina [Archaeology Section, Department of History-Archaeology, University of Ioannina, 451 10 Ioannina (Greece)
2006-07-28
Energy-dispersive X-ray fluorescence spectroscopy was used to determine the composition of 64 potsherds from the Hellenistic settlement of Orraon, in northwestern Greece. Data classification by principal components analysis revealed four distinct groups of pottery, pointing to different local production practices rather than different provenance. The interpretation of statistical grouping was corroborated by a complementary X-ray diffraction analysis. Compositional and mineralogical data, combined with archaeological and materials' science criteria, allowed addressing various aspects of pottery making, such as selection of raw clays, tempers and firing conditions.
Monakhova, Yulia B; Diehl, Bernd W K
2015-11-10
(1)H NMR spectroscopy was used to distinguish pure porcine heparin from porcine heparin blended with bovine species and to quantify the degree of such adulteration. For multivariate modelling, several statistical methods, namely partial least squares regression (PLS), ridge regression (RR), stepwise regression with variable selection (SR) and stepwise principal component regression (SPCR), were used to model NMR data of in-house prepared blends (n=80). The models were exhaustively validated using independent test and prediction sets. PLS and RR showed the best performance for estimating heparin falsification regarding its animal origin, with the limit of detection (LOD) and root mean square error of validation (RMSEV) below 2% w/w and 1% w/w, respectively. Reproducibility, expressed as coefficients of variation, was estimated to be below 10% starting from approximately 5% w/w of bovine adulteration. An acceptable calibration model was obtained by SPCR, but its application range was limited, whereas SR is the least recommended for the heparin matrix. The developed method was also found to be applicable to the heparinoid matrix (not purified heparin). In this case the root mean square error of prediction (RMSEP) and LOD were approximately 7% w/w and 8% w/w, respectively. The simple and cheap NMR method is recommended for screening heparin animal origin in parallel with the official NMR test of heparin authenticity and purity.
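To make the ridge regression and RMSEV terms in this abstract concrete, here is a minimal sketch using the closed-form ridge solution on centred data. The synthetic "spectra", the train/validation split and the penalty value are assumptions for illustration only and do not reproduce the study's data or settings.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on centred data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical data: 80 blends described by 20 spectral variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 20))
true_coef = rng.normal(size=20)
y = X @ true_coef + 0.1 * rng.normal(size=80)

# Calibrate on the first 60 samples, validate on the remaining 20.
train, valid = slice(0, 60), slice(60, 80)
coef, intercept = ridge_fit(X[train], y[train], alpha=1.0)
rmsev = rmse(y[valid], X[valid] @ coef + intercept)
```

Computing the error on samples that were not used for calibration is what distinguishes RMSEV and RMSEP from a calibration error; the penalty `alpha` would normally be chosen by cross-validation.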
Institute of Scientific and Technical Information of China (English)
Katsuaki Koike
2011-01-01
Sample data in the Earth and environmental sciences are limited in quantity and sampling location; therefore, sophisticated spatial modeling techniques are indispensable for accurate imaging of complicated structures and properties of geomaterials. This paper presents several effective methods that are grouped into two categories depending on the nature of the regionalized data used. Type I data originate from multiple populations, and type II data satisfy the prerequisite of stationarity and have distinct spatial correlations. For type I data, three methods are shown to be effective and demonstrated to produce plausible results: (1) a spline-based method, (2) a combination of a spline-based method with a stochastic simulation, and (3) a neural network method. Geostatistics proves to be a powerful tool for type II data. Three new geostatistical approaches are presented with case studies: an application to directional data such as fractures, multi-scale modeling that incorporates a scaling law, and space-time joint analysis for multivariate data. Methods for improving the contribution of such spatial modeling to Earth and environmental sciences are also discussed, and important future problems to be solved are summarized.
Willard, Melissa A Bodnar; McGuffin, Victoria L; Smith, Ruth Waddell
2012-01-01
Salvia divinorum is a plant material that is of forensic interest due to the hallucinogenic nature of the active ingredient, salvinorin A. In this study, S. divinorum was extracted and spiked onto four different plant materials (S. divinorum, Salvia officinalis, Cannabis sativa, and Nicotiana tabacum) to simulate an adulterated sample that might be encountered in a forensic laboratory. The adulterated samples were extracted and analyzed by gas chromatography-mass spectrometry, and the resulting total ion chromatograms were subjected to a series of pretreatment procedures that were used to minimize non-chemical sources of variance in the data set. The data were then analyzed using principal components analysis (PCA) to investigate association of the adulterated extracts to unadulterated S. divinorum. While association was possible based on visual assessment of the PCA scores plot, additional procedures including Euclidean distance measurement, hierarchical cluster analysis, Student's t tests, Wilcoxon rank-sum tests, and Pearson product moment correlation were also applied to the PCA scores to provide a statistical evaluation of the association observed. The advantages and limitations of each statistical procedure in a forensic context were compared and are presented herein.
Pace, Roberto; Martinelli, Ernesto Marco; Sardone, Nicola; D E Combarieu, Eric
2015-03-01
Ginseng is any one of the eleven species belonging to the genus Panax of the family Araliaceae and is found in North America and in eastern Asia. Ginseng is characterized by the presence of ginsenosides. Panax ginseng and Panax quinquefolius are the principal adaptogenic herbs and are commonly distributed in health food markets. In the present study, high performance liquid chromatography was used to identify and quantify ginsenosides in these two species and in the different parts of the plant (roots, neck, leaves, flowers, fruits). The power of this chromatographic technique to evaluate the identity of botanical material and to distinguish different parts of the plants was investigated with metabolomic techniques such as principal component analysis. Metabolomics provides a good opportunity for mining useful chemical information from the chromatographic data set, resulting in an important tool for evaluating the authenticity, consistency and efficacy of medicinal plants. Copyright © 2015 Elsevier B.V. All rights reserved.
Chattopadhyay, Goutami; Jain, Rajni
2009-01-01
In this paper, the complexities in the relationship between rainfall and sea surface temperature (SST) anomalies during the winter monsoon (November-January) over India were evaluated statistically using scatter plot matrices and autocorrelation functions. Linear as well as polynomial trend equations were obtained, and it was observed that the coefficient of determination for the linear trend was very low and remained low even when a polynomial trend of degree six was used. An exponential regression equation and an artificial neural network with extensive variable selection were generated to forecast the average winter monsoon rainfall of a given year using the rainfall amounts and the sea surface temperature anomalies in the winter monsoon months of the previous year as predictors. The regression coefficients for the multiple exponential regression equation were generated using the Levenberg-Marquardt algorithm. The artificial neural network was generated in the form of a multilayer perceptron with sigmoid non-l...
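The comparison of linear and degree-six polynomial trends by their coefficient of determination can be sketched as follows. The series here is synthetic noise standing in for a rainfall record (an assumption; no data from the paper are used), and the time axis is rescaled to [-1, 1] to keep the degree-six fit well conditioned.

```python
import numpy as np

def trend_r2(x, y, degree):
    """Coefficient of determination of a least-squares polynomial trend."""
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical series: 30 "years" of values with no underlying trend.
rng = np.random.default_rng(3)
x = np.linspace(-1.0, 1.0, 30)
y = rng.normal(size=30)

r2_linear = trend_r2(x, y, degree=1)
r2_degree6 = trend_r2(x, y, degree=6)
```

Because the linear model is nested in the degree-six model, the second value can never be smaller than the first; a value that stays low even at degree six, as the abstract reports, indicates that a simple trend curve explains little of the variance.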
Multivariate analysis with LISREL
Jöreskog, Karl G; Y Wallentin, Fan
2016-01-01
This book traces the theory and methodology of multivariate statistical analysis and shows how it can be conducted in practice using the LISREL computer program. It presents not only the typical uses of LISREL, such as confirmatory factor analysis and structural equation models, but also several other multivariate analysis topics, including regression (univariate, multivariate, censored, logistic, and probit), generalized linear models, multilevel analysis, and principal component analysis. It provides numerous examples from several disciplines and discusses and interprets the results, illustrated with sections of output from the LISREL program, in the context of the example. The book is intended for masters and PhD students and researchers in the social, behavioral, economic and many other sciences who require a basic understanding of multivariate statistical theory and methods for their analysis of multivariate data. It can also be used as a textbook on various topics of multivariate statistical analysis.
Energy Technology Data Exchange (ETDEWEB)
Glick, D.C.; Davis, A.
1984-07-01
The multivariate statistical techniques of correlation coefficients, factor analysis, and cluster analysis, implemented by computer programs, can be used to process a large data set and produce a summary of relationships between variables and between samples. These techniques were used to find relationships for data on the inorganic constituents of US coals. Three hundred thirty-five whole-seam channel samples from six US coal provinces were analyzed for inorganic variables. After consideration of the attributes of data expressed on ash basis and whole-coal basis, it was decided to perform complete statistical analyses on both data sets. Thirty variables expressed on whole-coal basis and twenty-six variables expressed on ash basis were used. For each inorganic variable, a frequency distribution histogram and a set of summary statistics were produced. These were subdivided to reveal the manner in which concentrations of inorganic constituents vary between coal provinces and between coal regions. Data collected on 124 samples from three stratigraphic groups (Pottsville, Monongahela, Allegheny) in the Appalachian region were studied using analysis of variance to determine the degree of variability between stratigraphic levels. Most variables showed differences in mean values between the three groups. 193 references, 71 figures, 54 tables.
Energy Technology Data Exchange (ETDEWEB)
Baig, Jameel A., E-mail: jab_mughal@yahoo.com [National Center of Excellence in Analytical Chemistry, University of Sindh, Jamshoro 76080, Sindh (Pakistan); Kazi, Tasneem G., E-mail: tgkazi@yahoo.com [National Center of Excellence in Analytical Chemistry, University of Sindh, Jamshoro 76080, Sindh (Pakistan); Shah, Abdul Q., E-mail: aqshah07@yahoo.com [National Center of Excellence in Analytical Chemistry, University of Sindh, Jamshoro 76080, Sindh (Pakistan); Arain, Mohammad B. [National Center of Excellence in Analytical Chemistry, University of Sindh, Jamshoro 76080, Sindh (Pakistan); Afridi, Hassan I., E-mail: hassanimranafridi@yahoo.com [National Center of Excellence in Analytical Chemistry, University of Sindh, Jamshoro 76080, Sindh (Pakistan); Kandhro, Ghulam A., E-mail: gakandhro@yahoo.com [National Center of Excellence in Analytical Chemistry, University of Sindh, Jamshoro 76080, Sindh (Pakistan); Khan, Sumaira, E-mail: skhanzai@gmail.com [National Center of Excellence in Analytical Chemistry, University of Sindh, Jamshoro 76080, Sindh (Pakistan)
2009-09-28
The simple and rapid pre-concentration techniques of cloud point extraction (CPE) and solid phase extraction (SPE) were applied to the determination of As³⁺ and total inorganic arsenic (iAs) in surface and ground water samples. As³⁺ was complexed with ammonium pyrrolidinedithiocarbamate (APDC) and extracted into the surfactant-rich phase of the non-ionic surfactant Triton X-114; after centrifugation, the surfactant-rich phase was diluted with 0.1 mol L⁻¹ HNO₃ in methanol. Total iAs in water samples was adsorbed on titanium dioxide (TiO₂); after centrifugation, the solid phase was prepared as a slurry for determination. The extracted As species were determined by electrothermal atomic absorption spectrometry. A multivariate strategy was applied to estimate the optimum values of the experimental factors for the recovery of As³⁺ and total iAs by CPE and SPE. The standard addition method was used to validate the optimized methods. The results showed sufficient recoveries for As³⁺ and iAs (>98.0%). The concentration factor in both cases was found to be 40.
Chudoba, R.; Sadílek, V.; Rypl, R.; Vořechovský, M.
2013-02-01
This paper examines the feasibility of high-level Python based utilities for numerically intensive applications via an example of a multidimensional integration for the evaluation of the statistical characteristics of a random variable. We discuss the approaches to the implementation of mathematically formulated incremental expressions using high-level scripting code and low-level compiled code. Due to the dynamic typing of the Python language, components of the algorithm can be easily coded in a generic way as algorithmic templates. Using the Enthought Development Suite they can be effectively assembled into a flexible computational framework that can be configured to execute the code for arbitrary combinations of integration schemes and versions of instantiated code. The paper describes the development cycle using a simple running example involving averaging of a random two-parametric function that includes discontinuity. This example is also used to compare the performance of the available algorithmic and executional features. The implemented package including further examples and the results of performance studies have been made available via the free repository [1] and CPCP library. Program summary: Program title: spirrid; Catalogue identifier: AENL_v1_0; Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AENL_v1_0.html; Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland; Licensing provisions: Special licence provided by the author; No. of lines in distributed program, including test data, etc.: 10722; No. of bytes in distributed program, including test data, etc.: 157099; Distribution format: tar.gz; Programming language: Python and C; Computer: PC; Operating system: LINUX, UNIX, Windows; Classification: 4.13, 6.2; External routines: NumPy (http://numpy.scipy.org/), SciPy (http://www.scipy.com); Nature of problem: Evaluation of the statistical moments of a function of random variables; Solution method: Direct multidimensional
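The kind of computation this abstract describes, averaging a discontinuous function of two random parameters by direct integration, can be sketched in a few lines of vectorized NumPy. This is not the spirrid package or its API: the response function `q`, the uniform distributions and the midpoint grid are all hypothetical choices made so that the result can be checked by hand.

```python
import numpy as np

def q(e, lam, xi):
    """Hypothetical response: linear in e with slope lam, dropping to zero once e reaches xi."""
    return lam * e * (e < xi)

def mean_response(e, n=400):
    """Mean of q(e, lam, xi) for lam ~ U(1, 3) and xi ~ U(0, 1) by the midpoint rule."""
    # Midpoints of n equal-width cells over the support of each parameter.
    lam = 1.0 + 2.0 * (np.arange(n) + 0.5) / n
    xi = (np.arange(n) + 0.5) / n
    # Broadcasting evaluates q on the full n x n grid in one call; with uniform
    # densities every cell carries the same probability weight, so the integral
    # reduces to a plain average.
    values = q(e, lam[:, None], xi[None, :])
    return float(values.mean())

estimate = mean_response(0.5)
# For this toy model the exact mean is e * E[lam] * P(xi > e) = 2 * e * (1 - e).
exact = 2.0 * 0.5 * (1.0 - 0.5)
```

The broadcasting step is where a high-level array library stands in for explicit nested loops; replacing the grid or the weights changes the integration scheme without touching `q`.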
Alba, Vittorio; Bergamini, Carlo; Genghi, Rosalinda; Gasparro, Marica; Perniola, Rocco; Antonacci, Donato
2015-08-01
High estimated heritability values were recently revealed for mature leaf traits in grape (Vitis vinifera L.), thus redeeming ampelography in the era of molecular markers. The "Organisation Internationale de la Vigne et du Vin (OIV)" set a list of hundreds of descriptors for grapevine in order to standardize ampelographic and ampelometric scores. Therefore, selecting and reducing the number of OIV codes can be a major goal for leaner biodiversity assessment studies. Identifying ampelometric traits associated with grape diversity allows the construction of classification trees with the chi-squared automatic interaction detection (CHAID) algorithm, a stepwise model-fitting method that produces a tree diagram in which, at each step, the sample pool is split on the independent variables that differ significantly with respect to the dependent variable. A collection of 100 table and wine grapevines (Vitis vinifera L.) was characterized and evaluated by means of six microsatellites and twenty-two ampelometric traits on mature leaves. Nine ampelometric traits were selected by principal component analysis and employed to build the classification trees based on the CHAID algorithm. The strategy can be an effective tool for grape biodiversity management, right allocations, and identification of new grape genotypes, supplemented by a further microsatellite investigation only when unsolved cases occur, allowing faster and cheaper results.
Kauer, Agnes; Dorigo, Wouter; Bauer-Marschallinger, Bernhard
2017-04-01
Global warming is expected to change ocean-atmosphere oscillation patterns, e.g. the El Nino Southern Oscillation, and may thus have a substantial impact on water resources over land. Yet, the link between climate oscillations and terrestrial hydrology has large uncertainties. In particular, the climate in the Mediterranean basin is expected to be sensitive to global warming, as it may worsen insufficient and irregular water supply and lead to more frequent and intense droughts and heavy precipitation events. The ever increasing need for water in tourism and agriculture reinforces the problem. Therefore, monitoring and better understanding of the hydrological cycle are crucial for this area. This study seeks to quantify the effect of regional climate modes, e.g. the Northern Atlantic Oscillation (NAO), on the hydrological cycle in the Mediterranean. We apply Empirical Orthogonal Functions (EOF) to a wide range of hydrological datasets to extract the major modes of variation over the study period. We use more than ten datasets describing precipitation, soil moisture, evapotranspiration, and changes in water mass, with study periods ranging from one to three decades depending on the dataset. The resulting EOFs are then examined for correlations with regional climate modes using Spearman rank correlation analysis. This is done for the entire time span of the EOFs and for monthly and seasonally sampled data. We find relationships between the hydrological datasets and the climate modes NAO, Arctic Oscillation (AO), Eastern Atlantic (EA), and Tropical Northern Atlantic (TNA). Analyses of monthly and seasonally sampled data reveal high correlations, especially in the winter months. However, the spatial extent of the data cube considered for the analyses has a large impact on the results. Our statistical analyses suggest an impact of regional climate modes on the hydrological cycle in the Mediterranean area and may provide valuable input for evaluating process
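The two steps described in this abstract, extracting EOFs from a space-time data set and correlating their time series with a climate index by Spearman rank correlation, can be sketched with an SVD. The synthetic field, the synthetic index and the helper names are assumptions; the rank correlation below also assumes there are no tied values.

```python
import numpy as np

def eof_analysis(field, n_modes):
    """EOFs of a (time, space) array: spatial patterns, PC time series, variance fractions."""
    anomalies = field - field.mean(axis=0)
    U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    pcs = U[:, :n_modes] * s[:n_modes]   # principal component time series
    eofs = Vt[:n_modes]                  # spatial patterns
    explained = s[:n_modes] ** 2 / np.sum(s ** 2)
    return eofs, pcs, explained

def spearman(a, b):
    """Spearman rank correlation of two series, assuming no tied values."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Hypothetical data: 120 monthly maps on 50 grid cells, driven by one climate index.
rng = np.random.default_rng(4)
index = rng.normal(size=120)
pattern = rng.normal(size=50)
field = np.outer(index, pattern) + 0.3 * rng.normal(size=(120, 50))

eofs, pcs, explained = eof_analysis(field, n_modes=2)
rho = spearman(pcs[:, 0], index)
```

The sign of an EOF and of its time series is arbitrary, so only the magnitude of `rho` is meaningful here. Restricting `field` to a different set of grid cells changes the decomposition, which matches the abstract's remark about the influence of the spatial extent.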
Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.
2017-05-01
The braking system is one of the most important and complex subsystems of railway vehicles, especially when it comes to safety. Therefore, installing efficient, safe brakes on modern railway vehicles is essential. Nowadays, attention is devoted to solving problems connected with the use of high performance brake materials and their impact on the thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, because the interaction between the brake block and the wheel produces complex thermo-mechanical phenomena. In this work, the investigated subjects are cast-iron brake shoes, which are still widely used on freight wagons. Cast-iron brake shoes with lamellar graphite and a high phosphorus content (0.8-1.1%) therefore need a special investigation. In order to establish the optimal condition for the cast-iron brake shoes, we propose a mathematical modelling study using statistical analysis and multiple regression equations. Multivariate research is important in cast-iron brake shoe manufacturing, because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulty comprehending many dimensions at one time. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. In order to establish the multiple correlation between the hardness of the cast-iron brake shoes and the chemical composition elements, several types of regression equation have been proposed. Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representations of the regression equations in order to explain the interaction of the variables and locate the optimal level of each variable for
Riley, P.; Richardson, I. G.
2012-01-01
In-situ measurements of interplanetary coronal mass ejections (ICMEs) display a wide range of properties. A distinct subset, "magnetic clouds" (MCs), are readily identifiable by a smooth rotation in an enhanced magnetic field, together with an unusually low solar wind proton temperature. In this study, we analyze Ulysses spacecraft measurements to systematically investigate five possible explanations for why some ICMEs are observed to be MCs and others are not: i) an observational selection effect; that is, all ICMEs do in fact contain MCs, but the trajectory of the spacecraft through the ICME determines whether the MC is actually encountered; ii) interactions of an erupting flux rope (FR) with itself or between neighboring FRs, which produce complex structures in which the coherent magnetic structure has been destroyed; iii) an evolutionary process, such as relaxation to a low plasma-beta state that leads to the formation of an MC; iv) the existence of two (or more) intrinsic initiation mechanisms, some of which produce MCs and some of which do not; or v) MCs are just an easily identifiable limit in an otherwise continuous spectrum of structures. We apply quantitative statistical models to assess these ideas. In particular, we use the Akaike information criterion (AIC) to rank the candidate models and a Gaussian mixture model (GMM) to uncover any intrinsic clustering of the data. Using a logistic regression, we find that plasma-beta, CME width, and the ratio O⁷⁺/O⁶⁺ are the most significant predictor variables for the presence of an MC. Moreover, the propensity for an event to be identified as an MC decreases with heliocentric distance. These results tend to refute ideas ii) and iii). GMM clustering analysis further identifies three distinct groups of ICMEs, two of which match (at the 86% level) with events independently identified as MCs, and a third that matches with non-MCs (68% overlap). Thus, idea v) is not supported. Choosing between ideas i) and
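Ranking candidate models by the Akaike information criterion, as this abstract describes, follows the definition AIC = 2k - 2 ln L, where k is the number of fitted parameters and L the maximized likelihood. The sketch below applies it to two simple Gaussian models on made-up numbers; the data, the grouping and the two candidate models are hypothetical and unrelated to the Ulysses measurements.

```python
import math

def gaussian_aic(residuals, n_params):
    """AIC of a Gaussian model whose variance is set to its maximum-likelihood value."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    log_lik = -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)
    return 2.0 * n_params - 2.0 * log_lik

def mean(values):
    return sum(values) / len(values)

# Hypothetical measurements of one quantity for two labelled groups of events.
group_a = [0.9, 1.1, 1.0, 1.2, 0.8]
group_b = [2.9, 3.1, 3.0, 3.2, 2.8]
everything = group_a + group_b

# Model 1: a single common mean (parameters: mean, variance).
m = mean(everything)
aic_single = gaussian_aic([x - m for x in everything], n_params=2)

# Model 2: one mean per group with a shared variance (parameters: two means, variance).
ma, mb = mean(group_a), mean(group_b)
residuals = [x - ma for x in group_a] + [x - mb for x in group_b]
aic_split = gaussian_aic(residuals, n_params=3)

# Lower AIC is preferred; the extra parameter must earn its penalty of 2.
ranking = sorted([("single mean", aic_single), ("group means", aic_split)],
                 key=lambda pair: pair[1])
```

Only differences in AIC between models fitted to the same data are meaningful, so the ranking matters more than the individual values.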
Exploratory and multivariate data analysis
Jambu, Michel
1991-01-01
With a useful index of notations at the beginning, this book explains and illustrates the theory and application of data analysis methods from univariate to multidimensional and how to learn and use them efficiently. This book is well illustrated and is a useful and well-documented review of the most important data analysis techniques. Key features: describes, in detail, exploratory data analysis techniques from the univariate to the multivariate ones; features a complete description of correspondence analysis and factor analysis techniques as multidimensional statistical data a
Energy Technology Data Exchange (ETDEWEB)
Molinaroli, E.; Pistolato, M.; Rampazzo, G. [Dipartimento di Scienze Ambientali, Universita di Venezia, Dorsoduro 2137, 30123 Venezia (Italy); Guerzoni, S. [CNR, Istituto di Geologia Marina, via Gobetti 101, 40129 Bologna (Italy)
1999-06-01
The chemical characteristics of the mineral fractions of aerosol and precipitation collected in Sardinia (NW Mediterranean) are highlighted by means of two multivariate statistical approaches. Two different combinations of classification and statistical methods for geochemical data are presented. It is shown that the application of cluster analysis subsequent to Q-Factor analysis better distinguishes among Saharan dust, background pollution (Europe-Mediterranean) and local aerosol from various source regions (Sardinia). Conversely, the application of simple cluster analysis was able to distinguish only between aerosols and precipitation particles, without assigning the sources (local or distant) to the aerosol. This method also highlighted the fact that crust-enriched precipitation is similar to desert-derived aerosol. Major elements (Al, Na) and trace metal (Pb) turn out to be the most discriminating elements of the analysed data set. Independent use of mineralogical, granulometric and meteorological data confirmed the results derived from the statistical methods employed. (Copyright (c) 1999 Elsevier Science B.V., Amsterdam. All rights reserved.)
Robel, Martin; Kristo, Michael J
2008-11-01
The problem of identifying the provenance of unknown nuclear material in the environment by multivariate statistical analysis of its uranium and/or plutonium isotopic composition is considered. Such material can be introduced into the environment as a result of nuclear accidents, inadvertent processing losses, illegal dumping of waste, or deliberate trafficking in nuclear materials. Various combinations of reactor type and fuel composition were analyzed using Principal Components Analysis (PCA) and Partial Least Squares Discriminant Analysis (PLSDA) of the concentrations of nine U and Pu isotopes in fuel as a function of burnup. Real-world variation in the concentrations of (234)U and (236)U in the fresh (unirradiated) fuel was incorporated. The U and Pu were also analyzed separately, with results that suggest that, even after reprocessing or environmental fractionation, Pu isotopes can be used to determine both the source reactor type and the initial fuel composition with good discrimination.
Cardot, J-M; Roudier, B; Schütz, H
2017-07-01
The f2 test is generally used for comparing dissolution profiles. In cases of high variability, the f2 test is not applicable, and the Multivariate Statistical Distance (MSD) test is frequently proposed as an alternative by the FDA and EMA. The guidelines provide only general recommendations. MSD tests can be performed either on raw data, with or without time as a variable, or on model parameters. In addition, the data can be limited, as in the case of the f2 test, to dissolutions of up to 85%, or can include all available data. In the context of the present paper, the recommended calculation included all raw dissolution data up to the first point greater than 85% as a variable, without the various times as parameters. The proposed MSD overcomes several drawbacks found in other methods.
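For reference, the f2 similarity factor discussed in this abstract is conventionally computed from the mean squared difference between the reference and test profiles at matched time points. The sketch below assumes that conventional formula, which the abstract itself does not state; the function name and the sample profiles are illustrative only.

```python
import math


def f2_similarity(reference, test):
    """Similarity factor f2 for two mean dissolution profiles.

    Both arguments are sequences of % dissolved at the same time points.
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(reference)
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))


# Illustrative (made-up) mean profiles, % dissolved at four time points.
reference_profile = [20.0, 45.0, 70.0, 86.0]
test_profile = [18.0, 41.0, 66.0, 84.0]
f2_value = f2_similarity(reference_profile, test_profile)
```

Identical profiles give f2 = 100, and values of 50 or more are commonly read as similar. Because f2 is built from mean profiles only, it carries no information about unit-to-unit variability, which is the situation in which the abstract turns to the MSD.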
Mujica Ascencio, Saul; Choe, ChunSik; Meinke, Martina C; Müller, Rainer H; Maksimov, George V; Wigger-Alberti, Walter; Lademann, Juergen; Darvin, Maxim E
2016-07-01
Propylene glycol is one of the known substances added in cosmetic formulations as a penetration enhancer. Recently, nanocrystals have been employed also to increase the skin penetration of active components. Caffeine is a component with many applications and its penetration into the epidermis is controversially discussed in the literature. In the present study, the penetration ability of two components - caffeine nanocrystals and propylene glycol, applied topically on porcine ear skin in the form of a gel, was investigated ex vivo using two confocal Raman microscopes operated at different excitation wavelengths (785nm and 633nm). Several depth profiles were acquired in the fingerprint region and different spectral ranges, i.e., 526-600cm(-1) and 810-880cm(-1) were chosen for independent analysis of caffeine and propylene glycol penetration into the skin, respectively. Multivariate statistical methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) combined with Student's t-test were employed to calculate the maximum penetration depths of each substance (caffeine and propylene glycol). The results show that propylene glycol penetrates significantly deeper than caffeine (20.7-22.0μm versus 12.3-13.0μm) without any penetration enhancement effect on caffeine. The results confirm that different substances, even if applied onto the skin as a mixture, can penetrate differently. The penetration depths of caffeine and propylene glycol obtained using two different confocal Raman microscopes are comparable showing that both types of microscopes are well suited for such investigations and that multivariate statistical PCA-LDA methods combined with Student's t-test are very useful for analyzing the penetration of different substances into the skin.
Matiatos, Ioannis
2016-01-15
Nitrate (NO3) is one of the most common contaminants in aquatic environments and groundwater. Nitrate concentrations and environmental isotope data (δ(15)N-NO3 and δ(18)O-NO3) from groundwater of Asopos basin, which has different land-use types, i.e., a large number of industries (e.g., textile, metal processing, food, fertilizers, paint), urban and agricultural areas and livestock breeding facilities, were analyzed to identify the nitrate sources of water contamination and N-biogeochemical transformations. A Bayesian isotope mixing model (SIAR) and multivariate statistical analysis of hydrochemical data were used to estimate the proportional contribution of different NO3 sources and to identify the dominant factors controlling the nitrate content of the groundwater in the region. The comparison of SIAR and Principal Component Analysis showed that wastes originating from urban and industrial zones of the basin are mainly responsible for nitrate contamination of groundwater in these areas. Agricultural fertilizers and manure likely contribute to groundwater contamination away from urban fabric and industrial land-use areas. Soil contribution to nitrate contamination due to organic matter is higher in the south-western part of the area far from the industries and the urban settlements. The present study aims to highlight the use of environmental isotopes combined with multivariate statistical analysis in locating sources of nitrate contamination in groundwater leading to a more effective planning of environmental measures and remediation strategies in river basins and water bodies as defined by the European Water Frame Directive (Directive 2000/60/EC).
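SIAR solves a Bayesian mixing problem over several sources and isotopes. As a much simpler illustration of the underlying mass balance, the sketch below assumes only two end-members and a single isotope, with no fractionation and no uncertainty. It is not the SIAR model, and the delta values are invented for the example.

```python
def two_source_fraction(delta_mix, delta_a, delta_b):
    """Fraction of source A in a mixture under linear two-end-member mixing."""
    if delta_a == delta_b:
        raise ValueError("end-members must have different isotopic signatures")
    fraction_a = (delta_mix - delta_b) / (delta_a - delta_b)
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("mixture lies outside the range spanned by the end-members")
    return fraction_a


# Hypothetical d15N values (per mil): source A = waste-derived nitrate,
# source B = fertilizer-derived nitrate. These numbers are not from the study.
fraction_waste = two_source_fraction(delta_mix=9.0, delta_a=12.0, delta_b=2.0)
```

With more than two sources or with two isotopes the system is no longer solved by a single ratio, which is one reason a probabilistic model such as SIAR is used in practice.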
Giménez-Forcada, Elena; Vega-Alegre, Marisol; Timón-Sánchez, Susana
2017-09-01
Naturally occurring arsenic in groundwater exceeding the limit for potability has been reported along the southern edge of the Cenozoic Duero Basin (CDB) near its contact with the Spanish Central System (SCS). In this area, spatial variability of arsenic is high, peaking at 241μg/L. Forty-seven percent of samples collected contained arsenic above the maximum allowable concentration for drinking water (10μg/L). Correlations of As with other hydrochemical variables were investigated using multivariate statistical analysis (Hierarchical Cluster Analysis, HCA and Principal Component Analysis, PCA). It was found that As, V, Cr and pH are closely related and that there were also close correlations with temperature and Na(+). The highest concentrations of arsenic and other associated Potentially Toxic Geogenic Trace Elements (PTGTE) are linked to alkaline NaHCO3 waters (pH≈9), moderate oxic conditions and temperatures of around 18°C-19°C. The most plausible hypothesis to explain the high arsenic concentrations is the contribution of deeper regional flows with a significant hydrothermal component (cold-hydrothermal waters), flowing through faults in the basement rock. Water mixing and water-rock interactions occur both in the fissured aquifer media (igneous and metasedimentary bedrock) and in the sedimentary environment of the CDB, where agricultural pollution phenomena are also active. A combination of multivariate statistical tools and hydrochemical analysis enabled the distribution pattern of dissolved As and other PTGTE in groundwaters in the study area to be interpreted, and their most likely origin to be established. This methodology could be applied to other sedimentary areas with similar characteristics and problems. Copyright © 2017 Elsevier B.V. All rights reserved.
Saraiva, Cristina; Oliveira, I; Silva, J A; Martins, C; Ventanas, J; García, C
2015-06-01
This study was performed to select volatile compounds that predict the off-odour and the overall freshness assessment of raw beef of the Maronesa breed, using multivariate analysis. M. longissimus dorsi packed in vacuum and MAP (70 % O2/20 % CO2/10 % N2) and stored at 4 °C were examined for off-odour perception as well as the overall assessment of freshness at 10 and 21 days post mortem. The results achieved in this study demonstrated that the selected volatile compounds could be considered as volatile indicators of beef spoilage, enclosing information for discrimination of Maronesa beef samples in sensory classes of odour corresponding to unspoiled and spoiled levels. Fifty-four volatile compounds were detected. A significant increase of aldehydes, ketones and alcohols was observed during storage in MAP. 2- and 3-methylbutanal, 2- and 3-methylbutanol, 1-pentanol, 1-hexanol, 2,3-octanedione, 3,5-octanedione, octanal and nonanal were suggested as indicators of beef spoilage. 3-methylpentane was considered as a marker of the first stages of spoilage in beef, decreasing during storage. Data were examined using PCR and PLSR models for different optimal subsets of volatile compounds. The simplicity and usefulness of the technique in using 0/1 data while preserving high levels of accuracy was also evident. The powerful analytical methodologies for reducing variables and the choice of optimal subsets could be advantageous in both basic research and the routine quality control of chilled beef.
Konukoglu, Ender; Coutu, Jean-Philippe; Salat, David H.; Fischl, Bruce
2016-01-01
Diffusion magnetic resonance imaging (dMRI) is a unique technology that allows the noninvasive quantification of microstructural tissue properties of the human brain in healthy subjects as well as the probing of disease-induced variations. Population studies of dMRI data have been essential in identifying pathological structural changes in various conditions, such as Alzheimer’s and Huntington’s diseases [1,2]. The most common form of dMRI involves fitting a tensor to the underlying imaging data (known as Diffusion Tensor Imaging, or DTI), then deriving parametric maps, each quantifying a different aspect of the underlying microstructure, e.g. fractional anisotropy and mean diffusivity. To date, the statistical methods utilized in most DTI population studies either analyzed only one such map or analyzed several of them, each in isolation. However, it is most likely that variations in the microstructure due to pathology or normal variability would affect several parameters simultaneously, with differing variations modulating the various parameters to differing degrees. Therefore, joint analysis of the available diffusion maps can be more powerful in characterizing histopathology and distinguishing between conditions than the widely used univariate analysis. In this article, we propose a multivariate approach for statistical analysis of diffusion parameters that uses partial least squares correlation (PLSC) analysis and permutation testing as building blocks in a voxel-wise fashion. Stemming from the common formulation, we present three different multivariate procedures for group analysis, regressing-out nuisance parameters and comparing effects of different conditions. We used the proposed procedures to study the effects of non-demented aging, Alzheimer’s disease and mild cognitive impairment on the white matter. Here, we present results demonstrating that the proposed PLSC-based approach can differentiate between effects of different conditions in the same
Relationship between Multiple Regression and Selected Multivariable Methods.
Schumacker, Randall E.
The relationship of multiple linear regression to various multivariate statistical techniques is discussed. The importance of the standardized partial regression coefficient (beta weight) in multiple linear regression as it is applied in path, factor, LISREL, and discriminant analyses is emphasized. The multivariate methods discussed in this paper…
Meizel-Lambert, Cayli J; Schultz, John J; Sigman, Michael E
2015-11-01
Identification of osseous materials is generally established on gross anatomical features. However, highly fragmented or taphonomically altered materials may be problematic and may require chemical analysis. This research was designed to assess the use of scanning electron microscopy-energy-dispersive X-ray spectrometry (SEM/EDX), elemental analysis, and multivariate statistical analysis (principal component analysis) for discrimination of osseous and nonosseous materials of similar chemical composition. Sixty samples consisting of osseous (human and nonhuman bone and dental) and non-osseous samples were assessed. After outliers were removed a high overall correct classification of 97.97% was achieved, with 99.86% correct classification for osseous materials. In addition, a blind study was conducted using 20 samples to assess the applicability for using this method to classify unknown materials. All of the blind study samples were correctly classified resulting in 100% correct classification, further demonstrating the efficiency of SEM/EDX and statistical analysis for differentiation of osseous and nonosseous materials. © 2015 American Academy of Forensic Sciences.
Siepak, Marcin; Sojka, Mariusz
2017-08-01
The paper reports the results of measurements of trace elements concentrations in surface water samples collected at the lowland retention reservoirs of Stare Miasto and Kowalskie (Poland). The samples were collected once a month from October 2011 to November 2012. Al, As, Cd, Co, Cr, Cu, Li, Mn, Ni, Pb, Sb, V, and Zn were determined in water samples using the inductively coupled plasma with mass detection (ICP-QQQ). To assess the chemical composition of surface water, multivariate statistical methods of data analysis were used, viz. cluster analysis (CA), principal components analysis (PCA), and discriminant analysis (DA). They made it possible to observe similarities and differences in the chemical composition of water in the points of water samples collection, to uncover hidden factors accounting for the structure of the data, and to assess the impact of natural and anthropogenic sources on the content of trace elements in the water of retention reservoirs. The conducted statistical analyses made it possible to distinguish groups of trace elements allowing for the analysis of time and spatial variation of water in the studied reservoirs.
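To make the "hidden factors" idea behind PCA concrete, the sketch below extracts the first principal component of a small data matrix by power iteration on its covariance matrix. It is a minimal pure-Python illustration under stated assumptions, not the procedure used in the study: the concentrations are invented, the variables are not standardized, and a real analysis would normally rely on a numerical library.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equally long observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def first_principal_component(rows, iterations=200):
    """Return (loading vector, share of total variance) of the leading component."""
    cov = covariance_matrix(rows)
    p = len(cov)
    # Starting vector; assumed not to be orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    return v, eigenvalue / total_variance


# Invented two-element concentrations that vary together perfectly (second = 2 x first).
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
loadings, variance_share = first_principal_component(samples)
```

Elements with large loadings of the same sign on one component vary together across samples, which is the pattern such studies interpret as a shared natural or anthropogenic source.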
How well do test case prioritization techniques support statistical fault localization
Tse, TH; Jiang, B.; Zhang, Z; Chen, TY
2009-01-01
In continuous integration, a tight integration of test case prioritization techniques and fault-localization techniques may both expose failures faster and locate faults more effectively. Statistical fault-localization techniques use the execution information collected during testing to locate faults. Executing a small fraction of a prioritized test suite reduces the cost of testing, and yet the subsequent fault localization may suffer. This paper presents the first empirical study to examine...
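As one concrete example of how execution information can be turned into a fault ranking, the sketch below uses the Tarantula suspiciousness score, a well-known statistical fault-localization formula. Whether the study evaluates this particular formula is not stated in the truncated abstract, so the choice is an assumption, and the coverage data are invented.

```python
def tarantula(failed_cov, passed_cov, total_failed, total_passed):
    """Suspiciousness of a statement from the failing/passing tests that execute it."""
    fail_ratio = failed_cov / total_failed if total_failed else 0.0
    pass_ratio = passed_cov / total_passed if total_passed else 0.0
    if fail_ratio + pass_ratio == 0.0:
        return 0.0  # statement was never executed by any test
    return fail_ratio / (fail_ratio + pass_ratio)


# Hypothetical coverage: statement -> (failing tests covering it, passing tests covering it)
coverage = {"s1": (2, 0), "s2": (1, 3), "s3": (0, 4)}
ranking = sorted(
    coverage,
    key=lambda s: tarantula(*coverage[s], total_failed=2, total_passed=4),
    reverse=True,
)
```

Running only a prefix of a prioritized test suite shrinks the counts that feed such a score, which is the trade-off between testing cost and localization quality that the abstract raises.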
Schmolke, S. R.; Broeg, K.; Zander, S.; Bissinger, V.; Hansen, P. D.; Kress, N.; Herut, B.; Jantzen, E.; Krüner, G.; Sturm, A.; Körting, W.; von Westernhagen, H.
A comprehensive database, containing biological and chemical information, collected in the framework of the bilateral interdisciplinary MARS project ("biological indicators of natural and man-made changes in marine and coastal waters") during the years 1995-1997 in the coastal environment of the North Sea, was subjected to a multivariate statistical evaluation. The MARS project was designed to combine a variety of approaches and to develop a set of methods for the employment of biological indicators in pollution monitoring and environmental quality assessment. In total, nine ship cruises to four coastal sampling sites were conducted; 765 fish and 384 mussel samples were analysed for biological and chemical parameters. Additional information on the chemical background at the sampling sites was derived from sediment samples collected at each of the four sampling sites. Based on the available chemical data in sediments and blue mussel (Mytilus edulis), a pollution gradient between the selected sites was established. The chemical body burden of flounder (Platichthys flesus) from these sites, though, did not reflect this gradient equally clearly. In contrast, the biological information derived from measurements in fish samples displayed a significant regional as well as temporal pattern. A multivariate bioindicator data matrix was evaluated employing a factor analysis model to identify relations between selected biological indicators, and to improve the understanding of a regional and temporal component in the parameter response. In a second approach, applying the k-means algorithm to the data matrix, two significantly different clusters of samples, characterised by the current health status of the fish, were extracted. Using this classification, a temporal and, in the second order, a less pronounced spatial effect was evident. In particular, during July 1996, a clear sign of deteriorating environmental conditions was extracted from the biological data matrix.
Zhu, Guangxu; Guo, Qingjun; Xiao, Huayun; Chen, Tongbin; Yang, Jun
2017-06-01
Heavy metals are considered toxic to humans and ecosystems. In the present study, heavy metal concentration in soil was investigated using the single pollution index (PIi), the integrated Nemerow pollution index (PIN), and the geoaccumulation index (Igeo) to determine metal accumulation and its pollution status at the abandoned site of the Capital Iron and Steel Factory in Beijing and its surrounding area. Multivariate statistical (principal component analysis and correlation analysis), geostatistical analysis (ArcGIS tool), combined with stable Pb isotopic ratios, were applied to explore the characteristics of heavy metal pollution and the possible sources of pollutants. The results indicated that heavy metal elements show different degrees of accumulation in the study area, the observed trend of the enrichment factors, and the geoaccumulation index was Hg > Cd > Zn > Cr > Pb > Cu ≈ As > Ni. Hg, Cd, Zn, and Cr were the dominant elements that influenced soil quality in the study area. The Nemerow index method indicated that all of the heavy metals caused serious pollution except Ni. Multivariate statistical analysis indicated that Cd, Zn, Cu, and Pb show obvious correlation and have higher loads on the same principal component, suggesting that they had the same sources, which are related to industrial activities and vehicle emissions. The spatial distribution maps based on ordinary kriging showed that high concentrations of heavy metals were located in the local factory area and in the southeast-northwest part of the study region, corresponding with the predominant wind directions. Analyses of lead isotopes confirmed that Pb in the study soils is predominantly derived from three Pb sources: dust generated during steel production, coal combustion, and the natural background. Moreover, the ternary mixture model based on lead isotope analysis indicates that lead in the study soils originates mainly from anthropogenic sources, which contribute much more
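The three indices named in this abstract have simple closed forms in the general literature. The sketch below assumes those standard definitions (single index = concentration / reference value; Nemerow index from the mean and maximum single index; geoaccumulation index with the usual factor of 1.5 on the background), since the abstract does not spell them out. All concentrations and reference values are hypothetical.

```python
import math


def single_pollution_index(concentration, reference):
    """PI_i: measured concentration divided by its reference (standard) value."""
    return concentration / reference


def nemerow_index(single_indices):
    """Integrated Nemerow index from the mean and the maximum single index."""
    mean_pi = sum(single_indices) / len(single_indices)
    max_pi = max(single_indices)
    return math.sqrt((mean_pi ** 2 + max_pi ** 2) / 2.0)


def geoaccumulation_index(concentration, background):
    """Igeo = log2(C / (1.5 * B)), B being the geochemical background value."""
    return math.log2(concentration / (1.5 * background))


# Hypothetical soil concentrations and reference values (mg/kg), not from the study.
measured = {"Cd": 0.9, "Zn": 300.0, "Ni": 25.0}
reference = {"Cd": 0.3, "Zn": 100.0, "Ni": 50.0}
indices = [single_pollution_index(measured[m], reference[m]) for m in measured]
integrated = nemerow_index(indices)
```

Because the Nemerow index weights the maximum single index as heavily as the mean, one strongly enriched element can dominate the integrated result.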
Energy Technology Data Exchange (ETDEWEB)
Matiatos, Ioannis, E-mail: i.matiatos@iaea.org
2016-01-15
Nitrate (NO3) is one of the most common contaminants in aquatic environments and groundwater. Nitrate concentrations and environmental isotope data (δ(15)N-NO3 and δ(18)O-NO3) from groundwater of Asopos basin, which has different land-use types, i.e., a large number of industries (e.g., textile, metal processing, food, fertilizers, paint), urban and agricultural areas and livestock breeding facilities, were analyzed to identify the nitrate sources of water contamination and N-biogeochemical transformations. A Bayesian isotope mixing model (SIAR) and multivariate statistical analysis of hydrochemical data were used to estimate the proportional contribution of different NO3 sources and to identify the dominant factors controlling the nitrate content of the groundwater in the region. The comparison of SIAR and Principal Component Analysis showed that wastes originating from urban and industrial zones of the basin are mainly responsible for nitrate contamination of groundwater in these areas. Agricultural fertilizers and manure likely contribute to groundwater contamination away from urban fabric and industrial land-use areas. Soil contribution to nitrate contamination due to organic matter is higher in the south-western part of the area far from the industries and the urban settlements. The present study aims to highlight the use of environmental isotopes combined with multivariate statistical analysis in locating sources of nitrate contamination in groundwater leading to a more effective planning of environmental measures and remediation strategies in river basins and water bodies as defined by the European Water Frame Directive (Directive 2000/60/EC). - Highlights: • More enriched N-isotope values were observed in the industrial/urban areas. • A Bayesian isotope mixing model was applied in a multiple land-use area. • A 3-component model explained the factors controlling nitrate content in groundwater. • Industrial
Bliefernicht, Jan; Laux, Patrick; Siegmund, Jonatan; Kunstmann, Harald
2013-04-01
The development and application of statistical techniques for recalibrating meteorological or hydrological forecasts, so as to eliminate the bias between forecasts and observations, has received a great deal of attention in recent years. One reason is that retrospective forecasts are now available, which allow proper training and validation of such techniques. The objective of this presentation is to propose several statistical techniques with different degrees of complexity and to evaluate and compare their performance for a recalibration of seasonal ensemble forecasts of monthly precipitation. The techniques selected in this study range from straightforward normal score and quantile-quantile transformation and local scaling to more sophisticated and novel statistical techniques such as the Copula-based methodology recently proposed by Laux et al. (2011). The seasonal forecasts are derived from the Climate Forecast System Version 2. This version is the current coupled ocean-atmosphere general circulation model of the U.S. National Centers for Environmental Prediction used to provide forecasts up to nine months. The CFS precipitation forecasts are compared to monthly precipitation observations from the Global Precipitation Climatology Centre. The statistical techniques are tested for semi-arid regions in West Africa and the Indian subcontinent, focusing on large-scale river basins such as the Ganges and the Volta basin. In both regions seasonal precipitation forecasts are a crucial source of information for the prediction of hydro-meteorological extremes, in particular for droughts. The evaluation is done using retrospective CFS ensemble forecasts from 1982 to 2009. The training of the statistical techniques is done in a cross-validation mode. The outcome of this investigation illustrates large systematic differences between forecasts and observations, in particular for the Volta basin in West Africa. The selection of straightforward
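The quantile-quantile transformation mentioned in this abstract can be sketched as empirical quantile mapping: a forecast value is located within the forecast climatology, and the observed value at the same non-exceedance probability is returned. The version below is a minimal illustration with linear interpolation and invented monthly totals; the exact variant used in the study is not described in the abstract.

```python
from bisect import bisect_left


def empirical_cdf(sorted_sample, x):
    """Interpolated non-exceedance probability of x in a sorted sample, in [0, 1]."""
    n = len(sorted_sample)
    if x <= sorted_sample[0]:
        return 0.0
    if x >= sorted_sample[-1]:
        return 1.0
    i = bisect_left(sorted_sample, x)  # sorted_sample[i - 1] < x <= sorted_sample[i]
    lo, hi = sorted_sample[i - 1], sorted_sample[i]
    return (i - 1 + (x - lo) / (hi - lo)) / (n - 1)


def quantile(sorted_sample, p):
    """Interpolated quantile of a sorted sample at probability p in [0, 1]."""
    n = len(sorted_sample)
    pos = p * (n - 1)
    i = int(pos)
    if i >= n - 1:
        return sorted_sample[-1]
    return sorted_sample[i] + (pos - i) * (sorted_sample[i + 1] - sorted_sample[i])


def quantile_map(x, forecast_climatology, observed_climatology):
    """Map a raw forecast x onto the observed distribution."""
    forecasts = sorted(forecast_climatology)
    observations = sorted(observed_climatology)
    return quantile(observations, empirical_cdf(forecasts, x))


# Invented monthly precipitation totals (mm): the model is systematically too dry.
hindcasts = [0.0, 10.0, 20.0, 30.0, 40.0]
observed = [0.0, 20.0, 40.0, 60.0, 80.0]
corrected = quantile_map(20.0, hindcasts, observed)
```

In a cross-validation setting, as described in the abstract, the year being corrected would be left out of both climatologies before the mapping is built.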
Hayslett, H T
1991-01-01
Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the
Friedel, Michael J.
2016-08-01
Numerical models provide a way to evaluate groundwater systems, but determining the hydrostratigraphic units (HSUs) used in constructing these models remains subjective, nonunique, and uncertain. A three-step machine-learning approach is proposed in which fusion, estimation, and clustering operations are performed on different data sets to arrive at HSUs at different scales. In step one, data fusion is performed by training a self-organizing map (SOM) with sparse borehole hydrogeologic (lithology, hydraulic conductivity, aqueous field parameters, dissolved constituents) and geophysical (gamma, spontaneous potential, and resistivity) measurements. Estimation is handled by iterative least-squares minimization of the SOM quantization and topographical errors. Application of the Davies-Bouldin criteria to k-means clustering of SOM nodes is used to determine the number and location of discontinuous borehole HSUs with low lateral density (based on borehole spacing at 100 s m) and high vertical density (based on cm-scale logging). In step two, a scaling network is trained using the estimated borehole HSUs, airborne electromagnetic measurements, and numerically inverted resistivity profiles. In step three, independent airborne electromagnetic measurements are applied to the scaling network, and the estimation performed to arrive at a set of continuous HSUs with high lateral density (based on sounding locations at meter (m) spacing) and medium vertical density (based on m-layer modeled structure). Performance metrics are used to evaluate each step of the approach. Efficacy of the proposed approach is demonstrated to map local-to-regional scale HSUs using hydrogeophysical data collected at a heterogeneous surficial aquifer in northwestern Nebraska, USA.
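The Davies-Bouldin criterion used here to choose the number of clusters rewards partitions whose clusters are compact and far apart; lower values are better. The sketch below computes the index for an already clustered set of points in pure Python. It assumes Euclidean distance and does not include the k-means or SOM steps; the coordinate tuples are invented stand-ins for SOM node vectors.

```python
import math


def centroid(points):
    """Component-wise mean of a list of coordinate tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of coordinate tuples)."""
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance of each cluster's points to its own centroid.
    scatter = [sum(math.dist(p, ctr) for p in c) / len(c)
               for c, ctr in zip(clusters, centers)]
    k = len(clusters)
    worst_ratios = []
    for i in range(k):
        ratios = [(scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                  for j in range(k) if j != i]
        worst_ratios.append(max(ratios))
    return sum(worst_ratios) / k


# Two candidate partitions of the same four invented points.
compact = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
stretched = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
best = min((compact, stretched), key=davies_bouldin)
```

To select the number of clusters, one would run the clustering for a range of k values and keep the k with the smallest index. Note that the function fails with a division by zero if two centroids coincide.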
A survey of image processing techniques and statistics for ballistic specimens in forensic science.
Gerules, George; Bhatia, Sanjiv K; Jackson, Daniel E
2013-06-01
This paper provides a review of recent investigations on the image processing techniques used to match spent bullets and cartridge cases. It is also, to a lesser extent, a review of the statistical methods that are used to judge the uniqueness of fired bullets and spent cartridge cases. We review 2D and 3D imaging techniques as well as many of the algorithms used to match these images. We also provide a discussion of the strengths and weaknesses of these methods for both image matching and statistical uniqueness. The goal of this paper is to be a reference for investigators and scientists working in this field.
Basics, common errors and essentials of statistical tools and techniques in anesthesiology research
Bajwa, Sukhminder Jit Singh
2015-01-01
The statistical portion is a vital component of any research study. The research methodology and the application of statistical tools and techniques have evolved over the years and have significantly helped research activities throughout the globe. Accurate results and inferences are not possible without proper validation with various statistical tools and tests. Evidence-based anesthesia research and practice has to incorporate statistical tools in the methodology right from the planning stage of the study itself. Though the medical fraternity is well acquainted with the significance of statistics in research, there is a lack of in-depth knowledge about the various statistical concepts and principles among the majority of researchers. The clinical impact and consequences can be serious, as incorrect analysis, conclusions, and false results may construct an artificial platform on which future research activities are replicated. The present tutorial is an attempt to make anesthesiologists aware of the various aspects of statistical methods used in evidence-based research and also to highlight the common areas where the largest number of statistical errors are committed, so as to adopt better statistical practices. PMID:26702217
Cho, Hyun-Deok; Kim, Unyong; Suh, Joon Hyuk; Eom, Han Young; Kim, Junghyun; Lee, Seul Gi; Choi, Yong Seok; Han, Sang Beom
2016-04-01
Analytical methods using high-performance liquid chromatography with diode array and tandem mass spectrometry detection were developed for the discrimination of the rhizomes of four Atractylodes medicinal plants: A. japonica, A. macrocephala, A. chinensis, and A. lancea. A quantitative study was performed, selecting five bioactive components, including atractylenolide I, II, III, eudesma-4(14),7(11)-dien-8-one and atractylodin, on twenty-six Atractylodes samples of various origins. Sample extraction was optimized to sonication with 80% methanol for 40 min at room temperature. High-performance liquid chromatography with diode array detection was established using a C18 column with a water/acetonitrile gradient system at a flow rate of 1.0 mL/min, and the detection wavelength was set at 236 nm. Liquid chromatography with tandem mass spectrometry was applied to certify the reliability of the quantitative results. The developed methods were validated by ensuring specificity, linearity, limit of quantification, accuracy, precision, recovery, robustness, and stability. Results showed that cangzhu contained higher amounts of atractylenolide I and atractylodin than baizhu, and especially atractylodin contents showed the greatest variation between baizhu and cangzhu. Multivariate statistical analysis, such as principal component analysis and hierarchical cluster analysis, were also employed for further classification of the Atractylodes plants. The established method was suitable for quality control of the Atractylodes plants.
Directory of Open Access Journals (Sweden)
Vetrimurugan Elumalai
2017-04-01
Heavy metals in surface and groundwater were analysed and their sources were identified using multivariate statistical tools for two towns in South Africa. Human exposure risk through the drinking water pathway was also assessed. Electrical conductivity values showed that groundwater is desirable to permissible for drinking except for six locations. Concentrations of aluminium, lead and nickel were above the permissible limit for drinking at all locations. Boron, cadmium, iron and manganese exceeded the limit at a few locations. Heavy metal pollution index based on ten heavy metals indicated that 85% of the area had good quality water, but 15% was unsuitable. Human exposure dose through the drinking water pathway indicated no risk due to boron, nickel and zinc, moderate risk due to cadmium and lithium and high risk due to silver, copper, manganese and lead. Hazard quotients were high in all sampling locations for humans of all age groups, indicating that groundwater is unsuitable for drinking purposes. Highly polluted areas were located near the coast, close to industrial operations and at a landfill site representing human-induced pollution. Factor analysis identified the four major pollution sources as: (1) industries; (2) mining and related activities; (3) mixed sources (geogenic and anthropogenic); and (4) fertilizer application.
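The exposure dose and hazard quotient referred to in this abstract are commonly computed with the generic ingestion-pathway equations sketched below. The abstract does not give the equations or parameter values it used, so the formula is the widely used textbook form and every number here (including the reference dose) is a hypothetical placeholder.

```python
def chronic_daily_intake(conc_mg_per_l, intake_l_per_day, exposure_days_per_year,
                         exposure_years, body_weight_kg, averaging_time_days):
    """Ingestion dose in mg per kg body weight per day."""
    return (conc_mg_per_l * intake_l_per_day * exposure_days_per_year * exposure_years) / (
        body_weight_kg * averaging_time_days
    )


def hazard_quotient(daily_intake, reference_dose):
    """Ratio of the estimated dose to the reference dose; above 1 flags potential risk."""
    return daily_intake / reference_dose


# Hypothetical adult scenario; none of these values come from the study.
dose = chronic_daily_intake(
    conc_mg_per_l=0.01,
    intake_l_per_day=2.0,
    exposure_days_per_year=365,
    exposure_years=30,
    body_weight_kg=70.0,
    averaging_time_days=365 * 30,
)
hq = hazard_quotient(dose, reference_dose=0.0005)  # placeholder reference dose
```

Age-group differences enter through the intake rate and body weight, which is why the same water can yield different hazard quotients for children and adults.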
Institute of Scientific and Technical Information of China (English)
Rong Ma; Jiansheng Shi; Jichao Liu; Chunlei Gui
2014-01-01
Understanding the factors that control groundwater quality can promote the sustainable development of groundwater resources. To this end, multivariate statistical analysis (MA) and hydrochemical analysis were applied in this work. A canonical discriminant function with 7 parameters was established using the discriminant analysis (DA) method; it gave 100% correct assignment to the 3 clusters (good water (GW), poor water (PW), and very poor water (VPW)) obtained from cluster analysis (CA). According to factor analysis (FA), 8 factors were extracted from 25 hydrochemical elements and account for 80.897% of the total data variance, suggesting that groundwater with higher concentrations of sodium, calcium, magnesium, chloride, and sulfate in the southeastern study area is mainly affected by natural processes; the higher levels of arsenic and chromium in groundwater extracted from the northwestern part of the study area derive from industrial activities; and domestic and agricultural sewage contribute substantially to copper, iron, iodine, and phosphate in the northern study area. This work can therefore help identify the main factors controlling groundwater quality in the North China Plain and support better-informed decisions about how to achieve sustainable development of groundwater resources.
Wu, Wei; Sun, Le; Zhang, Zhe; Guo, Yingying; Liu, Shuying
2015-03-25
An ultra-high-performance liquid chromatography coupled with quadrupole-time-of-flight mass spectrometry (UHPLC-Q-TOF-MS) method was developed for the detection and structural analysis of ginsenosides in white ginseng and related processed products (red ginseng). Original neutral, malonyl, and chemically transformed ginsenosides were identified in white and red ginseng samples. The aglycone types of ginsenosides were determined by MS/MS as PPD (m/z 459), PPT (m/z 475), C-24, -25 hydrated-PPD or PPT (m/z 477 or m/z 493), and Δ20(21)-or Δ20(22)-dehydrated-PPD or PPT (m/z 441 or m/z 457). Following the structural determination, the UHPLC-Q-TOF-MS-based chemical profiling coupled with multivariate statistical analysis method was applied for global analysis of white and processed ginseng samples. The chemical markers present between the processed products red ginseng and white ginseng could be assigned. Process-mediated chemical changes were recognized as the hydrolysis of ginsenosides with large molecular weight, chemical transformations of ginsenosides, changes in malonyl-ginsenosides, and generation of 20-(R)-ginsenoside enantiomers. The relative contents of compounds classified as PPD, PPT, malonyl, and transformed ginsenosides were calculated based on peak areas in ginseng before and after processing. This study provides possibility to monitor multiple components for the quality control and global evaluation of ginseng products during processing. Copyright © 2014 Elsevier B.V. All rights reserved.
Gogna, Navdeep; Hamid, Neda; Dorai, Kavita
2015-11-10
Extracts from the Carica papaya L. plant are widely reported to contain metabolites with antibacterial, antioxidant and anticancer activity. This study aims to analyze the metabolic profiles of papaya leaves and seeds in order to gain insights into their phytomedicinal constituents. We performed metabolite fingerprinting using 1D and 2D 1H NMR experiments and used multivariate statistical analysis to identify the plant parts that contain the highest concentrations of metabolites of phytomedicinal value. Secondary metabolites such as phenyl propanoids, including flavonoids, were found in greater concentrations in the leaves than in the seeds. UPLC-ESI-MS verified the presence of the significant metabolites in the papaya extracts suggested by the NMR analysis. Interestingly, the concentrations of eleven secondary metabolites, namely caffeic, cinnamic, chlorogenic, quinic, coumaric, vanillic, and protocatechuic acids, naringenin, hesperidin, rutin, and kaempferol, were higher in young than in old papaya leaves. The results of the NMR analysis were corroborated by estimating the total phenolic and flavonoid content of the extracts. Estimation of antioxidant activity in leaf and seed extracts by DPPH and ABTS in-vitro assays, and of antioxidant capacity in the C2C12 cell line, also showed that papaya extracts exhibit high antioxidant activity.
Steingass, Christof Björn; Jutzi, Manfred; Müller, Jenny; Carle, Reinhold; Schmarr, Hans-Georg
2015-03-01
Ripening-dependent changes of pineapple volatiles were studied in a nontargeted profiling analysis. Volatiles were isolated via headspace solid phase microextraction and analyzed by comprehensive 2D gas chromatography and mass spectrometry (HS-SPME-GC×GC-qMS). Profile patterns presented in the contour plots were evaluated applying image processing techniques and subsequent multivariate statistical data analysis. Statistical methods comprised unsupervised hierarchical cluster analysis (HCA) and principal component analysis (PCA) to classify the samples. Supervised partial least squares discriminant analysis (PLS-DA) and partial least squares (PLS) regression were applied to discriminate different ripening stages and describe the development of volatiles during postharvest storage, respectively. Hereby, substantial chemical markers allowing for class separation were revealed. The workflow permitted the rapid distinction between premature green-ripe pineapples and postharvest-ripened sea-freighted fruits. Volatile profiles of fully ripe air-freighted pineapples were similar to those of green-ripe fruits postharvest ripened for 6 days after simulated sea freight export, after PCA with only two principal components. However, PCA considering also the third principal component allowed differentiation between air-freighted fruits and the four progressing postharvest maturity stages of sea-freighted pineapples.
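The observation above, that two principal components were not enough to separate some groups while a third one was, depends on how much variance each component captures. The following is a minimal sketch of PCA in plain Python, using power iteration with deflation on the sample covariance matrix. It is not the authors' workflow; the function names and the small data set are hypothetical and serve only to illustrate the technique.

```python
import math

def center(data):
    """Subtract the column means from every row."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and eigenvector of a symmetric PSD matrix.
    Assumes the start vector is not orthogonal to the leading eigenvector."""
    d = len(matrix)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue estimate for the unit vector.
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(d))
                for i in range(d))
    return value, vec

def pca(data, n_components):
    """Return (scores, explained_variance_ratios) for the first n_components."""
    centered = center(data)
    cov = covariance(centered)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    components, ratios = [], []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        components.append(vec)
        ratios.append(value / total_variance)
        # Deflation: remove the component just found before looking for the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components]
              for r in centered]
    return scores, ratios

# Hypothetical measurements: 5 samples, 3 variables.
samples = [[1, 2, 0.0], [2, 4, 0.1], [3, 6, -0.1], [4, 8, 0.05], [5, 10, -0.05]]
scores, ratios = pca(samples, n_components=2)
```

Inspecting `ratios` shows how much variance each retained component explains; samples that overlap in the first two score columns may still separate along a later component with a small ratio.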
Xiao, Li
Despite great enthusiasm for and sustained effort on the development of renewable energy from biomass, the commercialization and scale-up of biofuel production remain under pressure and face challenges. New ideas and facilities are being tested around the world with the aim of reducing cost and improving product value. Cutting-edge technologies involving analytical chemistry, statistical analysis, industrial engineering, computer simulation, and mathematical modeling continue to bring modern elements into this classic research area. One of the challenges of commercializing biofuel production is the complex chemical composition of the biomass feedstock and the products. Because of this, feedstock selection and process optimization cannot be conducted efficiently. This dissertation attempts to further evaluate the biomass thermal decomposition process using both traditional methods and an advanced technique (Pyrolysis Molecular Beam Mass Spectrometry). The focus is on generating a database of thermal decomposition products from biomass at different temperatures, establishing the relationship between traditional methods and advanced techniques, evaluating process efficiency and optimizing reaction conditions, comparing typically used biomass feedstocks, and searching for innovative species for economically viable feedstock preparation concepts. Lab-scale quartz tube reactors and 80 µL stainless steel sample cups coupled with an auto-sampling system were used to simulate the complicated reactions that occur in real fluidized or entrained flow reactors. The two main high-throughput analytical techniques used are Near Infrared Spectroscopy (NIR) and Pyrolysis Molecular Beam Mass Spectrometry (Py-MBMS). Mass balance, carbon balance, and product distribution are presented in detail. Thermal decomposition temperatures range from 200°C to 950°C. Feedstocks used in the study include typical hardwood and softwood (red oak, white oak, yellow poplar, loblolly pine
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Goodpaster, John V; Sturdevant, Amanda B; Andrews, Kristen L; Brun-Conti, Leanora
2007-05-01
Comparisons of polyvinyl chloride electrical tape typically rely upon evaluating class characteristics such as physical dimensions, surface texture, and chemical composition. Given the various techniques that are available for this purpose, a comprehensive study has been undertaken to establish an optimal analytical scheme for electrical tape comparisons. Of equal importance is the development of a quantitative means for sample discrimination. In this study, 67 rolls of black electrical tape representing 34 different nominal brands were analyzed via scanning electron microscopy and energy dispersive spectroscopy. Differences in surface roughness, calendering marks, and filler particle size were readily apparent, including between some rolls of the same nominal brand. The relative amounts of magnesium, aluminum, silicon, sulfur, lead, chlorine, antimony, calcium, titanium, and zinc varied greatly between brands and, in some cases, could be linked to the year of manufacture. For the first time, quantitative differentiation of electrical tapes was achieved through multivariate statistical techniques, with 36 classes identified within the sample population. A single-blind study was also completed where questioned tape samples were correctly associated with known exemplars. Finally, two case studies are presented where tape recovered from an improvised explosive device is compared with tape recovered from a suspect.
A STATISTICAL CORRELATION TECHNIQUE AND A NEURAL-NETWORK FOR THE MOTION CORRESPONDENCE PROBLEM
VANDEEMTER, JH; MASTEBROEK, HAK
1994-01-01
A statistical correlation technique (SCT) and two variants of a neural network are presented to solve the motion correspondence problem. Solutions of the motion correspondence problem aim to maintain the identities of individuated elements as they move. In a preprocessing stage, two snapshots of a m
Computer program uses Monte Carlo techniques for statistical system performance analysis
Wohl, D. P.
1967-01-01
Computer program with Monte Carlo sampling techniques determines the effect of a component part of a unit upon the overall system performance. It utilizes the full statistics of the disturbances and misalignments of each component to provide unbiased results through simulated random sampling.
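As an illustration of the idea in this record, the sketch below estimates how much one component contributes to the spread of overall system error by simulated random sampling. The original 1967 program is not available here, so everything in the block is an assumption: the system error is modeled as a plain sum of independent Gaussian component errors, and the function names are hypothetical.

```python
import random
import statistics

def simulate_system_error(component_sigmas, n_trials=20000, seed=0):
    """Sample the total system error as the sum of independent Gaussian
    component errors (an assumed, deliberately simple system model)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, sigma) for sigma in component_sigmas)
            for _ in range(n_trials)]

def component_effect(component_sigmas, index, n_trials=20000, seed=0):
    """Compare the spread of system error with every component active against
    the spread when component `index` is treated as perfect (sigma = 0)."""
    full = statistics.pstdev(
        simulate_system_error(component_sigmas, n_trials, seed))
    without = list(component_sigmas)
    without[index] = 0.0
    reduced = statistics.pstdev(
        simulate_system_error(without, n_trials, seed))
    return full, reduced

# Hypothetical component error standard deviations, in arbitrary units.
full_spread, spread_without_second = component_effect([3.0, 4.0], index=1)
```

For independent errors the two spreads should come out near the root-sum-square values, which gives a quick sanity check on the sampling.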
Statistical Techniques Used in Published Articles: A Historical Review of Reviews
Skidmore, Susan Troncoso; Thompson, Bruce
2010-01-01
The purpose of the present study is to provide a historical account and metasynthesis of which statistical techniques are most frequently used in the fields of education and psychology. Six articles reviewing the "American Educational Research Journal" from 1969 to 1997 and five articles reviewing the psychological literature from 1948 to 2001…
Nuclear Technology. Course 26: Metrology. Module 27-7, Statistical Techniques in Metrology.
Espy, John; Selleck, Ben
This seventh in a series of eight modules for a course titled Metrology focuses on descriptive and inferential statistical techniques in metrology. The module follows a typical format that includes the following sections: (1) introduction, (2) module prerequisites, (3) objectives, (4) notes to instructor/student, (5) subject matter, (6) materials…
Novick, Melvin R.
This project is concerned with the development and implementation of some new statistical techniques that will facilitate a continuing input of information about the student to the instructional manager so that individualization of instruction can be managed effectively. The source of this informational input is typically a short…
Li, Yan; Zhang, Ji; Jin, Hang; Liu, Honggao; Wang, Yuanzhong
2016-08-05
A quality assessment system comprised of a tandem technique of ultraviolet (UV) spectroscopy and ultra-fast liquid chromatography (UFLC) aided by multivariate analysis was presented for the determination of geographic origin of Wolfiporia extensa collected from five regions in Yunnan Province of China. Characteristic UV spectroscopic fingerprints of samples were determined based on its methanol extract. UFLC was applied for the determination of pachymic acid (a biomarker) presented in individual test samples. The spectrum data matrix and the content of pachymic acid were integrated and analyzed by partial least squares discriminant analysis (PLS-DA) and hierarchical cluster analysis (HCA). The results showed that chemical properties of samples were clearly dominated by the epidermis and inner part as well as geographical origins. The relationships among samples obtained from these five regions have been also presented. Moreover, an interesting finding implied that geographical origins had much greater influence on the chemical properties of epidermis compared with that of the inner part. This study demonstrated that a rapid tool for accurate discrimination of W. extensa by UV spectroscopy and UFLC could be available for quality control of complicated medicinal mushrooms. Copyright © 2016 Elsevier B.V. All rights reserved.
Analysis of Maize Agronomic Traits by Multivariate Statistical Methods
Institute of Scientific and Technical Information of China (English)
谭贤杰; 覃兰秋; 廖金秀; 周锦国; 江禹奉; 谢和霞; 程伟东; 吴子恺
2011-01-01
In crop genetics and breeding research, yield and yield-related traits are mostly quantitative traits. Such traits are controlled by multiple genes and are easily affected by the environment, which makes their inheritance extremely complex, and the traits are often related to one another in complex ways. These complex relationships make selection with yield as the target very difficult in breeding. Multivariate statistical analysis is a comprehensive method for studying the statistical regularities of interdependence among multiple variables. Used appropriately, it can deepen understanding of the genetic rules governing relationships among traits and of the relative importance and dependence of each related trait's effect on yield, providing a theoretical basis for breeding and improving new varieties. Twenty agronomic traits of 35 maize varieties (combinations) were studied using the GGE biplot, factor analysis and cluster analysis. The results showed that: (1) average daily yield, thousand-kernel weight and ear length were significantly positively correlated with yield; (2) the 20 agronomic traits could be summarized by six common factors; (3) using the six common factors as composite indices, the 35 varieties (combinations) clustered into 17 groups; and (4) varieties (combinations) G8, G14, G12, G10 and G17 had superior overall traits.
Institute of Scientific and Technical Information of China (English)
(no author listed)
2007-01-01
Understanding how phytoplankton patterns change can be particularly useful for water quality improvement and management decisions. However, it is generally not easy to illustrate the interactions between phytoplankton biomass and related environmental variables given their high spatial and temporal heterogeneity. To elucidate the relationships between them in a eutrophic shallow lake, Taihu Lake, a relatively long-term data set of biotic and abiotic water quality parameters was examined using multivariate statistical analysis on a seasonal basis. The results indicate that water temperature and total phosphorus (TP) played governing roles in phytoplankton dynamics in most seasons (temperature in winter, spring and summer; TP in spring, summer and autumn); COD (chemical oxygen demand) and BOD (biological oxygen demand) showed significant positive relationships with phytoplankton biomass in spring, summer and autumn. However, a complex interplay was found between phytoplankton biomass and nitrogen, with significant positive relationships in spring and autumn and negative ones in summer. As the predatory factor, zooplankton exerted significant grazing pressure on phytoplankton biomass during summer, as shown by the negative relationship between them in that season. Significant feedback effects of phytoplankton development were identified in summer and autumn, when significant relationships were observed between phytoplankton biomass and pH, Trans (transparency of water) and DO. The results indicate that interactions between phytoplankton biomass and related environmental variables are highly sensitive to seasonal periodicity, which improves understanding of the different roles of biotic and abiotic variables in phytoplankton variability and hence advances management methods for eutrophic lakes.
Lodola, Alessio; Sirirak, Jitnapa; Fey, Natalie; Rivara, Silvia; Mor, Marco; Mulholland, Adrian J
2010-09-14
The effects of structural fluctuations, due to protein dynamics, on enzyme activity are at the heart of current debates on enzyme catalysis. There is evidence that fatty acid amide hydrolase (FAAH) is an enzyme for which reaction proceeds via a high-energy, reactive conformation, distinct from the predominant enzyme-substrate complex (Lodola et al. Biophys. J. 2007, 92, L20-22). Identifying the structural causes of differences in reactivity between conformations in such complex systems is not trivial. Here, we show that multivariate analysis of key structural parameters can identify structural determinants of barrier height by analysis of multiple reaction paths. We apply a well-tested quantum mechanics/molecular mechanics (QM/MM) method to the first step of the acylation reaction between FAAH and oleamide substrate for 36 different starting structures. Geometrical parameters (consisting of the key bond distances that change during the reaction) were collected and used for principal component analysis (PCA), partial least-squares (PLS) regression analysis, and multiple linear regression (MLR) analysis. PCA indicates that different "families" of enzyme-substrate conformations arise from QM/MM molecular dynamics simulation and that rarely sampled, catalytically significant conformational states can be identified. PLS and MLR analyses allowed the construction of linear regression models, correlating the calculated activation barriers with simple geometrical descriptors. These analyses reveal the presence of two fully independent geometrical effects, explaining 78% of the variation in the activation barrier, which are directly correlated with transition-state stabilization (playing a major role in catalysis) and substrate binding. These results highlight the power of statistical approaches of this type in identifying crucial structural features that contribute to enzyme reactivity.
Wang, Jie; Liu, Guijian; Liu, Houqi; Lam, Paul K S
2017-04-01
A total of 211 water samples were collected from 53 key sampling points from 5-10th July 2013 at four different depths (0m, 2m, 4m, 8m) and at different sites in the Huaihe River, Anhui, China. These points were monitored for 18 parameters (water temperature, pH, TN, TP, TOC, Cu, Pb, Zn, Ni, Co, Cr, Cd, Mn, B, Fe, Al, Mg, and Ba). The spatial variability, contamination sources and health risk of trace elements as well as the river water quality were investigated. Our results were compared with national (CSEPA) and international (WHO, USEPA) drinking water guidelines, revealing that Zn, Cd and Pb were the dominant pollutants in the water body. Application of different multivariate statistical approaches, including correlation matrix and factor/principal component analysis (FA/PCA), to assess the origins of the elements in the Huaihe River identified three source types that accounted for 79.31% of the total variance. Anthropogenic activities were considered to contribute much of the Zn, Cd, Pb, Ni, Co, and Mn via industrial waste, coal combustion, and vehicle exhaust; Ba, B, Cr and Cu were controlled by mixed anthropogenic and natural sources; and Mg, Fe and Al had natural origins from weathered rocks and crustal materials. Cluster analysis (CA) was used to classify the 53 sample points into three groups of water pollution (high, moderate, and low), reflecting influences from tributaries, power plants and vehicle exhaust, and agricultural activities, respectively. The results of the water quality index (WQI) indicate that water in the Huaihe River is heavily polluted by trace elements, so approximately 96% of the water in the Huaihe River is unsuitable for drinking. A health risk assessment using the hazard quotient and index (HQ/HI) recommended by the USEPA suggests that Co, Cd and Pb in the river could cause non-carcinogenic harm to human health.
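The hazard quotient and hazard index mentioned above are commonly computed from an average daily dose through drinking water divided by a reference dose. The sketch below follows that general form; it is not the study's calculation. The function names are hypothetical, and the concentrations and reference doses are placeholder values chosen only for illustration, not measured data or regulatory figures.

```python
def average_daily_dose(conc_mg_per_l, intake_l_per_day, exposure_days_per_year,
                       exposure_years, body_weight_kg):
    """Average daily dose (mg/kg/day) via drinking water.
    The averaging time is taken as the exposure duration in days, the usual
    convention for non-carcinogenic effects."""
    averaging_days = exposure_years * 365
    intake = (conc_mg_per_l * intake_l_per_day
              * exposure_days_per_year * exposure_years)
    return intake / (body_weight_kg * averaging_days)

def hazard_quotient(dose_mg_per_kg_day, reference_dose_mg_per_kg_day):
    """HQ above 1 is read as a potential non-carcinogenic concern."""
    return dose_mg_per_kg_day / reference_dose_mg_per_kg_day

def hazard_index(hazard_quotients):
    """HI is the sum of the HQs of the individual elements."""
    return sum(hazard_quotients)

# Placeholder inputs: element -> (concentration in mg/L, reference dose in mg/kg/day).
placeholder_inputs = {"element_A": (0.010, 0.0005), "element_B": (0.002, 0.0020)}
quotients = {
    name: hazard_quotient(
        average_daily_dose(conc, intake_l_per_day=2.0,
                           exposure_days_per_year=365,
                           exposure_years=30, body_weight_kg=70.0),
        rfd)
    for name, (conc, rfd) in placeholder_inputs.items()
}
total_index = hazard_index(quotients.values())
```

Keeping the dose, quotient and index as separate functions makes it easy to rerun the assessment for different age groups by changing only the intake rate and body weight.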
Geert Heidema, A.; Thissen, U.; Boer, J.M.A.; Bouwman, F.G.; Feskens, E.J.M.; Mariman, E.C.M.
2009-01-01
In this study, we applied the multivariate statistical tool Partial Least Squares (PLS) to analyze the relative importance of 83 plasma proteins in relation to coronary heart disease (CHD) mortality and the intermediate end points body mass index, HDL-cholesterol and total cholesterol. From a Dutch
STATISTICAL INFERENCES FOR VARYING-COEFFICIENT MODELS BASED ON LOCALLY WEIGHTED REGRESSION TECHNIQUE
Institute of Scientific and Technical Information of China (English)
梅长林; 张文修; 梁怡
2001-01-01
Some fundamental issues in statistical inference for varying-coefficient regression models are addressed and studied. An exact testing procedure is proposed for checking the goodness of fit of a varying-coefficient model fitted by the locally weighted regression technique against an ordinary linear regression model. In addition, an appropriate statistic is constructed for testing the variation of model parameters over the locations where the observations are collected, and a formal testing approach, which is essential to exploring spatial non-stationarity in geographical science, is suggested.
Energy Technology Data Exchange (ETDEWEB)
Ren, Qingguo, E-mail: renqg83@163.com [Department of Radiology, Hua Dong Hospital of Fudan University, Shanghai 200040 (China); Dewan, Sheilesh Kumar, E-mail: sheilesh_d1@hotmail.com [Department of Geriatrics, Hua Dong Hospital of Fudan University, Shanghai 200040 (China); Li, Ming, E-mail: minli77@163.com [Department of Radiology, Hua Dong Hospital of Fudan University, Shanghai 200040 (China); Li, Jianying, E-mail: Jianying.Li@med.ge.com [CT Imaging Research Center, GE Healthcare China, Beijing (China); Mao, Dingbiao, E-mail: maodingbiao74@163.com [Department of Radiology, Hua Dong Hospital of Fudan University, Shanghai 200040 (China); Wang, Zhenglei, E-mail: Williswang_doc@yahoo.com.cn [Department of Radiology, Shanghai Electricity Hospital, Shanghai 200050 (China); Hua, Yanqing, E-mail: cjr.huayanqing@vip.163.com [Department of Radiology, Hua Dong Hospital of Fudan University, Shanghai 200040 (China)
2012-10-15
Purpose: To compare image quality and visualization of normal structures and lesions in brain computed tomography (CT) with adaptive statistical iterative reconstruction (ASIR) and filtered back projection (FBP) reconstruction techniques at different X-ray tube current–time products. Materials and methods: In this IRB-approved prospective study, forty patients (nineteen men, twenty-one women; mean age 69.5 ± 11.2 years) received brain scans at different tube current–time products (300 and 200 mAs) in 64-section multi-detector CT (GE, Discovery CT750 HD). Images were reconstructed with FBP and four levels of ASIR-FBP blending. Two radiologists assessed all the reconstructed images for visibility of normal structures, lesion conspicuity, image contrast and diagnostic confidence in a blinded and randomized manner. (The hospital is known for its geriatric medicine department, and both radiologists are more experienced in chronic cerebral vascular disease than in neoplastic disease, so cerebral tumors were not included in the study and are addressed only in the discussion.) Volume CT dose index (CTDIvol) and dose-length product (DLP) were recorded. All the data were analyzed using SPSS 13.0 statistical analysis software. Results: There was no statistically significant difference between the image quality at 200 mAs with the 50% ASIR blending technique and at 300 mAs with the FBP technique (p > .05), whereas a statistically significant difference (p < .05) was found between the image quality at 200 mAs with FBP and at 300 mAs with FBP. Conclusion: ASIR provided the same image quality and diagnostic ability in brain imaging with a dose reduction of greater than 30% compared with the FBP reconstruction technique.
Flotation control -- A multivariable stabilizer
Energy Technology Data Exchange (ETDEWEB)
Schubert, J.H.; Henning, R.G.D.; Hulbert, D.G.; Craig, I.K. [Mintek, Randburg (South Africa)
1995-12-31
This paper presents a stabilizing controller for flotation plants which uses a quasi-multivariable technique. The controller monitors all the levels in the plant, and by anticipating interactions between various parts of the plant, is able to stabilize the plant far more successfully than the normal plant control. Once stabilizing control has been achieved, optimization of the process becomes easier and more sustainable. An estimate of the improvement in metallurgical performance is made and a singular value analysis was conducted to verify that the multivariable algorithm will theoretically control better than a collection of individual PID loops. Metallurgical results are presented to show that the improvements are attainable in practice. Control by the Mintek algorithm was alternated with normal plant control, to show that the improvements are statistically significant.
Xiao, Li; Wei, Hui; Himmel, Michael E; Jameel, Hasan; Kelley, Stephen S
2014-01-01
Optimizing the use of lignocellulosic biomass as the feedstock for renewable energy production is currently being developed globally. Biomass is a complex mixture of cellulose, hemicelluloses, lignins, extractives, and proteins; as well as inorganic salts. Cell wall compositional analysis for biomass characterization is laborious and time consuming. In order to characterize biomass fast and efficiently, several high through-put technologies have been successfully developed. Among them, near infrared spectroscopy (NIR) and pyrolysis-molecular beam mass spectrometry (Py-mbms) are complementary tools and capable of evaluating a large number of raw or modified biomass in a short period of time. NIR shows vibrations associated with specific chemical structures whereas Py-mbms depicts the full range of fragments from the decomposition of biomass. Both NIR vibrations and Py-mbms peaks are assigned to possible chemical functional groups and molecular structures. They provide complementary information of chemical insight of biomaterials. However, it is challenging to interpret the informative results because of the large amount of overlapping bands or decomposition fragments contained in the spectra. In order to improve the efficiency of data analysis, multivariate analysis tools have been adapted to define the significant correlations among data variables, so that the large number of bands/peaks could be replaced by a small number of reconstructed variables representing original variation. Reconstructed data variables are used for sample comparison (principal component analysis) and for building regression models (partial least square regression) between biomass chemical structures and properties of interests. In this review, the important biomass chemical structures measured by NIR and Py-mbms are summarized. The advantages and disadvantages of conventional data analysis methods and multivariate data analysis methods are introduced, compared and evaluated. This review
Directory of Open Access Journals (Sweden)
Li Xiao
2014-08-01
Optimizing the use of lignocellulosic biomass as the feedstock for renewable energy production is currently being developed globally. Biomass is a complex mixture of cellulose, hemicelluloses, lignins, extractives, and proteins; as well as inorganic salts. Cell wall compositional analysis for biomass characterization is laborious and time consuming. In order to characterize biomass fast and efficiently, several high through-put technologies have been successfully developed. Among them, near infrared spectroscopy (NIR) and pyrolysis-molecular beam mass spectrometry (Py-mbms) are complementary tools and capable of evaluating a large number of raw or modified biomass in a short period of time. NIR shows vibrations associated with specific chemical structures whereas Py-mbms depicts the full range of fragments from the decomposition of biomass. Both NIR vibrations and Py-mbms peaks are assigned to possible chemical functional groups and molecular structures. They provide complementary information of chemical insight of biomaterials. However, it is challenging to interpret the informative results because of the large amount of overlapping bands or decomposition fragments contained in the spectra. In order to improve the efficiency of data analysis, multivariate analysis tools have been adapted to define the significant correlations among data variables, so that the large number of bands/peaks could be replaced by a small number of reconstructed variables representing original variation. Reconstructed data variables are used for sample comparison (principal component analysis) and for building regression models (partial least square regression) between biomass chemical structures and properties of interests. In this review, the important biomass chemical structures measured by NIR and Py-mbms are summarized. The advantages and disadvantages of conventional data analysis methods and multivariate data analysis methods are introduced, compared and evaluated
Essentials of multivariate data analysis
Spencer, Neil H
2013-01-01
"… this text provides an overview at an introductory level of several methods in multivariate data analysis. It contains in-depth examples from one data set woven throughout the text, and a free [Excel] Add-In to perform the analyses in Excel, with step-by-step instructions provided for each technique. … could be used as a text (possibly supplemental) for courses in other fields where researchers wish to apply these methods without delving too deeply into the underlying statistics." (The American Statistician, February 2015)
Serdobolskii, Vadim Ivanovich
2007-01-01
This monograph presents a mathematical theory of statistical models described by an essentially large number of unknown parameters, comparable with the sample size but possibly much larger. In this sense, the proposed theory can be called "essentially multiparametric". It is developed on the basis of the Kolmogorov asymptotic approach, in which the sample size increases along with the number of unknown parameters. This theory opens a way to the solution of central problems of multivariate statistics that up until now have not been solved. Traditional statistical methods based on the idea of infinite sampling often break down in the solution of real problems and, depending on the data, can be inefficient, unstable and even inapplicable. In this situation, practical statisticians are forced to use various heuristic methods in the hope that they will find a satisfactory solution. The mathematical theory developed in this book presents a regular technique for implementing new, more efficient versions of statistical procedures. ...
Shaikh, Muhammad Mujtaba; Memon, Abdul Jabbar; Hussain, Manzoor
2016-09-01
In this article, we describe details of the data used in the research paper "Confidence bounds for energy conservation in electric motors: An economical solution using statistical techniques" [1]. The data presented in this paper is intended to show benefits of high efficiency electric motors over the standard efficiency motors of similar rating in the industrial sector of Pakistan. We explain how the data was collected and then processed by means of formulas to show cost effectiveness of energy efficient motors in terms of three important parameters: annual energy saving, cost saving and payback periods. This data can be further used to construct confidence bounds for the parameters using statistical techniques as described in [1].
Design of U-Geometry Parameters Using Statistical Analysis Techniques in the U-Bending Process
Wiriyakorn Phanitwong; Untika Boochakul; Sutasn Thipprakmas
2017-01-01
The various U-geometry parameters in the U-bending process result in processing difficulties in the control of the spring-back characteristic. In this study, the effects of U-geometry parameters, including channel width, bend angle, material thickness, tool radius, as well as workpiece length, and their design, were investigated using a combination of finite element method (FEM) simulation, and statistical analysis techniques. Based on stress distribution analyses, the FEM simulation results ...
The statistical analysis techniques to support the NGNP fuel performance experiments
Pham, Binh T.; Einerson, Jeffrey J.
2013-10-01
This paper describes the development and application of statistical analysis techniques to support the Advanced Gas Reactor (AGR) experimental program on Next Generation Nuclear Plant (NGNP) fuel performance. The experiments conducted in the Idaho National Laboratory's Advanced Test Reactor employ fuel compacts placed in a graphite cylinder shrouded by a steel capsule. The tests are instrumented with thermocouples embedded in graphite blocks and the target quantity (fuel temperature) is regulated by the He-Ne gas mixture that fills the gap volume. Three techniques for statistical analysis, namely control charting, correlation analysis, and regression analysis, are implemented in the NGNP Data Management and Analysis System for automated processing and qualification of the AGR measured data. The neutronic and thermal code simulation results are used for comparative scrutiny. The ultimate objective of this work includes (a) a multi-faceted system for data monitoring and data accuracy testing, (b) identification of possible modes of diagnostics deterioration and changes in experimental conditions, (c) qualification of data for use in code validation, and (d) identification and use of data trends to support effective control of test conditions with respect to the test target. Analysis results and examples given in the paper show the three statistical analysis techniques providing a complementary capability to warn of thermocouple failures. It also suggests that the regression analysis models relating calculated fuel temperatures and thermocouple readings can enable online regulation of experimental parameters (i.e. gas mixture content), to effectively maintain the fuel temperature within a given range.
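Of the three techniques named in this abstract, control charting is the easiest to illustrate in a few lines. The following is a minimal sketch of a Shewhart-style chart, assuming limits at the baseline mean plus or minus three sample standard deviations; the function names and the temperature values are hypothetical and are not taken from the NGNP system.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from an in-control baseline sample."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - k * spread, center, center + k * spread


def out_of_control(readings, limits):
    """Return the indices of readings that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(readings) if x < lower or x > upper]


# Hypothetical thermocouple readings in degrees Celsius (illustrative only).
baseline = [1000.0, 1002.0, 998.0, 1001.0, 999.0, 1000.0, 1003.0, 997.0]
limits = control_limits(baseline)

new_readings = [1001.0, 999.5, 1020.0, 1000.5, 960.0]
flagged = out_of_control(new_readings, limits)
print(limits, flagged)
```

A reading flagged this way is only a prompt for further checks; whether it reflects a failing sensor or a real change in conditions needs the kind of cross-comparison the abstract describes.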
Ratner, Bruce
2011-01-01
The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has
The Nature and Role of Statistics in the Business School Curriculum.
Parker, R. Stephen; Pettijohn, Charles E.; Keillor, Bruce D.
1999-01-01
According to survey responses from 114 accredited business schools, 83 offer two undergraduate statistics courses, and 79 offer a graduate statistics course. Commonly taught undergraduate topics are descriptive statistics, probability distribution, hypothesis testing, and multivariate techniques. (SK)
Caneca, Arnobio Roberto; Pimentel, M Fernanda; Galvão, Roberto Kawakami Harrop; da Matta, Cláudia Eliane; de Carvalho, Florival Rodrigues; Raimundo, Ivo M; Pasquini, Celio; Rohwedder, Jarbas J R
2006-09-15
This paper presents two methodologies for monitoring the service condition of diesel-engine lubricating oils on the basis of infrared spectra. In the first approach, oil samples are discriminated into three groups, each one associated with a given wear stage. An algorithm is proposed to select spectral variables with good discriminant power and small collinearity for the purpose of discriminant analysis classification. As a result, a classification accuracy of 93% was obtained in both the middle (MIR) and near-infrared (NIR) ranges. The second approach employs multivariate calibration methods to predict the viscosity of the lubricant. In this case, the use of absorbance measurements in the NIR spectral range was not successful, because of experimental difficulties associated with the presence of particulate matter. This problem was circumvented by the use of attenuated total reflectance (ATR) measurements in the MIR spectral range, in which an RMSEP of 3.8 cSt and a relative average error of 3.2% were attained.
Directory of Open Access Journals (Sweden)
M. V. Ninu Krishnan
2015-01-01
In the present study, the Information Value (InfoVal) and the Multiple Logistic Regression (MLR) methods based on bivariate and multivariate statistical analysis have been applied for shallow landslide initiation susceptibility assessment in a selected subwatershed in the Western Ghats, Kerala, India, to determine the suitability of geographical information systems (GIS) assisted statistical landslide susceptibility assessment methods in data-constrained regions. The landslide conditioning terrain variables considered in the analysis are geomorphology, land use/land cover, soil thickness, slope, aspect, relative relief, plan curvature, profile curvature, drainage density, distance from drainages, lineament density and distance from lineaments. Landslide Susceptibility Index (LSI) maps were produced by integrating the weighted themes and divided into five landslide susceptibility zones (LSZ) by correlating the LSI with general terrain conditions. The predictive performances of the models were evaluated through success and prediction rate curves. The area under the success rate curves (AUC) for the InfoVal and MLR generated susceptibility maps is 84.11% and 68.65%, respectively. The prediction rate curves show good to moderate correlation between the distribution of the validation group of landslides and the LSZ maps, with AUC values of 0.648 and 0.826 respectively for the MLR and InfoVal produced LSZ maps. Considering the best fit and suitability of the models in the study area by quantitative prediction accuracy, the LSZ map produced by the InfoVal technique shows higher accuracy, i.e. 82.60%, than the MLR model, is more realistic when compared in the field, and is considered the best suited model for the assessment of landslide susceptibility in areas similar to the study area. The LSZ map produced for the area can be utilised for regional planning and assessment processes by incorporating the generalised rainfall conditions in the area.
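The abstract does not give the InfoVal formula. A commonly used formulation of the information value method weights each class of a conditioning variable by the natural log of the ratio between the landslide density in that class and the landslide density over the whole map. The sketch below assumes that formulation; the class names and cell counts are hypothetical and do not come from the study.

```python
import math


def information_value(landslide_cells, class_cells, total_landslide_cells, total_cells):
    """ln(class landslide density / map landslide density).

    Assumed formulation; a class with zero landslide cells would need
    special handling because log(0) is undefined.
    """
    class_density = landslide_cells / class_cells
    map_density = total_landslide_cells / total_cells
    return math.log(class_density / map_density)


# Hypothetical slope classes: name -> (landslide cells, all cells in the class).
slope_classes = {"gentle": (5, 500), "moderate": (20, 400), "steep": (25, 100)}
total_landslides = sum(s for s, _ in slope_classes.values())
total_cells = sum(n for _, n in slope_classes.values())

weights = {
    name: information_value(s, n, total_landslides, total_cells)
    for name, (s, n) in slope_classes.items()
}
print(weights)
```

Positive weights mark classes where landslides are over-represented and negative weights mark under-represented ones. Under this formulation, a susceptibility index for a cell would be the sum of the weights of the classes that the cell falls into across all conditioning variables.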
Directory of Open Access Journals (Sweden)
VIMALA C.
2015-05-01
In recent years, speech technology has become a vital part of our daily lives. Various techniques have been proposed for developing Automatic Speech Recognition (ASR) systems and have achieved great success in many applications. Among them, Template Matching techniques like Dynamic Time Warping (DTW), Statistical Pattern Matching techniques such as Hidden Markov Models (HMM) and Gaussian Mixture Models (GMM), and Machine Learning techniques such as Neural Networks (NN), Support Vector Machines (SVM), and Decision Trees (DT) are the most popular. The main objective of this paper is to design and develop a speaker-independent isolated speech recognition system for the Tamil language using the above speech recognition techniques. The background of ASR systems, the steps involved in ASR, the merits and demerits of the conventional and machine learning algorithms, and the observations made based on the experiments are presented in this paper. For the developed system, the highest word recognition accuracy is achieved with the HMM technique, which offered 100% accuracy during the training process and 97.92% during the testing process.
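Of the techniques listed, template matching with Dynamic Time Warping is compact enough to sketch. The example below is an illustration under simplifying assumptions: each utterance is a sequence of single numbers, whereas a real recognizer would compare sequences of feature vectors, and the template labels and values are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned to the same element of b again
                cost[i][j - 1],      # b[j-1] aligned to the same element of a again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def classify(sample, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))


# Hypothetical one-dimensional "word" templates.
templates = {"up": [0, 1, 2, 3], "down": [3, 2, 1, 0]}
sample = [0, 0, 1, 2, 2, 3]  # a time-stretched version of "up"
print(classify(sample, templates))
```

Because the warping path may repeat elements, the stretched sample aligns with the "up" template at zero cost, which is the property that makes DTW tolerant of differences in speaking rate.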
Borràs, Eva; Ferré, Joan; Boqué, Ricard; Mestres, Montserrat; Aceña, Laura; Calvo, Angels; Busto, Olga
2016-07-15
Three instrumental techniques, headspace-mass spectrometry (HS-MS), mid-infrared spectroscopy (MIR) and UV-visible spectrophotometry (UV-vis), have been combined to classify virgin olive oil samples based on the presence or absence of sensory defects. The reference sensory values were provided by an official taste panel. Different data fusion strategies were studied to improve the discrimination capability compared to using each instrumental technique individually. A general model was applied to discriminate high-quality non-defective olive oils (extra-virgin) and the lowest-quality olive oils considered non-edible (lampante). A specific identification of key off-flavours, such as musty, winey, fusty and rancid, was also studied. The data fusion of the three techniques improved the classification results in most of the cases. Low-level data fusion was the best strategy to discriminate musty, winey and fusty defects, using HS-MS, MIR and UV-vis, and the rancid defect using only HS-MS and MIR. The mid-level data fusion approach using partial least squares-discriminant analysis (PLS-DA) scores was found to be the best strategy for defective vs non-defective and edible vs non-edible oil discrimination. However, the data fusion did not sufficiently improve the results obtained by a single technique (HS-MS) to classify non-defective classes. These results indicate that instrumental data fusion can be useful for the identification of sensory defects in virgin olive oils.
Mispelaar, V.G. van; Smilde, A.K.; Noord, O.E. de; Blomberg, J.; Schoenmakers, P.J.
2005-01-01
Comprehensive two-dimensional gas chromatography (GC × GC) has proven to be an extremely powerful separation technique for the analysis of complex volatile mixtures. This separation power can be used to discriminate between highly similar samples. In this article we will describe the use of GC × GC
Kassomenos, P.; Vardoulakis, S.; Borge, R.; Lumbreras, J.; Papaloukas, C.; Karakitsios, S.
2010-10-01
In this study, we used and compared three different statistical clustering methods: a hierarchical, a non-hierarchical (K-means) and an artificial neural network technique (self-organizing maps (SOM)). These classification methods were applied to a 4-year dataset of 5-day kinematic back trajectories of air masses arriving in Athens, Greece at 12.00 UTC, at three different heights above the ground. The atmospheric back trajectories were simulated with the HYSPLIT Version 4.7 model of the National Oceanic and Atmospheric Administration (NOAA). The meteorological data used for the computation of trajectories were obtained from the NOAA reanalysis database. A comparison of the three statistical clustering methods through statistical indices was attempted. It was found that all three statistical methods seem to depend on the arrival height of the trajectories, but the degree of dependence differs substantially. Hierarchical clustering showed the highest level of dependence on the arrival height for fast-moving trajectories, followed by SOM. K-means was found to be the clustering technique least dependent on the arrival height. The air quality management applications of these results in relation to PM10 concentrations recorded in Athens, Greece, were also discussed. Differences in PM10 concentrations between certain clusters were found to be statistically significant (at the 95% confidence level), indicating that these clusters appear to be associated with long-range transport of particulates. This study can improve the interpretation of modelled atmospheric trajectories, leading to a more reliable analysis of synoptic weather circulation patterns and their impacts on urban air quality.
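The non-hierarchical method named in this abstract, K-means, can be sketched as follows. This is a generic illustration under stated assumptions: each trajectory is reduced to a single two-dimensional point, the initial centroids are supplied by hand, and the coordinates are made up. A real application would cluster full trajectories, for example flattened into vectors of positions at each time step.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    labels = [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]
    return centroids, labels


# Hypothetical trajectory summary points (illustrative coordinates only).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(points, [points[0], points[3]])
print(centroids, labels)
```

K-means results depend on the initial centroids and on the chosen number of clusters, so comparisons between clustering methods like the one reported here usually repeat the run with several initializations.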
Moni, Chrysanthi
2014-01-01
The main purpose of this study is the correction for the energy losses of the e± in the transition region between the barrel and the end-caps of the Electromagnetic Calorimeter (EMCal) of ATLAS, by using Multivariate techniques. The crack region is the one with the largest amount of material upstream of the EMCal, and this is the reason why e± lose a great part of their energy as they pass through it. In this project, the contribution of the Multivariate Analysis to the correction of the E/E_true distribution, as well as to the derivation of the Gaussian peak versus |η| and E_T, is examined. η is the pseudorapidity, used as a spatial coordinate for the description of the angle of a particle relative to the beam axis, and E_T = E_true/cosh(|η|), where E_true is the true energy of the particles. Finally, the improvement of the resolution by using MVA techniques with and without the scintillator is also explored.
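The two kinematic quantities defined in this abstract can be written out directly. The sketch below uses the standard definition of pseudorapidity, η = -ln tan(θ/2), with θ the polar angle from the beam axis, together with the transverse-energy relation quoted above; the function names and the numerical values are illustrative assumptions.

```python
import math


def pseudorapidity(theta):
    """eta = -ln(tan(theta / 2)), with theta the polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))


def transverse_energy(e_true, eta):
    """E_T = E_true / cosh(|eta|), as stated in the abstract."""
    return e_true / math.cosh(abs(eta))


# Illustrative values only: a 100 GeV particle at a polar angle of 30 degrees.
theta = math.pi / 6
eta = pseudorapidity(theta)
print(eta, transverse_energy(100.0, eta))
```

Since cosh(η) equals 1/sin(θ) for this definition of η, the relation is equivalent to E_T = E_true·sin(θ), which gives a quick consistency check on any implementation.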
Hamer, Harold A.; Mayer, John P.; Huston, Wilber B.
1961-01-01
Results of a statistical analysis of horizontal-tail loads on a fighter airplane are presented. The data were obtained from a number of operational training missions with flight at altitudes up to about 50,000 feet and at Mach numbers up to 1.22. The analysis was performed to determine the feasibility of calculating horizontal-tail load from data on the flight conditions and airplane motions. In the analysis the calculated loads are compared with the measured loads for the different types of missions performed. The loads were calculated by two methods: a direct approach and a Monte Carlo technique. The procedures used and some of the problems associated with the data analysis are discussed. In the direct method, a time history of tail load is calculated from time-history measurements of the flight quantities. For the Monte Carlo method, the frequencies of occurrence of tail loads of given magnitudes are derived from statistical information on the flight quantities. The Monte Carlo method could be useful for extending loads information for design of prospective airplanes. The results indicate that the accuracy of loads, regardless of the method used for calculation, is largely dependent on the knowledge of the pertinent airplane aerodynamic characteristics and center-of-gravity location. In addition, reliable Monte Carlo results require an adequate sample of statistical data and a knowledge of the more important statistical dependencies between the various flight conditions and airplane motions.
Gaitán Fernández, E.; García Moreno, R.; Pino Otín, M. R.; Ribalaygua Batalla, J.
2012-04-01
Climate and soil are two of the most important limiting factors for agricultural production. Climate change has now been documented in many geographical locations, affecting different cropping systems. General Circulation Models (GCMs) have become important tools for simulating the most relevant aspects of the climate expected for the 21st century in the frame of climatic change. These models are able to reproduce the general features of atmospheric dynamics, but their low resolution (about 200 km) prevents a proper simulation of smaller-scale meteorological effects. Downscaling techniques overcome this problem by adapting the model outcomes to the local scale. In this context, FIC (Fundación para la Investigación del Clima) has developed a statistical downscaling technique based on a two-step analogue method. This methodology has been broadly tested in national and international environments, leading to excellent results on future climate models. In a collaboration project, this statistical downscaling technique was applied to predict future scenarios for grape growing systems in Spain. The application of such a model is very important for predicting the expected climate for different crops, mainly for grape, where the success of different varieties is highly related to climate and soil. The model allowed the implementation of agricultural conservation practices in crop production, the detection of areas highly sensitive to negative impacts produced by any modification of climate in the different regions, mainly those with a protected designation of origin, and the definition of new production areas with optimal edaphoclimatic conditions for the different varieties.
Directory of Open Access Journals (Sweden)
Land Walker H
2011-01-01
Background: When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL) techniques with kernels are capable of addressing nonlinear problems without making parametric assumptions. However, these techniques do not produce findings relevant for epidemiologic interpretations. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with logistic regression (LR) modeling representing a parametric approach. The SL technique comprised a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison. Results: The SL approach is capable of generating odds ratios for main effects and risk factor interactions that better capture nonlinear relationships between exposure variables and outcome in comparison with LR. Conclusions: The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.
The Statistical Analysis Techniques to Support the NGNP Fuel Performance Experiments
Energy Technology Data Exchange (ETDEWEB)
Bihn T. Pham; Jeffrey J. Einerson
2010-06-01
This paper describes the development and application of statistical analysis techniques to support the AGR experimental program on NGNP fuel performance. The experiments conducted in the Idaho National Laboratory’s Advanced Test Reactor employ fuel compacts placed in a graphite cylinder shrouded by a steel capsule. The tests are instrumented with thermocouples embedded in graphite blocks and the target quantity (fuel/graphite temperature) is regulated by the He-Ne gas mixture that fills the gap volume. Three techniques for statistical analysis, namely control charting, correlation analysis, and regression analysis, are implemented in the SAS-based NGNP Data Management and Analysis System (NDMAS) for automated processing and qualification of the AGR measured data. The NDMAS also stores daily neutronic (power) and thermal (heat transfer) code simulation results along with the measurement data, allowing for their combined use and comparative scrutiny. The ultimate objective of this work includes (a) a multi-faceted system for data monitoring and data accuracy testing, (b) identification of possible modes of diagnostics deterioration and changes in experimental conditions, (c) qualification of data for use in code validation, and (d) identification and use of data trends to support effective control of test conditions with respect to the test target. Analysis results and examples given in the paper show the three statistical analysis techniques providing a complementary capability to warn of thermocouple failures. It also suggests that the regression analysis models relating calculated fuel temperatures and thermocouple readings can enable online regulation of experimental parameters (i.e. gas mixture content), to effectively maintain the target quantity (fuel temperature) within a given range.
Controlling intrinsic alignments in weak lensing statistics: The nulling and boosting techniques
Joachimi, B
2010-01-01
The intrinsic alignment of galaxies constitutes the major astrophysical source of systematic errors in surveys of weak gravitational lensing by the large-scale structure. We discuss the principles, summarise the implementation, and highlight the performance of two model-independent methods that control intrinsic alignment signals in weak lensing data: the nulling technique which eliminates intrinsic alignments to ensure unbiased constraints on cosmology, and the boosting technique which extracts intrinsic alignments and hence allows one to further study this contribution. Making only use of the characteristic dependence on redshift of the signals, both approaches are robust, but reduce the statistical power due to the similar redshift scaling of intrinsic alignment and lensing signals.
Shaban, M; Urban, B; El Saadi, A; Faisal, M
2010-08-01
The limited water resources of Egypt lead to widespread water stress. Consequently, the use of marginal water sources, such as agricultural drainage waters, provides one of the feasible national solutions to the problem. However, the marginal quality of the drainage waters may restrict their use. The objective of this research is to develop a tool for planning and managing the reuse of agricultural drainage water for irrigation in the Nile Delta. This is achieved by classifying the pollution levels of drainage water into several categories using a statistical clustering approach that may ensure simple but accurate information about the pollution levels and water characteristics at any point within the drainage system. The derived clusters are then visualized using a Geographical Information System (GIS) to draw thematic maps of the entire Nile Delta, so that the GIS serves as a decision support system. The obtained maps may assist decision makers in managing and controlling pollution in the Nile Delta regions. The clustering process also provides an effective overview of those spots in the Nile Delta where intensified monitoring activities are required. Consequently, the obtained results make a major contribution to the assessment and redesign of the Egyptian national water quality monitoring network.
Performance of Statistical Temporal Downscaling Techniques of Wind Speed Data Over Aegean Sea
Gokhan Guler, Hasan; Baykal, Cuneyt; Ozyurt, Gulizar; Kisacik, Dogan
2016-04-01
Wind speed data is a key input for many meteorological and engineering applications. Many institutions provide wind speed data with temporal resolutions ranging from one hour to twenty-four hours. Higher temporal resolution is generally required for some applications, such as reliable wave hindcasting studies. One solution for generating wind data at high sampling frequencies is to use statistical downscaling techniques to interpolate values at the finer sampling intervals from the available data. In this study, the major aim is to assess the temporal downscaling performance of nine statistical interpolation techniques by quantifying the inherent uncertainty due to the selection of different techniques. For this purpose, hourly 10-m wind speed data taken from 227 data points over the Aegean Sea between 1979 and 2010, with a spatial resolution of approximately 0.3 degrees, are analyzed from the National Centers for Environmental Prediction (NCEP) Climate Forecast System Reanalysis database. Additionally, hourly 10-m wind speed data from two in-situ measurement stations between June 2014 and June 2015 are considered to understand the effect of dataset properties on the uncertainty generated by the interpolation technique. The nine statistical interpolation techniques selected are w0 (left constant) interpolation, w6 (right constant) interpolation, averaging step function interpolation, linear interpolation, 1D Fast Fourier Transform interpolation, 2nd and 3rd degree Lagrange polynomial interpolation, cubic spline interpolation, and piecewise cubic Hermite interpolating polynomials. The original data is downsampled to 6 hours (i.e. wind speeds at the 0th, 6th, 12th and 18th hours of each day are selected), then the 6-hourly data is temporally downscaled to hourly data (i.e. the wind speeds at each hour between the intervals are computed) using the nine interpolation techniques, and finally the original data is compared with the temporally downscaled data. A penalty point system based on
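The evaluation procedure described in this abstract (downsample hourly data to 6-hourly, interpolate back to hourly, compare with the original) can be sketched for two of the nine techniques. This is a minimal illustration, assuming a synthetic wind-speed series and root mean square error as the comparison metric; the abstract's own penalty point system is not reproduced, and all function names are hypothetical.

```python
import math


def downsample(hourly, step=6):
    """Keep every `step`-th value (hours 0, 6, 12, ...)."""
    return hourly[::step]


def left_constant(coarse, step=6):
    """Hold each coarse value until the next coarse sample (left constant)."""
    out = []
    for value in coarse[:-1]:
        out.extend([value] * step)
    out.append(coarse[-1])
    return out


def linear(coarse, step=6):
    """Linearly interpolate between consecutive coarse samples."""
    out = []
    for v0, v1 in zip(coarse, coarse[1:]):
        out.extend(v0 + (v1 - v0) * k / step for k in range(step))
    out.append(coarse[-1])
    return out


def rmse(truth, estimate):
    """Root mean square error between two equally long series."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


# Synthetic hourly wind speed in m/s with a 24-hour cycle (illustrative only).
hourly = [8.0 + 3.0 * math.sin(2 * math.pi * h / 24) for h in range(25)]
coarse = downsample(hourly)

errors = {
    "left constant": rmse(hourly, left_constant(coarse)),
    "linear": rmse(hourly, linear(coarse)),
}
print(errors)
```

On this smooth synthetic series linear interpolation gives the smaller error, but the ranking on real wind data may differ, which is the question the study addresses.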
Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad
2015-11-01
One of the most important topics of interest to investors is stock price changes. Investors whose goals are long term are sensitive to the stock price and its changes and react to them. In this regard, we used the multivariate adaptive regression splines (MARS) model and a semi-parametric splines technique for predicting stock prices in this study. The MARS model, as a nonparametric method, is an adaptive method for regression that is suited to high-dimensional problems with several variables. Smoothing splines is a nonparametric regression method. In this study, we used 40 variables (30 accounting variables and 10 economic variables) for predicting stock prices using the MARS model and the semi-parametric splines technique. After investigating the models, we selected 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influential variables for predicting stock prices using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS forecast and P/E ratio) were selected as variables effective in forecasting stock prices.
Institute of Scientific and Technical Information of China (English)
盛婷婷; 张俊丽
2011-01-01
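In this paper, we mainly discuss, in the function class {0}, the Noether theorem on the solvability of dual-type singular integral equations with two convolution kernels, together with the corresponding solvability conditions; when these conditions are satisfied, the general solution is given in explicit form. Based on the teaching of the multivariate statistical analysis course for statistics majors at our school, this paper discusses teaching philosophy, teaching mode and teaching evaluation, and puts forward several teaching methods, including analogy and induction, case-based teaching and multiple-test teaching.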
Institute of Scientific and Technical Information of China (English)
刘银萍; 安丽微
2011-01-01
Based on the teaching of the multivariate statistical analysis course for statistics majors at our school, this paper discusses teaching philosophy, teaching mode and teaching evaluation, and puts forward several teaching methods, including analogy and induction, case-based teaching and multiple-test teaching.
Energy Technology Data Exchange (ETDEWEB)
Pilot, Justin R. [Ohio State U.
2011-01-01
We present a search for the Standard Model Higgs Boson using the process $ZH\to\mu^+\mu^- b\bar{b}$. We use a dataset corresponding to 9.2 fb$^{-1}$ of integrated luminosity from proton-antiproton collisions with center-of-mass energy 1.96 TeV at the Fermilab Tevatron, collected with the CDF II detector. This analysis benefits from several new multivariate techniques that have not been used in previous analyses at CDF. We use a multivariate function to select muon candidates, increasing signal acceptance while simultaneously keeping fake rates small. We employ an inclusive trigger selection to further increase acceptance. To enhance signal discrimination, we utilize a multi-layer approach consisting of expert discriminants. This multi-layer discriminant method helps isolate the two main classes of background events, $t\bar{t}$ and $Z$+jets production. It also includes a flavor separator, to distinguish light flavor jets from jets consistent with the decay of a $B$-hadron. With this novel multi-layer approach, we proceed to set limits on the $ZH$ production cross section times branching ratio. For a Higgs boson with mass 115 GeV/$c^2$, we observe (expect) a limit of 8.0 (4.9) times the Standard Model prediction.
Samadi-Maybodi, Abdolraouf; Darzi, S. K. Hassani Nejad
2008-10-01
Resolution of binary mixtures of vitamin B12, methylcobalamin and B12 coenzyme with minimum sample pre-treatment and without analyte separation has been successfully achieved by the methods of partial least squares with one dependent variable (PLS1), orthogonal signal correction/partial least squares (OSC/PLS), principal component regression (PCR) and hybrid linear analysis (HLA). The data for analysis were obtained from UV-vis spectra. The UV-vis spectra of vitamin B12, methylcobalamin and B12 coenzyme were recorded under the same spectral conditions. A central composite design was used in the ranges of 10-80 mg L⁻¹ for vitamin B12 and methylcobalamin and 20-130 mg L⁻¹ for B12 coenzyme. Model refinement and validation were performed by cross-validation. The minimum root mean square error of prediction (RMSEP) was 2.26 mg L⁻¹ for vitamin B12 with PLS1, 1.33 mg L⁻¹ for methylcobalamin with OSC/PLS and 3.24 mg L⁻¹ for B12 coenzyme with HLA. Figures of merit such as selectivity, sensitivity, analytical sensitivity and LOD were determined for the three compounds. The procedure was successfully applied to the simultaneous determination of the three compounds in synthetic mixtures and in a pharmaceutical formulation.
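The abstract ranks the calibration methods by their root mean square error of prediction (RMSEP). The sketch below shows how that figure is computed and how the best method would be picked for one analyte; the concentrations and predictions are hypothetical illustration values, not data from the paper.

```python
import math


def rmsep(reference, predicted):
    """Root mean square error of prediction over a validation set."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("reference and predicted must be non-empty and equally long")
    squared = sum((p - r) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(squared / len(reference))


# Hypothetical validation concentrations (mg/L) and predictions from two models.
reference = [10.0, 20.0, 40.0, 80.0]
predictions = {
    "model A": [12.0, 18.0, 43.0, 79.0],
    "model B": [15.0, 14.0, 47.0, 72.0],
}

scores = {name: rmsep(reference, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

RMSEP carries the units of the predicted quantity, which is why the values in the abstract are reported in mg per litre and can be compared directly with the calibration ranges.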
Karunathilaka, Sanjeewa R; Kia, Ali-Reza Fardin; Srigley, Cynthia; Chung, Jin Kyu; Mossoba, Magdi M
2016-10-01
A rapid tool for evaluating authenticity was developed and applied to the screening of extra virgin olive oil (EVOO) retail products by using Fourier-transform near infrared (FT-NIR) spectroscopy in combination with univariate and multivariate data analysis methods. Using disposable glass tubes, spectra for 62 reference EVOO, 10 edible oil adulterants, 20 blends consisting of EVOO spiked with adulterants, 88 retail EVOO products and other test samples were rapidly measured in the transmission mode without any sample preparation. The univariate conformity index (CI) and the multivariate supervised soft independent modeling of class analogy (SIMCA) classification tool were used to analyze the various olive oil products which were tested for authenticity against a library of reference EVOO. Better discrimination between the authentic EVOO and some commercial EVOO products was observed with SIMCA than with CI analysis. Approximately 61% of all EVOO commercial products were flagged by SIMCA analysis, suggesting that further analysis be performed to identify quality issues and/or potential adulterants. Due to its simplicity and speed, FT-NIR spectroscopy in combination with multivariate data analysis can be used as a complementary tool to conventional official methods of analysis to rapidly flag EVOO products that may not belong to the class of authentic EVOO. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
Introductory statistical inference
Mukhopadhyay, Nitis
2014-01-01
This gracefully organized text reveals the rigorous theory of probability and statistical inference in the style of a tutorial, using worked examples, exercises, figures, tables, and computer simulations to develop and illustrate concepts. Drills and boxed summaries emphasize and reinforce important ideas and special techniques. Beginning with a review of the basic concepts and methods in probability theory, moments, and moment generating functions, the author moves to more intricate topics. Introductory Statistical Inference studies multivariate random variables, exponential families of dist
MacLean, Adam L; Harrington, Heather A; Stumpf, Michael P H; Byrne, Helen M
2016-01-01
The last decade has seen an explosion in models that describe phenomena in systems medicine. Such models are especially useful for studying signaling pathways, such as the Wnt pathway. In this chapter we use the Wnt pathway to showcase current mathematical and statistical techniques that enable modelers to gain insight into (models of) gene regulation and generate testable predictions. We introduce a range of modeling frameworks, but focus on ordinary differential equation (ODE) models since they remain the most widely used approach in systems biology and medicine and continue to offer great potential. We present methods for the analysis of a single model, comprising applications of standard dynamical systems approaches such as nondimensionalization, steady state, asymptotic and sensitivity analysis, and more recent statistical and algebraic approaches to compare models with data. We present parameter estimation and model comparison techniques, focusing on Bayesian analysis and coplanarity via algebraic geometry. Our intention is that this (non-exhaustive) review may serve as a useful starting point for the analysis of models in systems medicine.
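To make two of the single-model analyses mentioned in the abstract above concrete (steady state and sensitivity analysis), here is a minimal sketch on a toy one-variable ODE, dx/dt = k_prod - k_deg * x. This toy model is an assumption chosen for illustration and is not a model of the Wnt pathway; the parameter names `k_prod` and `k_deg` and all function names are hypothetical.

```python
def simulate(k_prod, k_deg, x0=0.0, dt=0.01, t_end=50.0):
    """Integrate dx/dt = k_prod - k_deg * x with the explicit Euler method."""
    x = x0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        x += dt * (k_prod - k_deg * x)
    return x


def steady_state(k_prod, k_deg):
    """Analytical steady state: setting dx/dt = 0 gives x* = k_prod / k_deg."""
    return k_prod / k_deg


def relative_sensitivity(k_prod, k_deg, rel_step=1e-6):
    """Central finite-difference estimate of d ln(x*) / d ln(k_deg)."""
    h = k_deg * rel_step
    up = steady_state(k_prod, k_deg + h)
    down = steady_state(k_prod, k_deg - h)
    derivative = (up - down) / (2.0 * h)
    return derivative * k_deg / steady_state(k_prod, k_deg)
```

For this toy model the long-time simulation should approach the analytical steady state, and the relative sensitivity of the steady state to `k_deg` is -1, meaning that a 1% increase in the degradation rate lowers the steady-state level by about 1%.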
MacLean, Adam L.
2015-12-16
The last decade has seen an explosion in models that describe phenomena in systems medicine. Such models are especially useful for studying signaling pathways, such as the Wnt pathway. In this chapter we use the Wnt pathway to showcase current mathematical and statistical techniques that enable modelers to gain insight into (models of) gene regulation and generate testable predictions. We introduce a range of modeling frameworks, but focus on ordinary differential equation (ODE) models since they remain the most widely