Multivariate statistical methods: a primer
Manly, Bryan FJ
2004-01-01
THE MATERIAL OF MULTIVARIATE ANALYSIS: Examples of Multivariate Data; Preview of Multivariate Methods; The Multivariate Normal Distribution; Computer Programs; Graphical Methods; Chapter Summary; References. MATRIX ALGEBRA: The Need for Matrix Algebra; Matrices and Vectors; Operations on Matrices; Matrix Inversion; Quadratic Forms; Eigenvalues and Eigenvectors; Vectors of Means and Covariance Matrices; Further Reading; Chapter Summary; References. DISPLAYING MULTIVARIATE DATA: The Problem of Displaying Many Variables in Two Dimensions; Plotting Index Variables; The Draftsman's Plot; The Representation of Individual Data Points; Profiles o
Multivariate statistical methods: a first course
Marcoulides, George A
2014-01-01
Multivariate statistics refer to an assortment of statistical methods that have been developed to handle situations in which multiple variables or measures are involved. Any analysis of more than two variables or measures can loosely be considered a multivariate statistical analysis. An introductory text for students learning multivariate statistical methods for the first time, this book keeps mathematical details to a minimum while conveying the basic principles. One of the principal strategies used throughout the book--in addition to the presentation of actual data analyses--is poin
Methods for statistical data analysis of multivariate observations
Gnanadesikan, R
1997-01-01
A practical guide for multivariate statistical techniques, now updated and revised. In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus of its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of inte
Multivariate methods and forecasting with IBM SPSS statistics
Aljandali, Abdulkader
2017-01-01
This is the second of a two-part guide to quantitative analysis using the IBM SPSS Statistics software package; this volume focuses on multivariate statistical methods and advanced forecasting techniques. More often than not, regression models involve more than one independent variable. For example, forecasting methods are commonly applied to aggregates such as inflation rates, unemployment, exchange rates, etc., that have complex relationships with determining variables. This book introduces multivariate regression models and provides examples to help readers understand the theory underpinning the models. The book presents the fundamentals of multivariate regression and then moves on to examine several related techniques, such as logistic and multinomial regression, that have application in business-orientated fields. Forecasting tools such as the Box-Jenkins approach to time series modeling are introduced, as well as exponential smoothing and naïve techniques. This part also covers hot topics such as Factor Analysis, Dis...
Multivariate Statistical Process Control Process Monitoring Methods and Applications
Ge, Zhiqiang
2013-01-01
Given their key position in the process control industry, process monitoring techniques have been extensively investigated by industrial practitioners and academic control researchers. Multivariate statistical process control (MSPC) is one of the most popular data-based methods for process monitoring and is widely used in various industrial areas. Effective routines for process monitoring can help operators run industrial processes efficiently at the same time as maintaining high product quality. Multivariate Statistical Process Control reviews the developments and improvements that have been made to MSPC over the last decade, and goes on to propose a series of new MSPC-based approaches for complex process monitoring. These new methods are demonstrated in several case studies from the chemical, biological, and semiconductor industrial areas. Control and process engineers, and academic researchers in the process monitoring, process control and fault detection and isolation (FDI) disciplines will be inter...
Classification of Specialized Farms Applying Multivariate Statistical Methods
Directory of Open Access Journals (Sweden)
Zuzana Hloušková
2017-01-01
The paper applies advanced multivariate statistical methods to classify cattle breeding farming enterprises by their economic size. The advantage of the model is its ability to use a few selected indicators, in contrast to the complex methodology of the current classification model, which requires detailed knowledge of the structure of herd turnover and of cultivated crops. The output of the paper is intended to be applied within farm structure research focused on the future development of Czech agriculture. The data source is the farming enterprises database for 2014 from the FADN CZ system. The proposed predictive model exploits knowledge of the actual size classes of the farms tested. The linear discriminant analysis multifactor classification method correctly filed 98 % of Small farms and 100 % of Large and Very Large enterprises, whereas only 58.11 % of Medium Size farms were filed correctly. Partial shortcomings of the process were found when discriminating between Medium and Small farms.
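The classification step in this abstract rests on linear discriminant analysis. The following is a minimal two-class sketch with NumPy, assuming equal priors and a shared covariance matrix. The function names `fit_lda` and `predict_lda` and the indicator values are hypothetical and are not taken from the paper or the FADN CZ data.

```python
import numpy as np

def fit_lda(X0, X1):
    """Fit a two-class Fisher linear discriminant.

    Returns (w, c): a sample x is assigned to class 1 when w @ x > c.
    Equal priors and a shared covariance matrix are assumed.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled within-class covariance matrix
    pooled = ((n0 - 1) * np.cov(X0, rowvar=False) +
              (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(pooled, m1 - m0)
    c = w @ (m0 + m1) / 2.0  # threshold at the midpoint of the class means
    return w, c

def predict_lda(w, c, X):
    """Return 0/1 class labels for the rows of X."""
    return (X @ w > c).astype(int)

# Made-up values of two indicators for two size classes (illustration only)
small = np.array([[1.0, 2.0], [1.5, 1.8], [0.8, 2.4], [1.2, 2.1]])
large = np.array([[4.0, 5.0], [4.5, 5.5], [3.8, 4.7], [4.2, 5.2]])

w, c = fit_lda(small, large)
labels = predict_lda(w, c, np.vstack([small, large]))
share_correct = np.mean(labels == np.array([0, 0, 0, 0, 1, 1, 1, 1]))
```

The share of correctly filed cases reported in the abstract corresponds to `share_correct` computed per class; with more than two size classes, one discriminant function per class would be needed.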
2007-06-01
...the observed system. Our research involved a comparative analysis of two multivariate statistical methods, the multivariate CUSUM (MCUSUM) and the... outbreaks. We found that, similar to results for the univariate CUSUM and EWMA, the directionally-sensitive MCUSUM and MEWMA perform very similarly. Subject terms: Biosurveillance, Multivariate CUSUM, Multivariate EWMA, Statistical Process Control, Syndromic Surveillance.
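For orientation, the standard (non-directional) MEWMA statistic can be sketched as below. This is the textbook form, assuming an in-control mean of zero and a known covariance matrix; it is not the directionally-sensitive variant studied in the report, and the function name `mewma_statistics` is hypothetical.

```python
import numpy as np

def mewma_statistics(X, sigma, lam=0.2):
    """Return the MEWMA T^2 statistic for each row of X.

    Assumes the in-control mean is zero and sigma is the known
    in-control covariance matrix of a single observation.
    """
    X = np.asarray(X, dtype=float)
    z = np.zeros(X.shape[1])
    stats = []
    for t, x in enumerate(X, start=1):
        # Exponentially weighted moving average of the observation vectors
        z = lam * x + (1.0 - lam) * z
        # Exact covariance factor of z at time t for independent observations
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        stats.append(float(z @ np.linalg.solve(scale * sigma, z)))
    return np.array(stats)

# A sustained shift of one unit in both variables (made-up data)
obs = np.ones((5, 2))
t2 = mewma_statistics(obs, np.eye(2), lam=0.2)
```

An alarm would be raised when the statistic exceeds a control limit chosen for a target false-alarm rate; that limit is not computed here.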
Review of robust multivariate statistical methods in high dimension.
Filzmoser, Peter; Todorov, Valentin
2011-10-31
General ideas of robust statistics, and specifically robust statistical methods for calibration and dimension reduction are discussed. The emphasis is on analyzing high-dimensional data. The discussed methods are applied using the packages chemometrics and rrcov of the statistical software environment R. It is demonstrated how the functions can be applied to real high-dimensional data from chemometrics, and how the results can be interpreted.
Multivariate Statistical Process Control
DEFF Research Database (Denmark)
Kulahci, Murat
2013-01-01
As sensor and computer technology continues to improve, high-dimensional data sets are becoming a normal occurrence. As in many areas of industrial statistics, this brings forth various challenges in statistical process control (SPC) and monitoring, for which the aim ... is to identify the “out-of-control” state of a process using control charts in order to reduce the excessive variation caused by so-called assignable causes. In practice, the most common method of monitoring multivariate data is through a statistic akin to Hotelling’s T². For high dimensional data with excessive...
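A minimal sketch of the Hotelling-type statistic mentioned above, assuming the mean and covariance are estimated from an in-control reference sample. The function name `hotelling_t2` and the data are hypothetical.

```python
import numpy as np

def hotelling_t2(reference, new_points):
    """T^2 = (x - mean)' S^{-1} (x - mean) for each row x of new_points.

    The mean and covariance S are estimated from the reference sample,
    which is assumed to come from an in-control process.
    """
    reference = np.asarray(reference, dtype=float)
    new_points = np.asarray(new_points, dtype=float)
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    diffs = new_points - mean
    # Row-wise quadratic form d' S^{-1} d
    return np.einsum("ij,ij->i", diffs @ np.linalg.inv(cov), diffs)

# Made-up reference sample with mean zero and covariance (2/3) * I
reference = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
t2 = hotelling_t2(reference, np.array([[0.0, 0.0], [2.0, 0.0]]))
```

When the number of variables approaches or exceeds the number of reference observations, the sample covariance becomes ill-conditioned or singular and the inverse above breaks down, which is one reason high-dimensional data need special treatment.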
Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A.; van 't Veld, Aart A.
2012-01-01
PURPOSE: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. METHODS AND MATERIALS: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator
Refining developmental coordination disorder subtyping with multivariate statistical methods
Directory of Open Access Journals (Sweden)
Lalanne Christophe
2012-07-01
Background: With a large number of potentially relevant clinical indicators, penalization and ensemble learning methods are thought to provide better predictive performance than usual linear predictors. However, little is known about how they perform in clinical studies where few cases are available. We used Random Forests and Partial Least Squares Discriminant Analysis to select the most salient impairments in Developmental Coordination Disorder (DCD) and assess patient similarity. Methods: We considered a wide-ranging testing battery for various neuropsychological and visuo-motor impairments which aimed at characterizing subtypes of DCD in a sample of 63 children. Classifiers were optimized on a training sample, and they were subsequently used to rank the 49 items according to a permuted measure of variable importance. In addition, subtyping consistency was assessed with cluster analysis on the training sample. Clustering fitness and predictive accuracy were evaluated on the validation sample. Results: Both classifiers yielded a relevant subset of item impairments that altogether accounted for a sharp discrimination between three DCD subtypes: ideomotor, visual-spatial and constructional, and mixed dyspraxia. The main impairments found to characterize the three subtypes were: digital perception, imitations of gestures, digital praxia, lego blocks, visual spatial structuration, visual motor integration, and coordination between upper and lower limbs. Classification accuracy was above 90% for all classifiers, and clustering fitness was found to be satisfactory. Conclusions: Random Forests and Partial Least Squares Discriminant Analysis are useful tools to extract salient features from a large pool of correlated binary predictors, and they also provide a way to assess individuals' proximities in a reduced factor space. Less than 15 neuro-visual, neuro-psychomotor and neuro-psychological tests might be required to provide a sensitive and
Applied multivariate statistical analysis
Härdle, Wolfgang Karl
2015-01-01
Focusing on high-dimensional applications, this 4th edition presents the tools and concepts used in multivariate data analysis in a style that is also accessible for non-mathematicians and practitioners. It surveys the basic principles and emphasizes both exploratory and inferential statistics; a new chapter on Variable Selection (Lasso, SCAD and Elastic Net) has also been added. All chapters include practical exercises that highlight applications in different multivariate data analysis fields: in quantitative financial studies, where the joint dynamics of assets are observed; in medicine, where recorded observations of subjects in different locations form the basis for reliable diagnoses and medication; and in quantitative marketing, where consumers’ preferences are collected in order to construct models of consumer behavior. All of these examples involve high to ultra-high dimensions and represent a number of major fields in big data analysis. The fourth edition of this book on Applied Multivariate ...
Cherukupalle, Nirmala devi
This bibliography contains works that illustrate and apply multivariate statistical methods in the analysis of empirical data in the field of urban and regional planning. The bibliography has been designed for use by planning students and the professional planner. The first section of the bibliography lists some elementary and intermediate level…
Applied multivariate statistics with R
Zelterman, Daniel
2015-01-01
This book brings the power of multivariate statistics to graduate-level practitioners, making these analytical methods accessible without lengthy mathematical derivations. Using the open source, shareware program R, Professor Zelterman demonstrates the process and outcomes for a wide array of multivariate statistical applications. Chapters cover graphical displays, linear algebra, univariate, bivariate and multivariate normal distributions, factor methods, linear regression, discrimination and classification, clustering, time series models, and additional methods. Zelterman uses practical examples from diverse disciplines to welcome readers from a variety of academic specialties. Those with backgrounds in statistics will learn new methods while they review more familiar topics. Chapters include exercises, real data sets, and R implementations. The data are interesting, real-world topics, particularly from health and biology-related contexts. As an example of the approach, the text examines a sample from the B...
Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A; van't Veld, Aart A
2012-03-15
To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended. Copyright © 2012 Elsevier Inc. All rights reserved.
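To illustrate why LASSO yields sparse, interpretable models, here is a minimal coordinate-descent sketch for the linear-regression form of the penalty. It is a simplification: NTCP models of this kind are normally logistic, and nothing below reproduces the study's procedure. The function names and the data are hypothetical.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1 / (2n)) * ||y - X b||^2 + alpha * ||b||_1 by cyclic updates."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's current contribution removed
            residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ residual / n
            z = X[:, j] @ X[:, j] / n
            beta[j] = soft_threshold(rho, alpha) / z
    return beta

# Made-up design with orthogonal columns; the true coefficients are 2 and 0.05
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = X @ np.array([2.0, 0.05])
beta = lasso_coordinate_descent(X, y, alpha=0.1)
```

The weak second predictor is set exactly to zero while the strong one is only shrunk, which is the variable-selection behaviour that distinguishes LASSO from stepwise selection. In practice the penalty `alpha` would be chosen by cross-validation.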
Beguería, S.; Lorente, A.
2007-01-01
This paper, written as a deliverable of the DAMOCLES project, is a review of the different existing methodologies for landslide hazard mapping by multivariate statistics. Within the DAMOCLES project, multivariate statistical models have been applied to different study regions in Italy and Spain. The
Defining the ecological hydrology of Taiwan Rivers using multivariate statistical methods
Chang, Fi-John; Wu, Tzu-Ching; Tsai, Wen-Ping; Herricks, Edwin E.
2009-09-01
Summary: The identification and verification of ecohydrologic flow indicators has found new support as the importance of ecological flow regimes is recognized in modern water resources management, particularly in river restoration and reservoir management. An ecohydrologic indicator system reflecting the unique characteristics of Taiwan's water resources and hydrology has been developed, the Taiwan ecohydrological indicator system (TEIS). A major challenge for the water resources community is using the TEIS to provide environmental flow rules that improve existing water resources management. This paper examines data from the extensive network of flow monitoring stations in Taiwan using TEIS statistics to define and refine environmental flow options in Taiwan. Multivariate statistical methods were used to examine TEIS statistics for 102 stations representing the geographic and land use diversity of Taiwan. The Pearson correlation coefficient showed high multicollinearity between the TEIS statistics. Watersheds were separated into upper and lower-watershed locations. An analysis of variance indicated significant differences between upstream, more natural, and downstream, more developed, locations in the same basin, with hydrologic indicator redundancy in flow change and magnitude statistics. Issues of multicollinearity were examined using a Principal Component Analysis (PCA), with the first three components related to general flow and high/low flow statistics, frequency and time statistics, and quantity statistics. These principal components explain about 85% of the total variation. A major conclusion is that managers must be aware of differences among basins, as well as differences within basins that will require careful selection of management procedures to achieve needed flow regimes.
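The "share of total variation explained" quoted for the leading components can be computed as sketched below. The sketch assumes PCA on the correlation matrix, a common choice for indicators in different units that the abstract does not confirm. The data are simulated and are not TEIS statistics; `pca_explained_variance` is a hypothetical name.

```python
import numpy as np

def pca_explained_variance(data):
    """Return the fraction of total variance explained by each component."""
    data = np.asarray(data, dtype=float)
    # Standardize so that PCA operates on the correlation matrix
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # descending order
    return eigenvalues / eigenvalues.sum()

# Simulated indicators for 102 stations: three collinear columns plus one
# independent column (illustration only)
rng = np.random.default_rng(0)
base = rng.normal(size=(102, 1))
indicators = np.hstack([base + 0.1 * rng.normal(size=(102, 3)),
                        rng.normal(size=(102, 1))])

ratios = pca_explained_variance(indicators)
cumulative = np.cumsum(ratios)  # e.g. cumulative[2] is the share of the first three
```

Strongly collinear indicators collapse onto a single component, which is why PCA is a natural follow-up to a correlation matrix showing high multicollinearity.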
A primer of multivariate statistics
Harris, Richard J
2014-01-01
Drawing upon more than 30 years of experience in working with statistics, Dr. Richard J. Harris has updated A Primer of Multivariate Statistics to provide a model of balance between how-to and why. This classic text covers multivariate techniques with a taste of latent variable approaches. Throughout the book there is a focus on the importance of describing and testing one's interpretations of the emergent variables that are produced by multivariate analysis. This edition retains its conversational writing style while focusing on classical techniques. The book gives the reader a feel for why
von Larcher, Thomas; Harlander, Uwe; Alexandrov, Kiril; Wang, Yongtai
2010-05-01
Experiments on baroclinic wave instabilities in a rotating cylindrical gap have long been performed, e.g., to uncover regular waves of different zonal wave number, to better understand the transition to the quasi-chaotic regime, and to reveal the underlying dynamical processes of complex wave flows. We present the application of appropriate multivariate data analysis methods to time series data sets acquired with non-intrusive measurement techniques of quite different natures. While highly accurate Laser-Doppler-Velocimetry (LDV) is used for measurements of the radial velocity component at equidistant azimuthal positions, a highly sensitive thermographic camera measures the surface temperature field. The measurements are performed at particular parameter points where our former studies show that complex wave patterns occur [1, 2]. Owing to the particular measurement techniques, the temperature data set has much more information content than the velocity data set. Both sets of time series data are analyzed using multivariate statistical techniques. While the LDV data sets are studied by applying Multi-Channel Singular Spectrum Analysis (M-SSA), the temperature data sets are analyzed by applying Empirical Orthogonal Functions (EOF). Our goal is (a) to verify the results obtained from the analysis of the velocity data and (b) to compare the data analysis methods. Therefore, the temperature data are processed so as to become comparable to the LDV data, i.e. the size of the data set is reduced as if the temperature measurements had been performed at equidistant azimuthal positions only. This approach initially results in a great loss of information, but applying M-SSA to the reduced temperature data sets enables us to compare the methods. [1] Th. von Larcher and C. Egbers, Experiments on transitions of baroclinic waves in a differentially heated rotating annulus, Nonlinear Processes in Geophysics
Chen, Zhe; Qiu, Zurong; Huo, Xinming; Fan, Yuming; Li, Xinghua
2017-03-01
A fiber-capacitive drop analyzer is an instrument which monitors a growing droplet to produce a capacitive opto-tensiotrace (COT). Each COT integrates fiber light intensity signals and capacitance signals and can reflect the unique physicochemical properties of a liquid. In this study, we propose a method for solution identification and concentration quantification based on multivariate statistical methods. Eight characteristic values are extracted from each COT. A series of COT characteristic values of training solutions at different concentrations composes a data library for that kind of solution. A two-stage linear discriminant analysis is applied to analyze different solution libraries and establish discriminant functions, by which test solutions can be discriminated. After the variety of a test solution has been determined, a Spearman correlation test and principal components analysis are used to filter and reduce the dimensions of the eight characteristic values, producing a new representative parameter. A cubic spline interpolation function is built between the parameter and the concentrations, from which the concentration of the test solution can be calculated. Methanol, ethanol, n-propanol, and saline solutions are taken as experimental subjects in this paper. For each solution, nine or ten different concentrations are chosen as the standard library, and two other concentrations compose the test group. Using the methods mentioned above, all eight test solutions are correctly identified and the average relative error of quantitative analysis is 1.11%. The proposed method is feasible; it enlarges the applicable scope of recognizing liquids based on the COT and improves the precision of concentration quantification.
Directory of Open Access Journals (Sweden)
Jiwen Ge
2013-07-01
To provide a reasonable basis for the scientific management of water resources, and guidance for sustaining the health of the Gufu River and maintaining the stability of the water ecosystem of the Three-Gorge Reservoir of the Yangtze River, central China, multiple statistical methods including Cluster Analysis (CA), Discriminant Analysis (DA) and Principal Component Analysis (PCA) were performed to assess the spatial-temporal variations and interpret water quality data. The data were obtained during one year (2010~2011) of monitoring of 13 parameters at 21 different sites (3003 observations). Hierarchical CA classified 11 months into 2 periods (the first and second periods) and the 21 sampling sites into 2 clusters, based on similarities in the water quality characteristics: upper reaches with little anthropogenic interference (UR), and lower reaches running through farming areas and towns that are subject to some human interference (LR). Eight significant parameters (total phosphorus, total nitrogen, temperature, nitrate nitrogen, total organic carbon, total hardness, total alkalinity and silicon dioxide) were identified by DA, affording 100% correct assignations for temporal variation analysis, and five significant parameters (total phosphorus, total nitrogen, ammonia nitrogen, electrical conductivity and total organic carbon) were confirmed with 88% correct assignations for spatial variation analysis. PCA (varimax rotation) was applied to identify potential pollution sources based on the two clustered regions. Four Principal Components (PCs) with 91.19 and 80.57% total variances were obtained for the Upper Reaches (UR) and Lower Reaches (LR) regions, respectively. For the UR region, the rainfall runoff, soil erosion, scouring weathering of crustal materials and forest areas are the main sources of pollution. The pollution sources for the LR region are anthropogenic sources (domestic and agricultural runoff
Multivariate statistics exercises and solutions
Härdle, Wolfgang Karl
2015-01-01
The authors present tools and concepts of multivariate data analysis by means of exercises and their solutions. The first part is devoted to graphical techniques. The second part deals with multivariate random variables and presents the derivation of estimators and tests for various practical situations. The last part introduces a wide variety of exercises in applied multivariate data analysis. The book demonstrates the application of simple calculus and basic multivariate methods in real life situations. It contains altogether more than 250 solved exercises which can assist a university teacher in setting up a modern multivariate analysis course. All computer-based exercises are available in the R language. All R codes and data sets may be downloaded via the quantlet download center www.quantlet.org or via the Springer webpage. For interactive display of low-dimensional projections of a multivariate data set, we recommend GGobi.
Valle, Denis; Baiser, Benjamin; Woodall, Christopher W; Chazdon, Robin
2014-12-01
We propose a novel multivariate method to analyse biodiversity data based on the Latent Dirichlet Allocation (LDA) model. LDA, a probabilistic model, reduces assemblages to sets of distinct component communities. It produces easily interpretable results, can represent abrupt and gradual changes in composition, accommodates missing data and allows for coherent estimates of uncertainty. We illustrate our method using tree data for the eastern United States and from a tropical successional chronosequence. The model is able to detect pervasive declines in the oak community in Minnesota and Indiana, potentially due to fire suppression, increased growing season precipitation and herbivory. The chronosequence analysis is able to delineate clear successional trends in species composition, while also revealing that site-specific factors significantly impact these successional trajectories. The proposed method provides a means to decompose and track the dynamics of species assemblages along temporal and spatial gradients, including effects of global change and forest disturbances.
Aspects of multivariate statistical theory
Muirhead, Robb J
2009-01-01
The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover volumes, Wiley hopes to extend the lives of these works by making them available to future generations of statisticians, mathematicians, and scientists. ". . . the wealth of material on statistics concerning the multivariate normal distribution is quite exceptional. As such it is a very useful source of information for the general statistician and a must for anyone wanting to pen
Banoeng-Yakubo, B.; Yidana, S.M.; Nti, E.
2009-01-01
Q and R-mode multivariate statistical analyses were applied to groundwater chemical data from boreholes and wells in the northern section of the Volta Region, Ghana. The objective was to determine the processes that affect the hydrochemistry and the variation of these processes in space among the three main geological terrains that underlie the area: the Buem formation, the Voltaian System and the Togo series. The analyses revealed three zones in the groundwater flow system: recharge, intermediate and discharge regions. All three zones are clearly different with respect to all the major chemical parameters, with concentrations increasing from the perceived recharge areas through the intermediate regions to the discharge areas. R-mode HCA and factor analysis (using varimax rotation and the Kaiser criterion) were then applied to determine the significant sources of variation in the hydrochemistry. This study finds that groundwater hydrochemistry in the area is controlled by the weathering of silicate and carbonate minerals, as well as the chemistry of infiltrating precipitation. The δD and δ18O data from the area fall along the Global Meteoric Water Line (GMWL). A regression equation derived for the relationship between δD and δ18O closely resembles the equation which describes the GMWL. On the basis of this, groundwater in the study area is probably meteoric and fresh. The apparently low salinities and sodicities of the groundwater seem to support this interpretation. The suitability of groundwater for domestic and irrigation purposes is related to its source, which determines its constitution. A plot of the sodium adsorption ratio (SAR) and salinity (EC) data on a semilog axis suggests that groundwater in the area is of good irrigation quality. Sixty percent (60%), 20% and 20% of the 67 data points used in this study fall within the medium salinity - low sodicity (C2-S1), low salinity - low sodicity (C1-S1) and high salinity - low
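The sodium adsorption ratio used in the irrigation-quality plot follows the usual definition SAR = Na / sqrt((Ca + Mg) / 2), with concentrations in meq/L. The sketch below computes it together with a salinity class. The class break points (250, 750 and 2250 µS/cm) are the values commonly quoted for the US Salinity Laboratory diagram and are an assumption here, since the abstract does not state them. The sodicity classes (S1 to S4) depend jointly on SAR and EC and are not reproduced.

```python
import math

def sodium_adsorption_ratio(na, ca, mg):
    """SAR from Na+, Ca2+ and Mg2+ concentrations, all in meq/L."""
    return na / math.sqrt((ca + mg) / 2.0)

def salinity_class(ec_us_cm):
    """Salinity hazard class from electrical conductivity in uS/cm.

    Break points are assumed (commonly quoted USSL values), not taken
    from the abstract.
    """
    if ec_us_cm < 250:
        return "C1"  # low salinity
    if ec_us_cm < 750:
        return "C2"  # medium salinity
    if ec_us_cm < 2250:
        return "C3"  # high salinity
    return "C4"      # very high salinity

# Made-up sample: Na = 6, Ca = 5, Mg = 3 meq/L, EC = 500 uS/cm
sar = sodium_adsorption_ratio(6.0, 5.0, 3.0)
ec_class = salinity_class(500.0)
```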
Schmidt decomposition and multivariate statistical analysis
Bogdanov, Yu. I.; Bogdanova, N. A.; Fastovets, D. V.; Luckichev, V. F.
2016-12-01
A new method of multivariate data analysis is presented, based on complementing a classical probability distribution to a quantum state and on the Schmidt decomposition. We consider the application of the Schmidt formalism to problems of statistical correlation analysis. The correlation of photons in the output channels of a beam splitter is examined for the case in which the input photon statistics are given by a compound Poisson distribution. The developed formalism allows us to analyze multidimensional systems, and we have obtained analytical formulas for the Schmidt decomposition of multivariate Gaussian states. It is shown that the mathematical tools of quantum mechanics can significantly improve classical statistical analysis. The presented formalism is a natural approach for the analysis of both classical and quantum multivariate systems and can be applied in various tasks associated with the study of dependences.
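For a bipartite pure state written as a coefficient matrix, the Schmidt decomposition coincides with the singular value decomposition, which gives a quick numerical check of the formalism. The sketch below is a generic illustration, not the authors' analytical treatment of Gaussian states; `schmidt_decomposition` is a hypothetical name.

```python
import numpy as np

def schmidt_decomposition(coefficients):
    """Return (Schmidt coefficients, Schmidt number) of a bipartite state.

    coefficients[i, j] is the amplitude of the basis state |i>|j>.
    The state is normalized before decomposition.
    """
    c = np.asarray(coefficients, dtype=complex)
    c = c / np.linalg.norm(c)  # Frobenius norm, so the amplitudes are normalized
    singular_values = np.linalg.svd(c, compute_uv=False)
    weights = singular_values ** 2  # these sum to 1
    schmidt_number = 1.0 / np.sum(weights ** 2)
    return singular_values, schmidt_number

# Maximally correlated two-level state (|00> + |11>) / sqrt(2)
s_bell, k_bell = schmidt_decomposition([[1, 0], [0, 1]])

# Uncorrelated (product) state |0>|0>
s_prod, k_prod = schmidt_decomposition([[1, 0], [0, 0]])
```

A Schmidt number of 1 indicates no correlation between the two subsystems, and larger values indicate stronger correlation, which is how the decomposition serves as a correlation measure.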
Li, Jinling; He, Ming; Han, Wei; Gu, Yifan
2009-05-30
An investigation of the sources of heavy metals (Cu, Zn, Ni, Pb, Cr, and Cd) in the coastal soils of Shanghai, China, was conducted using multivariate statistical methods (principal component analysis, clustering analysis, and correlation analysis). All the results of the multivariate analysis showed that: (i) Cu, Ni, Pb, and Cd had anthropogenic sources (e.g., overuse of chemical fertilizers and pesticides, industrial and municipal discharges, animal wastes, sewage irrigation, etc.); (ii) Zn and Cr were associated with parent materials and therefore had natural sources (e.g., the weathering process of parent materials and subsequent pedogenesis due to the alluvial deposits). The heavy metal content of the soils was greatly affected by soil formation, atmospheric deposition, and human activities. These findings provide essential information on the possible sources of heavy metals, which would contribute to the monitoring and assessment of agricultural soils in other regions worldwide.
Directory of Open Access Journals (Sweden)
Đula Borozan
2014-03-01
The paper deals with the application of multivariate analysis of variance and logistic regression in measuring, explaining and evaluating (i) gender differences in expressing migration aspirations, and (ii) a gender effect on the migration motivation of university students in Croatia. The results supported the thesis that migration is a complex gendering process that assumes subjective assessment of the whole set of interrelated motives. According to logistic regression, gender is a significant predictor of migration aspirations among the selected demographic and socio-economic variables. A multivariate analysis of variance showed that gender and migration aspirations in interaction matter when it comes to migration motives, particularly those related to the perceived importance of social networks. Females, and especially those who aspire to migrate, assessed these motives as more important than males did.
Chemical indices and methods of multivariate statistics as a tool for odor classification.
Mahlke, Ingo T; Thiesen, Peter H; Niemeyer, Bernd
2007-04-01
Industrial and agricultural off-gas streams are comprised of numerous volatile compounds, many of which have substantially different odorous properties. State-of-the-art waste-gas treatment includes the characterization of these molecules and is directed at, if possible, either the avoidance of such odorants during processing or the use of existing standardized air purification techniques like bioscrubbing or afterburning, which, however, often show low efficiency from ecological and economic standpoints. Selective odor separation from the off-gas streams could ease many of these disadvantages but is not yet widely applicable. Thus, the aim of this paper is to use knowledge-based methods to identify possible model substances for selective odor separation research from 155 volatile molecules, mainly originating from livestock facilities, fat refineries, and cocoa and coffee production. All compounds are examined with regard to their structure and information content using topological and information-theoretical indices. The resulting data are arranged in an observation matrix, and similarities between the substances are computed. Principal component analysis and k-means cluster analysis are conducted, showing that clustering of index data can depict odor information correlating well with molecular composition and molecular shape. Quantitative molecule description, along with the application of such statistical means, therefore provides a good classification tool for malodorant structure properties with no thermodynamic data needed. The approximately similar shape of odorous compounds within the clusters suggests a fair choice of possible model molecules.
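The pipeline described above (observation matrix, principal component analysis, then k-means clustering) can be sketched with NumPy as follows. The index values are made up, the function names are hypothetical, and the deterministic farthest-first initialization is a choice made here for reproducibility rather than anything stated in the paper.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the centered rows of X onto the leading principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans(points, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-first initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen center
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical observation matrix: 6 molecules x 3 structural indices
indices = np.array([[1.0, 0.2, 3.0], [1.1, 0.1, 3.2], [0.9, 0.3, 2.9],
                    [5.0, 4.0, 9.0], [5.2, 4.1, 9.3], [4.9, 3.8, 8.8]])

scores = pca_scores(indices, n_components=2)
labels, centers = kmeans(scores, k=2)
```

If the indices are on very different scales, standardizing the columns before the PCA step would usually be advisable.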
Lifshits, A M
1979-01-01
General characteristics of multivariate statistical analysis (MSA) are given. Methodical premises and criteria for the selection of an adequate MSA method applicable to pathoanatomic investigations of the epidemiology of multicausal diseases are presented. The experience of using MSA with computers and standard computing programs in studies of coronary artery atherosclerosis on the materials of 2060 autopsies is described. The combined use of 4 MSA methods (sequential, correlational, regressional, and discriminant) made it possible to quantify the contribution of each of the 8 examined risk factors to the development of atherosclerosis. The most important factors were found to be age, arterial hypertension, and heredity. Occupational hypodynamia and increased fatness were more important in men, whereas diabetes mellitus--in women. The registration of this combination of risk factors by MSA methods provides for more reliable prognosis of the likelihood of coronary heart disease with a fatal outcome than prognosis of the degree of coronary atherosclerosis.
Multivariate analysis: A statistical approach for computations
Michu, Sachin; Kaushik, Vandana
2014-10-01
Multivariate analysis is a statistical approach commonly used in automotive diagnosis, education, evaluating clusters in finance, etc., and more recently in the health-related professions. The objective of the paper is to provide a detailed exploratory discussion of factor analysis (FA) in image retrieval and correlation analysis (CA) of network traffic. Image retrieval methods aim to retrieve relevant images from a collected database, based on their content. The problem is made more difficult by the high dimension of the variable space in which the images are represented. Multivariate correlation analysis provides an anomaly detection and analysis method based on the correlation coefficient matrix. Anomalous behaviors in the network include various attacks on the network, such as DDoS attacks and network scanning.
Detecting seasonal cycle shift on streamflow over Turkey by using multivariate statistical methods
Yildiz, Dogan; Gunes, Mehmet Samil; Gokalp Yavuz, Fulya; Yildiz, Dursun
2017-08-01
Climate change analysis includes the study of several types of variables such as temperature, precipitation, carbon emission, and streamflow. In this study, we focus on basin hydrology and, in particular, on streamflow values. They are geographic and climatologic indicators utilized in the study of basins. We analyze these values to better understand monthly and seasonal change over a 40-year period for all basins in Turkey. Our study differs from others by applying multivariate analysis to the streamflow data rather than trend, frequency, and/or distribution-based analysis. The characteristics of basins and climate change effects are visualized and examined with monthly data by using cluster analysis, multidimensional scaling, and gCLUTO (graphical Clustering Toolkit). As a result, we classify months as low-flow and high-flow periods. Multidimensional scaling shows that there is a clockwise movement of months from one decade to the next, which is the indicator of seasonal shift. Finally, the gCLUTO tool is utilized in a novel way in the hydrology field by revealing the seasonal change and visualizing the current changing structure of streamflow.
Tavakol, Mitra; Arjmandi, Reza; Shayeghi, Mansoureh; Monavari, Seyed Masoud; Karbassi, Abdolreza
2017-01-01
One of the key issues in determining the quality of water in rivers is to create a water quality control network with a suitable performance. The measured qualitative variables at stations should be representative of all the changes in water quality in water systems. Since the increase in water quality monitoring stations increases annual monitoring costs, recognition of the stations with higher importance as well as main parameters can be effective in future decisions to improve the existing monitoring network. Sampling was carried out on 12 physical and chemical parameters measured at 15 stations during 2013-2014 in Haraz River, northern Iran. The results of the measurements were analyzed using multivariate statistical analysis methods including cluster analysis (CA), principal component analysis (PCA), factor analysis (FA), and discriminant analysis (DA). According to the CA, PCA, and FA, the stations were divided into three groups of high pollution, medium pollution, and low pollution. The research findings confirm applicability of multivariate statistical techniques in the interpretation of large data sets, water quality assessment, and source apportionment of different pollution sources.
Methods of Multivariate Analysis
Rencher, Alvin C
2012-01-01
Praise for the Second Edition "This book is a systematic, well-written, well-organized text on multivariate analysis packed with intuition and insight . . . There is much practical wisdom in this book that is hard to find elsewhere."-IIE Transactions Filled with new and timely content, Methods of Multivariate Analysis, Third Edition provides examples and exercises based on more than sixty real data sets from a wide variety of scientific fields. It takes a "methods" approach to the subject, placing an emphasis on how students and practitioners can employ multivariate analysis in real-life sit
A method of using cluster analysis to study statistical dependence in multivariate data
Borucki, W. J.; Card, D. H.; Lyle, G. C.
1975-01-01
A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.
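One way a Monte Carlo significance test of clustering could look is sketched below. The test statistic (mean nearest-neighbour distance) and the null model (shuffling one variable to break any dependence) are assumptions chosen for illustration; they are not taken from the report above, and the toy data are invented.

```python
import random
from math import dist


def mean_nn_distance(points):
    """Average distance from each point to its nearest neighbour."""
    return sum(
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / len(points)


def monte_carlo_p_value(xs, ys, n_sim=199, seed=0):
    """One-sided Monte Carlo p-value: small when the data are more tightly
    clustered than data in which x and y have been made independent."""
    rng = random.Random(seed)
    observed = mean_nn_distance(list(zip(xs, ys)))
    hits = 0
    for _ in range(n_sim):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # destroys any association between xs and ys
        if mean_nn_distance(list(zip(xs, shuffled))) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Toy data with perfect dependence between the two variables (y equals x).
xs = [i / 29 for i in range(30)]
ys = xs[:]
p_value = monte_carlo_p_value(xs, ys)
print(p_value)
```

Counting the observed value among the simulations, as in `(hits + 1) / (n_sim + 1)`, keeps the p-value strictly positive, which is the usual convention for Monte Carlo tests.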
The application of multivariate statistical methods for understanding food consumer behaviour
Lakner, Zoltan; Hajdu, Istvanne; Banati, Diana; Szabo, Erzsebet; Kasza, Gyula
2007-01-01
Understanding consumer behaviour is a necessary precondition for a targeted communication strategy. The behaviour is a complex phenomenon, and research needs to rigorously apply sophisticated methods. This article entails the combined utilisation of categorical principal component analysis and cluster analysis to determine the major, relatively homogeneous consumer groups, and this is coupled with confirmatory factor analysis and structural model building to understand consumer behav...
DEFF Research Database (Denmark)
Ludvigsen, Liselotte; Albrechtsen, Hans-Jørgen; Rootzén, Helle
1997-01-01
Different multivariate statistical analyses were applied to phospholipid fatty acids representing the biomass composition and to different biogeochemical parameters measured in 37 samples from a landfill contaminated aquifer at Grindsted Landfill (Denmark). Principal component analysis and corres... Partial least squares analysis related the phospholipid fatty acids data to the biogeochemical parameters assuming linear relationships. After selection of the optimal phospholipid fatty acid combination by genetic algorithms, good partial least squares models with low prediction errors were gained...
Kemperman, Ramses F. J.; Horvatovich, Peter L.; Hoekman, Berend; Reijmers, Theo H.; Muskiet, Frits A. J.; Bischoff, Rainer
2007-01-01
We describe a platform for the comparative profiling of urine using reversed-phase liquid chromatography-mass spectrometry (LC-MS) and multivariate statistical data analysis. Urinary compounds were separated by gradient elution and subsequently detected by electrospray Ion-Trap MS. The lower limit o
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-04-01
first phase of the work addressed to identify the spatial relationships between the landslide locations and the 13 related factors by using the Frequency Ratio bivariate statistical method. The analysis was then carried out by adopting a multivariate statistical approach, using the Logistic Regression and Random Forests techniques, which gave the best results in terms of AUC. The models were run and evaluated with different sample sizes, also taking into account the temporal variation of input variables such as areas burned by wildfire. The most significant outcomes of this work are the relevant influence of the sample size on the model results and the strong importance of some environmental factors (e.g. land use and wildfires) for the identification of the depletion zones of extremely rapid shallow landslides.
Dong, Jian-Jun; Li, Qing-Liang; Yin, Hua; Zhong, Cheng; Hao, Jun-Guang; Yang, Pan-Fei; Tian, Yu-Hong; Jia, Shi-Ru
2014-10-15
Sensory evaluation is regarded as a necessary procedure to ensure a reproducible quality of beer. Meanwhile, high-throughput analytical methods provide a powerful tool to analyse various flavour compounds, such as higher alcohols and esters. In this study, the relationship between flavour compounds and sensory evaluation was established by non-linear models such as partial least squares (PLS), genetic algorithm back-propagation neural network (GA-BP), and support vector machine (SVM). It was shown that SVM with a Radial Basis Function (RBF) kernel had better prediction accuracy for both the calibration set (94.3%) and the validation set (96.2%) than the other models. Relatively lower prediction abilities were observed for GA-BP (52.1%) and PLS (31.7%). In addition, the kernel function of SVM played an essential role in model training: the prediction accuracy of SVM with a polynomial kernel function was only 32.9%. As a powerful multivariate statistics method, SVM holds great potential to assess beer quality.
Modern Multivariate Statistical Techniques Regression, Classification, and Manifold Learning
Izenman, Alan Julian
2006-01-01
Describes the advances in computation and data storage that led to the introduction of many statistical tools for high-dimensional data analysis. Focusing on multivariate analysis, this book discusses nonlinear methods as well as linear methods. It presents an integrated mixture of classical and modern multivariate statistical techniques.
Chen, Jiabo; Li, Fayun; Fan, Zhiping; Wang, Yanjie
2016-01-01
Source apportionment of river water pollution is critical in water resource management and aquatic conservation. Comprehensive application of various GIS-based multivariate statistical methods was performed to analyze datasets (2009–2011) on water quality in the Liao River system (China). Cluster analysis (CA) classified the 12 months of the year into three groups (May–October, February–April and November–January) and the 66 sampling sites into three groups (groups A, B and C) based on similarities in water quality characteristics. Discriminant analysis (DA) determined that temperature, dissolved oxygen (DO), pH, chemical oxygen demand (CODMn), 5-day biochemical oxygen demand (BOD5), NH4+–N, total phosphorus (TP) and volatile phenols were significant variables affecting temporal variations, with 81.2% correct assignments. Principal component analysis (PCA) and positive matrix factorization (PMF) identified eight potential pollution factors for each part of the data structure, explaining more than 61% of the total variance. Oxygen-consuming organics from cropland and woodland runoff were the main latent pollution factor for group A. For group B, the main pollutants were oxygen-consuming organics, oil, nutrients and fecal matter. For group C, the evaluated pollutants primarily included oxygen-consuming organics, oil and toxic organics. PMID:27775679
Raji, M A; Frycák, P; Temiyasathit, C; Kim, S B; Mavromaras, G; Ahn, J-M; Schug, K A
2009-07-01
Response factors were determined for twelve GXG peptides (where G stands for glycine and X is any of alanine [A], arginine [R], asparagine [N], aspartic acid [D], glycine [G], histidine [H], leucine [L], lysine [K], phenylalanine [F], serine [S], tyrosine [Y], valine [V]) by electrospray ionization mass spectrometry (ESI-MS). The response factors were measured using a novel flow injection method. This new method is based on the Gaussian distribution of analyte concentration resulting from band-broadening dispersion experienced by the analyte upon passage through an extended volume of PEEK tubing. This method removes the need for preparing a discrete series of standard solutions to assess concentration-dependent response. Relative response factors were calculated for each peptide with reference to GGG. The observed trends in the relative response factors were correlated with several analyte physicochemical parameters, chosen based on current understanding of ion release from charged droplets during the ESI process. These include the following analyte properties: nonpolar surface area, polar surface area, gas-phase basicity, proton affinity, and Log D. Multiple linear regression, decision tree, and support vector regression models were investigated to assess their potential for predicting ESI response based on the analyte properties. The support vector regression model was more versatile and produced the least predictive error following 12-fold cross-validation. The effect of variation in solution pH on the relative response factors is highlighted, as evidenced by the different predictive models obtained for peptide response at two pH values (pH = 6.0 and 9.0). The relationship between physicochemical parameters and associated ionization efficiencies for GXG tripeptides is discussed based on the equilibrium partitioning model. Copyright 2009 John Wiley & Sons, Ltd.
Directory of Open Access Journals (Sweden)
Weili Duan
2016-01-01
Multivariate statistical methods, including cluster analysis (CA), discriminant analysis (DA) and principal component analysis/factor analysis (PCA/FA), were applied to explore the surface water quality datasets including 14 parameters at 28 sites of the Eastern Poyang Lake Basin, Jiangxi Province of China, from January 2012 to April 2015, characterize spatiotemporal variation in pollution and identify potential pollution sources. The 28 sampling stations were divided into two periods (wet season and dry season) and two regions (low pollution and high pollution), respectively, using the hierarchical CA method. Four parameters (temperature, pH, ammonia-nitrogen (NH4-N), and total nitrogen (TN)) were identified using DA to distinguish temporal groups with close to 97.86% correct assignations. Again using DA, five parameters (pH, chemical oxygen demand (COD), TN, fluoride (F), and sulphide (S)) led to 93.75% correct assignations for distinguishing spatial groups. Five potential pollution sources, including nutrients pollution, oxygen consuming organic pollution, fluorine chemical pollution, heavy metals pollution and natural pollution, were identified using PCA/FA techniques for both the low pollution region and the high pollution region. Heavy metals (copper (Cu), chromium (Cr) and zinc (Zn)), fluoride and sulphide are of particular concern in the study region because of many open-pit copper mines such as Dexing Copper Mine. Results obtained from this study offer a reasonable classification scheme for low-cost monitoring networks. The results also inform understanding of spatio-temporal variation in water quality as these topics relate to water resources management.
Beketov, Mikhail A; Kattwinkel, Mira; Liess, Matthias
2013-12-01
The identification of the effects of toxicants on biological communities is hampered by the complexity and variability of communities. To overcome these challenges, the trait-based SPEAR approach has been developed. This approach is based on (i) identifying the vulnerable taxa using traits and (ii) aggregating these taxa into a group to reduce the between-replicate differences and scattered low-abundance distribution, both of which are typical for biological communities. This approach allows for reduction of the noise and determination of the effects of toxicants at low concentrations in both field and mesocosm studies. However, there is a need to quantitatively investigate its potential for mesocosm data evaluations and application in the ecological risk assessment of toxicants. In the present study, we analysed how the aggregation of the sensitive taxa can facilitate the identification of the effects. We used empirical data from a long-term mesocosm experiment with stream invertebrates and an insecticide as well as a series of simulated datasets characterised by different degrees of data matrix saturation (corresponding to different sampling efforts), numbers of replicates, and between-replicate differences. The analyses of both the empirical and simulated data sets revealed that the taxa aggregation approach allows for the detection of effects at a lower saturation of the data matrices, smaller number of replicates, and higher between-replicate differences when compared to the multivariate statistical method redundancy analysis. These improvements lead to a higher sensitivity of the analysed systems, as long-term effects were detected at lower concentrations (up to 1,000 times). These outcomes suggest that methods based on taxa aggregation have a strong potential for use in mesocosm data evaluations because mesocosm studies are usually poorly replicated, have high between-replicate variability, and cannot be exhaustively sampled due to technical and financial
Random matrix theory and multivariate statistics
Diaz-Garcia, Jose A.; Jáimez, Ramon Gutiérrez
2009-01-01
Some tools and ideas are interchanged between random matrix theory and multivariate statistics. In the context of the random matrix theory, classes of spherical and generalised Wishart random matrix ensemble, containing as particular cases the classical random matrix ensembles, are proposed. Some properties of these classes of ensemble are analysed. In addition, the random matrix ensemble approach is extended and a unified theory proposed for the study of distributions for real normed divisio...
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
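The "repeated partitioning of a sample into subgroups based on a certain criterion" mentioned above can be sketched for a single step of a classification tree. The sketch assumes Gini impurity as the criterion (a common choice for classification trees, not confirmed by the abstract) and uses invented toy data; a full tree would apply `best_split` recursively to each resulting subgroup.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Best binary split on one numeric predictor.

    Returns (threshold, weighted_impurity); threshold is None when no split
    improves on the impurity of the unsplit sample.
    """
    best = (None, gini(labels))
    ordered = sorted(set(values))
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2  # candidate cut midway between neighbours
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy sample: one numeric predictor and a binary outcome.
predictor = [1, 2, 3, 4, 20, 25, 30, 40]
outcome = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
threshold, impurity = best_split(predictor, outcome)
print(threshold, impurity)
```

A regression tree follows the same loop but would replace the Gini impurity with a measure such as the within-group variance of the continuous outcome.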
Szulc, Stefan
1965-01-01
Statistical Methods provides a discussion of the principles of the organization and technique of research, with emphasis on its application to the problems in social statistics. This book discusses branch statistics, which aims to develop practical ways of collecting and processing numerical data and to adapt general statistical methods to the objectives in a given field. Organized into five parts encompassing 22 chapters, this book begins with an overview of how to organize the collection of such information on individual units, primarily as accomplished by government agencies. This text then
Freund, Rudolf J; Wilson, William J
2010-01-01
Statistical Methods, 3e provides students with a working introduction to statistical methods offering a wide range of applications that emphasize the quantitative skills useful across many academic disciplines. This text takes a classic approach emphasizing concepts and techniques for working out problems and interpreting results. The book includes research projects, real-world case studies, numerous examples and data exercises organized by level of difficulty. This text requires that a student be familiar with algebra. New to this edition: NEW expansion of exercises a
Institute of Scientific and Technical Information of China (English)
Xia Gao; Yu-Ling Ma; Pei Zhang; Xiao-Ping Zheng; Bo-Lu Sun; Fang-Di Hu
2016-01-01
Background: The dried roots of Inula helenium L. (IH) and Inula racemosa Hook f. (IR) are used commonly as folk medicine under the name of “tumuxiang (TMX)”. Phenolic acid compounds and their derivatives, as main active constituents in IH and IR, exhibit a prominent anti-inflammatory effect. Objective: To develop a holistic method based on chemical characteristics and anti-inflammatory effect for systematically evaluating the quality of twenty-seven TMX samples (including 18 IH samples and 9 IR samples) from different origins. Methods: HPLC fingerprint data of AL (Aucklandia lappa Decne.), whose dried root was similar with HR, was added for classification analysis. The HPLC fingerprints of twenty-seven TMX samples and four AL samples were evaluated using hierarchical clustering analysis (HCA) and principal component analysis (PCA). The spectrum-efficacy model between HPLC fingerprints and anti-inflammatory activities was investigated by principal component regression (PCR) and partial least squares (PLS). Results: All samples were successfully divided into three main clusters, and peaks 7, 9, 11, 22, 24 and 26 had a primary contribution to classifying these medicinal herbs. The results were in accord with the appraisal results of herbs. The spectrum-efficacy relationship results indicated that citric acid, quinic acid, caffeic acid-β-D-glucopyranoside, chlorogenic acid, caffeic acid, 1,3-O-dicaffeoyl quinic acid, tianshic acid and 3β-Hydroxypterondontic acid had the main contribution to anti-inflammatory activities. Conclusion: This comprehensive strategy was successfully used for identification of IH, IR and AL, which provided a reliable and adequate theoretical basis for the bioactivity relevant quality standards and studying the material basis of the anti-inflammatory effect of TMX.
Multivariate statistical analysis of wildfires in Portugal
Costa, Ricardo; Caramelo, Liliana; Pereira, Mário
2013-04-01
Several studies demonstrate that wildfires in Portugal present high temporal and spatial variability as well as cluster behavior (Pereira et al., 2005, 2011). This study aims to contribute to the characterization of the fire regime in Portugal with the multivariate statistical analysis of the time series of number of fires and area burned in Portugal during the 1980-2009 period. The data used in the analysis are an extended version of the Rural Fire Portuguese Database (PRFD) (Pereira et al., 2011), provided by the National Forest Authority (Autoridade Florestal Nacional, AFN), the Portuguese Forest Service, which includes information for more than 500,000 fire records. There are many advanced techniques for examining the relationships among multiple time series at the same time (e.g., canonical correlation analysis, principal components analysis, factor analysis, path analysis, multiple analyses of variance, clustering systems). This study compares and discusses the results obtained with these different techniques. Pereira, M.G., Trigo, R.M., DaCamara, C.C., Pereira, J.M.C., Leite, S.M., 2005: "Synoptic patterns associated with large summer forest fires in Portugal". Agricultural and Forest Meteorology. 129, 11-25. Pereira, M. G., Malamud, B. D., Trigo, R. M., and Alves, P. I.: The history and characteristics of the 1980-2005 Portuguese rural fire database, Nat. Hazards Earth Syst. Sci., 11, 3343-3358, doi:10.5194/nhess-11-3343-2011, 2011. This work is supported by European Union Funds (FEDER/COMPETE - Operational Competitiveness Programme) and by national funds (FCT - Portuguese Foundation for Science and Technology) under the project FCOMP-01-0124-FEDER-022692, the project FLAIR (PTDC/AAC-AMB/104702/2008) and the EU 7th Framework Program through FUME (contract number 243888).
Rathi, Monika; Ahrenkiel, S P; Carapella, J J; Wanlass, M W
2013-02-01
Given an unknown multicomponent alloy, and a set of standard compounds or alloys of known composition, can one improve upon popular standards-based methods for energy dispersive X-ray (EDX) spectrometry to quantify the elemental composition of the unknown specimen? A method is presented here for determining elemental composition of alloys using transmission electron microscopy-based EDX with appropriate standards. The method begins with a discrete set of related reference standards of known composition, applies multivariate statistical analysis to those spectra, and evaluates the compositions with a linear matrix algebra method to relate the spectra to elemental composition. By using associated standards, only limited assumptions about the physical origins of the EDX spectra are needed. Spectral absorption corrections can be performed by providing an estimate of the foil thickness of one or more reference standards. The technique was applied to III-V multicomponent alloy thin films: composition and foil thickness were determined for various III-V alloys. The results were then validated by comparing with X-ray diffraction and photoluminescence analysis, demonstrating accuracy of approximately 1% in atomic fraction.
Multivariate statistics and the enactment of metabolic complexity.
Levin, Nadine
2014-08-01
This ethnographic study, based on fieldwork at the Computational and Systems Medicine laboratory at Imperial College London, shows how researchers in the field of metabolomics--the post-genomic study of the molecules and processes that make up metabolism--enact and coproduce complex views of biology with multivariate statistics. From this data-driven science, metabolism emerges as a multiple, informational and statistical object, which is both produced by and also necessitates particular forms of data production and analysis. Multivariate statistics emerge as 'natural' and 'correct' ways of engaging with a metabolism that is made up of many variables. In this sense, multivariate statistics allow researchers to engage with and conceptualize metabolism, and also disease and processes of life, as complex entities. Consequently, this article builds on studies of scientific practice and visualization to examine data as material objects rather than black-boxed representations. Data practices are not merely the technological components of experimentation, but are simultaneously technologies and methods and are intertwined with ways of seeing and enacting the biological world. Ultimately, this article questions the increasing invocation and role of complexity within biology, suggesting that discourses of complexity are often imbued with reductionist and determinist ways of thinking about biology, as scientists engage with complexity in calculated and controlled, but also limited, ways.
Statistical Methods for Astronomy
Feigelson, Eric D
2012-01-01
This review outlines concepts of mathematical statistics, elements of probability theory, hypothesis tests and point estimation for use in the analysis of modern astronomical data. Least squares, maximum likelihood, and Bayesian approaches to statistical inference are treated. Resampling methods, particularly the bootstrap, provide valuable procedures when distribution functions of statistics are not known. Several approaches to model selection and goodness of fit are considered. Applied statistics relevant to astronomical research are briefly discussed: nonparametric methods for use when little is known about the behavior of the astronomical populations or processes; data smoothing with kernel density estimation and nonparametric regression; unsupervised clustering and supervised classification procedures for multivariate problems; survival analysis for astronomical datasets with nondetections; time- and frequency-domain time series analysis for light curves; and spatial statistics to interpret the spati...
Application of multivariate statistical techniques in microbial ecology.
Paliy, O; Shankar, V
2016-03-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, a noticeable effect has been seen in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces a large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure.
Multivariate statistical analysis a high-dimensional approach
Serdobolskii, V
2000-01-01
In the last few decades the accumulation of large amounts of information in numerous applications has stimulated an increased interest in multivariate analysis. Computer technologies allow one to use multi-dimensional and multi-parametric models successfully. At the same time, an interest arose in statistical analysis with a deficiency of sample data. Nevertheless, it is difficult to describe the recent state of affairs in applied multivariate methods as satisfactory. Unimprovable (dominating) statistical procedures are still unknown except for a few specific cases. The simplest problem of estimating the mean vector with minimum quadratic risk is unsolved, even for normal distributions. Commonly used standard linear multivariate procedures based on the inversion of sample covariance matrices can lead to unstable results or provide no solution in dependence of data. Programs included in standard statistical packages cannot process 'multi-collinear data' and there are no theoretical recommen ...
Multivariate Statistical Analysis Applied in Wine Quality Evaluation
Directory of Open Access Journals (Sweden)
Jieling Zou
2015-08-01
This study applies multivariate statistical approaches to wine quality evaluation. With 27 red wine samples, four factors were identified out of 12 parameters by principal component analysis, explaining 89.06% of the total variance of data. As iterative weights calculated by the BP neural network revealed little difference from weights determined by information entropy method, the latter was chosen to measure the importance of indicators. Weighted cluster analysis performs well in classifying the sample group further into two sub-clusters. The second cluster of red wine samples, compared with its first, was lighter in color, tasted thinner and had fainter bouquet. Weighted TOPSIS method was used to evaluate the quality of wine in each sub-cluster. With scores obtained, each sub-cluster was divided into three grades. On the whole, the quality of lighter red wine was slightly better than the darker category. This study shows the necessity and usefulness of multivariate statistical techniques in both wine quality evaluation and parameter selection.
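The combination of entropy-based indicator weights and weighted TOPSIS scoring mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the textbook forms of the entropy weight method and TOPSIS are used, all indicators are treated as "larger is better", and the three-sample data matrix is invented; none of this is taken from the study's own data or code.

```python
from math import log, sqrt


def entropy_weights(matrix):
    """Entropy weight method: indicators that vary more across samples get more weight."""
    n = len(matrix)
    divergences = []
    for column in zip(*matrix):
        total = sum(column)
        shares = [v / total for v in column]
        entropy = -sum(p * log(p) for p in shares if p > 0) / log(n)
        divergences.append(1.0 - entropy)
    total_divergence = sum(divergences)
    return [d / total_divergence for d in divergences]


def topsis(matrix, weights):
    """Closeness of each sample to the ideal solution (1.0 is best, 0.0 is worst)."""
    norms = [sqrt(sum(v * v for v in column)) for column in zip(*matrix)]
    weighted = [
        [w * v / norm for v, w, norm in zip(row, weights, norms)] for row in matrix
    ]
    ideal = [max(col) for col in zip(*weighted)]       # best value per indicator
    anti_ideal = [min(col) for col in zip(*weighted)]  # worst value per indicator
    scores = []
    for row in weighted:
        d_best = sqrt(sum((v - b) ** 2 for v, b in zip(row, ideal)))
        d_worst = sqrt(sum((v - a) ** 2 for v, a in zip(row, anti_ideal)))
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical data: three samples (rows) scored on three indicators (columns).
samples = [[9.0, 8.0, 7.0], [5.0, 6.0, 5.0], [2.0, 3.0, 4.0]]
weights = entropy_weights(samples)
scores = topsis(samples, weights)
print(weights, scores)
```

Indicators where smaller values are preferable would need the roles of `max` and `min` swapped for those columns when building the ideal and anti-ideal points.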
Multivariate Relationships between Statistics Anxiety and Motivational Beliefs
Baloglu, Mustafa; Abbassi, Amir; Kesici, Sahin
2017-01-01
In general, anxiety has been found to be associated with motivational beliefs and the current study investigated multivariate relationships between statistics anxiety and motivational beliefs among 305 college students (60.0% women). The Statistical Anxiety Rating Scale, the Motivated Strategies for Learning Questionnaire, and a set of demographic…
Relationship between Multiple Regression and Selected Multivariable Methods.
Schumacker, Randall E.
The relationship of multiple linear regression to various multivariate statistical techniques is discussed. The importance of the standardized partial regression coefficient (beta weight) in multiple linear regression as it is applied in path, factor, LISREL, and discriminant analyses is emphasized. The multivariate methods discussed in this paper…
Directory of Open Access Journals (Sweden)
Vetrimurugan Elumalai
2017-04-01
Heavy metals in surface and groundwater were analysed and their sources were identified using multivariate statistical tools for two towns in South Africa. Human exposure risk through the drinking water pathway was also assessed. Electrical conductivity values showed that groundwater is desirable to permissible for drinking except for six locations. Concentrations of aluminium, lead and nickel were above the permissible limit for drinking at all locations. Boron, cadmium, iron and manganese exceeded the limit at a few locations. A heavy metal pollution index based on ten heavy metals indicated that 85% of the area had good quality water, but 15% was unsuitable. Human exposure dose through the drinking water pathway indicated no risk due to boron, nickel and zinc, moderate risk due to cadmium and lithium and high risk due to silver, copper, manganese and lead. Hazard quotients were high in all sampling locations for humans of all age groups, indicating that groundwater is unsuitable for drinking purposes. Highly polluted areas were located near the coast, close to industrial operations and at a landfill site representing human-induced pollution. Factor analysis identified the four major pollution sources as: (1) industries; (2) mining and related activities; (3) mixed sources (geogenic and anthropogenic); and (4) fertilizer application.
Statistical Inference for a Class of Multivariate Negative Binomial Distributions
DEFF Research Database (Denmark)
Rubak, Ege H.; Møller, Jesper; McCullagh, Peter
This paper considers statistical inference procedures for a class of models for positively correlated count variables called α-permanental random fields, and which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been...... studied in the literature, while this is the first statistical paper on α-permanental random fields. The focus is on maximum likelihood estimation, maximum quasi-likelihood estimation and on maximum composite likelihood estimation based on uni- and bivariate distributions. Furthermore, new results...
Multicomponent seismic noise attenuation with multivariate order statistic filters
Wang, Chao; Wang, Yun; Wang, Xiaokai; Xun, Chao
2016-10-01
The vector relationship between multicomponent seismic data is highly important for multicomponent processing and interpretation, but this vector relationship can be damaged when each component is processed individually. To overcome this drawback of standard component-by-component filtering, multivariate order statistic filters are introduced and extended to attenuate the noise of multicomponent seismic data by treating such a dataset as a vector wavefield rather than a set of scalar fields. According to the characteristics of seismic signals, we implement this type of multivariate filtering along local events. First, the optimal local events are recognized according to the similarity between the vector signals, which are windowed from neighbouring seismic traces with a sliding time window along each trial trajectory. An efficient strategy is used to reduce the computational cost of similarity measurement for vector signals. Next, one vector sample from each of the neighbouring traces is extracted along the optimal local event as the input data for a multivariate filter. Different multivariate filters are optimal for different types of noise. The multichannel modified trimmed mean (MTM) filter, one of the multivariate order statistic filters, is applied to synthetic and field multicomponent seismic data to test its performance in attenuating white Gaussian noise. The results indicate that the multichannel MTM filter can attenuate noise while preserving the relative amplitude information of multicomponent seismic data more effectively than a single-channel filter.
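A minimal sketch of one common way to form a multichannel modified trimmed mean may help make the idea concrete. It assumes the vector samples along a local event have already been extracted, takes the vector median as the reference, and averages only the vectors within a distance threshold of it. The threshold, the Euclidean distance and the function names are assumptions; the abstract does not give the authors' exact formulation.

```python
import math

def vector_median(vectors):
    """The input vector with the smallest summed distance to all the others."""
    return min(vectors, key=lambda v: sum(math.dist(v, u) for u in vectors))

def multichannel_mtm(vectors, threshold):
    """Average the vectors lying within `threshold` of the vector median."""
    med = vector_median(vectors)
    # The median itself is always kept, so `kept` is never empty.
    kept = [v for v in vectors if math.dist(v, med) <= threshold]
    dim = len(med)
    return tuple(sum(v[k] for v in kept) / len(kept) for k in range(dim))

# Hypothetical three-component samples from four neighbouring traces;
# the last one is an outlier and is excluded from the average.
window = [(1.0, 2.0, 0.5), (1.1, 1.9, 0.6), (0.9, 2.1, 0.4), (9.0, -7.0, 5.0)]
filtered = multichannel_mtm(window, threshold=0.5)
```

Because the components are filtered together as one vector, the output keeps the relative amplitudes between components, which is the property the abstract emphasises.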
Analysis on Maize Agronomic Traits by Multivariate Statistical Method
Institute of Scientific and Technical Information of China (English)
谭贤杰; 覃兰秋; 廖金秀; 周锦国; 江禹奉; 谢和霞; 程伟东; 吴子恺
2011-01-01
In crop genetics and breeding research, yield and yield-related traits are mostly quantitative traits. These traits are controlled by multiple genes and are easily affected by the environment, which makes their inheritance extremely complex, and the relationships among traits are often complex as well. These complex relationships make selection for yield as the target trait very difficult in breeding. Multivariate statistical analysis is a comprehensive approach to studying the statistical regularities of interdependence among multiple variables. Used appropriately, it can deepen understanding of the genetic patterns underlying the relationships among traits, and of the relative importance and dependence of each related trait in its effect on yield, providing a theoretical basis for breeding and improving new varieties. In this study, 20 agronomic traits of 35 maize varieties (combinations) were analyzed by GGE biplot, factor analysis and cluster analysis. The results showed that: (1) average daily yield, thousand-kernel weight and ear length were significantly positively correlated with yield; (2) the 20 agronomic traits could be consolidated into six common factors; (3) using the six common factors as composite indices, the 35 varieties (combinations) clustered into 17 groups; and (4) G8, G14, G12, G10 and G17 were the varieties (combinations) with superior overall traits.
Statistical analysis of multivariate atmospheric variables. [cloud cover
Tubbs, J. D.
1979-01-01
Topics covered include: (1) estimation in discrete multivariate distributions; (2) a procedure to predict cloud cover frequencies in the bivariate case; (3) a program to compute conditional bivariate normal parameters; (4) the transformation of nonnormal multivariate to near-normal; (5) test of fit for the extreme value distribution based upon the generalized minimum chi-square; (6) test of fit for continuous distributions based upon the generalized minimum chi-square; (7) effect of correlated observations on confidence sets based upon chi-square statistics; and (8) generation of random variates from specified distributions.
Multivariate statistical analysis of precipitation chemistry in Northwestern Spain
Energy Technology Data Exchange (ETDEWEB)
Prada-Sanchez, J.M.; Garcia-Jurado, I.; Gonzalez-Manteiga, W.; Fiestras-Janeiro, M.G.; Espada-Rios, M.I.; Lucas-Dominguez, T. (University of Santiago, Santiago (Spain). Faculty of Mathematics, Dept. of Statistics and Operations Research)
1993-07-01
149 samples of rainwater were collected in the proximity of a power station in northwestern Spain at three rainwater monitoring stations. The resulting data are analyzed using multivariate statistical techniques. Firstly, the Principal Component Analysis shows that there are three main sources of pollution in the area (a marine source, a rural source and an acid source). The impact from pollution from these sources on the immediate environment of the stations is studied using Factorial Discriminant Analysis. 8 refs., 7 figs., 11 tabs.
Institute of Scientific and Technical Information of China (English)
Dou Lei; Zhou Yongzhang; Ma Jin; Li Yong; Cheng Qiuming; Xie Shuyun; Du Haiyan; You Yuanhang; Wan Hongfu
2008-01-01
Dongguan (东莞) City, located in the Pearl River Delta, South China, is famous for its rapid industrialization in the past 30 years. A total of 90 topsoil samples have been collected from agricultural fields, including vegetable and orchard soils in the city, and eight heavy metals (As, Cu, Cd, Cr, Hg, Ni, Pb, and Zn) and other items (pH values and organic matter) have been analyzed, to evaluate the influence of anthropic activities on the environmental quality of agricultural soils and to identify the spatial distribution of trace elements and possible sources of trace elements. The elements Hg, Pb, and Cd have accumulated remarkably here, in comparison with the soil background content of elements in Guangdong (广东) Province. Pollution is more serious in the western plain and the central region, where industries and rivers are densely distributed. Multivariate and geostatistical methods have been applied to differentiate the influences of natural processes and human activities on the pollution of heavy metals in topsoils in the study area. The results of cluster analysis (CA) and factor analysis (FA) show that Ni, Cr, Cu, Zn, and As are grouped in factor F1, Pb in F2, and Cd and Hg in F3, respectively. The spatial pattern of the three factors may be well demonstrated by geostatistical analysis. It is shown that the first factor could be considered as a natural source controlled by parent rocks. The second factor could be referred to as "industrial and traffic pollution sources". The source of the third factor is mainly controlled by long-term anthropic activities, as a consequence of agricultural fossil fuel consumption and atmospheric deposition.
Adjustment of geochemical background by robust multivariate statistics
Zhou, D.
1985-01-01
Conventional analyses of exploration geochemical data assume that the background is a constant or slowly changing value, equivalent to a plane or a smoothly curved surface. However, it is better to regard the geochemical background as a rugged surface, varying with changes in geology and environment. This rugged surface can be estimated from observed geological, geochemical and environmental properties by using multivariate statistics. A method of background adjustment was developed and applied to groundwater and stream sediment reconnaissance data collected from the Hot Springs Quadrangle, South Dakota, as part of the National Uranium Resource Evaluation (NURE) program. Source-rock lithology appears to be a dominant factor controlling the chemical composition of groundwater or stream sediments. The most efficacious adjustment procedure is to regress uranium concentration on selected geochemical and environmental variables for each lithologic unit, and then to delineate anomalies by a common threshold set as a multiple of the standard deviation of the combined residuals. Robust versions of regression and RQ-mode principal components analysis techniques were used rather than ordinary techniques to guard against distortion caused by outliers. Anomalies delineated by this background adjustment procedure correspond with uranium prospects much better than do anomalies delineated by conventional procedures. The procedure should be applicable to geochemical exploration at different scales for other metals.
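The adjustment procedure described here (a separate regression per lithologic unit, then one common threshold on the pooled residuals) can be sketched as follows. This is a simplified illustration: it uses ordinary least squares on a single covariate instead of the robust multivariable regression the abstract refers to, and the function names, the multiplier `k` and the data layout are hypothetical.

```python
import statistics

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one covariate."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx

def flag_anomalies(samples, k=2.0):
    """samples: list of (lithologic_unit, covariate, uranium) tuples.

    Returns the indices of samples whose residual exceeds k standard
    deviations of the residuals pooled over all units.
    """
    by_unit = {}
    for idx, (unit, x, u) in enumerate(samples):
        by_unit.setdefault(unit, []).append((idx, x, u))
    residuals = [0.0] * len(samples)
    for rows in by_unit.values():
        # The background model is fitted separately for each unit.
        slope, intercept = fit_line([r[1] for r in rows], [r[2] for r in rows])
        for idx, x, u in rows:
            residuals[idx] = u - (intercept + slope * x)
    # One common threshold from the combined residuals.
    threshold = k * statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if r > threshold]
```

An ordinary fit like this one is pulled toward the outliers it is meant to detect, which is the distortion the robust versions in the abstract guard against.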
Statistical inference for a class of multivariate negative binomial distributions
DEFF Research Database (Denmark)
Rubak, Ege Holger; Møller, Jesper; McCullagh, Peter
This paper considers statistical inference procedures for a class of models for positively correlated count variables called α-permanental random fields, and which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been...... studied in the literature, while this is the first statistical paper on α-permanental random fields. The focus is on maximum likelihood estimation, maximum quasi-likelihood estimation and on maximum composite likelihood estimation based on uni- and bivariate distributions. Furthermore, new results for α...
Liu, Na; Li, Jun; Li, Bao-Guo
2014-11-01
Quality control of Chinese medicine has long been both a focus and a difficulty in the development of traditional Chinese medicine (TCM), and it is one of the key problems restricting the modernization and internationalization of Chinese medicine. Multivariate statistical analysis is an analytical method well suited to the characteristics of TCM and has been widely used in the study of TCM quality control. It is applied to the multiple, mutually correlated indicators and variables that arise in quality control studies in order to uncover hidden patterns or relationships in the data, which can support decision-making and enable effective quality evaluation of TCM. This paper summarizes the application of multivariate statistical analysis in the quality control of Chinese medicine, providing a basis for further study.
Directory of Open Access Journals (Sweden)
Lyubov V. Ruchinskaya
2013-01-01
The methodological basis of the methods developed for multivariate statistical analysis of consumers' preferences in the Russian market for cultured milk products is considered. The author segmented consumers of cultured milk products using methods of multidimensional classification, which makes it possible for domestic producers to optimize the structure of their dairy production.
Methods for Analyzing Multivariate Phenotypes in Genetic Association Studies
Directory of Open Access Journals (Sweden)
Qiong Yang
2012-01-01
Multivariate phenotypes are frequently encountered in genetic association studies. The purpose of analyzing multivariate phenotypes usually includes the discovery of novel genetic variants with pleiotropic effects, that is, variants affecting multiple phenotypes, with the ultimate goal of uncovering the underlying genetic mechanism. In recent years, new methods have been developed and existing statistical methods have been applied to such phenotypes. In this paper, we provide a review of the available methods for analyzing association between a single marker and a multivariate phenotype consisting of the same type of components (e.g., all continuous or all categorical) or different types of components (e.g., some continuous and others categorical). We also review causal inference methods designed to test whether the detected association with the multivariate phenotype is truly pleiotropy or whether the genetic marker exerts its effects on some phenotypes through affecting the others.
Robust methods for multivariate data analysis
DEFF Research Database (Denmark)
Frosch, Stina; Von Frese, J.; Bro, Rasmus
2005-01-01
Outliers may hamper proper classical multivariate analysis, and lead to incorrect conclusions. To remedy the problem of outliers, robust methods are developed in statistics and chemometrics. Robust methods reduce or remove the effect of outlying data points and allow the "good" data to primarily...
Multivariate Statistical Modelling of Drought and Heat Wave Events
Manning, Colin; Widmann, Martin; Vrac, Mathieu; Maraun, Douglas; Bevaqua, Emanuele
2016-04-01
C. Manning (1, 2), M. Widmann (1), M. Vrac (2), D. Maraun (3), E. Bevaqua (2, 3). Affiliations: (1) School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham, UK; (2) Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL), Centre d'Etudes de Saclay, Gif-sur-Yvette, France; (3) Wegener Center for Climate and Global Change, University of Graz, Brandhofgasse 5, 8010 Graz, Austria. Compound extreme events are a combination of two or more contributing events which in themselves may not be extreme but through their joint occurrence produce an extreme impact. Compound events are noted in the latest IPCC report as an important type of extreme event that has been given little attention so far. As part of the CE:LLO project (Compound Events: muLtivariate statisticaL mOdelling) we are developing a multivariate statistical model to gain an understanding of the dependence structure of certain compound events. One focus of this project is on the interaction between drought and heat wave events. Soil moisture has both a local and a non-local effect on the occurrence of heat waves, as it strongly controls the latent heat flux, affecting the transfer of sensible heat to the atmosphere. These processes can create a feedback whereby a heat wave may be amplified or suppressed by the soil moisture preconditioning and, vice versa, the heat wave may in turn have an effect on soil conditions. An aim of this project is to capture this dependence in order to correctly describe the joint probabilities of these conditions and the resulting probability of their compound impact. We will show an application of Pair Copula Constructions (PCCs) to study the aforementioned compound event. PCCs allow in theory for the formulation of multivariate dependence structures in any dimension, where the PCC is a decomposition of a multivariate distribution into a product of bivariate components modelled using copulas. A
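As an illustration of the pair copula decomposition described above, the three-variable case can be written out explicitly. The formula below is the standard D-vine form under the usual simplifying assumption that the conditional copula does not depend on the value of the conditioning variable. The symbols are generic: $f_i$ and $F_i$ are marginal densities and distribution functions, and $c_{ij}$ are bivariate copula densities. The abstract does not state which variables or which vine structure the authors use.

```latex
\begin{aligned}
f(x_1, x_2, x_3) ={}& f_1(x_1)\, f_2(x_2)\, f_3(x_3) \\
&\times c_{12}\bigl(F_1(x_1), F_2(x_2)\bigr)\,
        c_{23}\bigl(F_2(x_2), F_3(x_3)\bigr) \\
&\times c_{13|2}\bigl(F_{1|2}(x_1 \mid x_2),\, F_{3|2}(x_3 \mid x_2)\bigr)
\end{aligned}
```

Each factor is either a univariate margin or a bivariate copula density, so every pairwise dependence can be modelled with a different copula family.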
Processes and subdivisions in diogenites, a multivariate statistical analysis
Harriott, T. A.; Hewins, R. H.
1984-01-01
Multivariate statistical techniques used on diogenite orthopyroxene analyses show the relationships that occur within diogenites and the two orthopyroxenite components (class I and II) in the polymict diogenite Garland. Cluster analysis shows that only Peckelsheim is similar to Garland class I (Fe-rich) and the other diogenites resemble Garland class II. The unique diogenite Y 75032 may be related to type I by fractionation. Factor analysis confirms the subdivision and shows that Fe does not correlate with the weakly incompatible elements across the entire pyroxene composition range, indicating that igneous fractionation is not the process controlling total diogenite composition variation. The occurrence of two groups of diogenites is interpreted as the result of sampling or mixing of two main sequences of orthopyroxene cumulates with slightly different compositions.
Forensic discrimination of dyed hair color: II. Multivariate statistical analysis.
Barrett, Julie A; Siegel, Jay A; Goodpaster, John V
2011-01-01
This research is intended to assess the ability of UV-visible microspectrophotometry to successfully discriminate the color of dyed hair. Fifty-five red hair dyes were analyzed and evaluated using multivariate statistical techniques including agglomerative hierarchical clustering (AHC), principal component analysis (PCA), and discriminant analysis (DA). The spectra were grouped into three classes, which were visually consistent with different shades of red. A two-dimensional PCA observations plot was constructed, describing 78.6% of the overall variance. The wavelength regions associated with the absorbance of hair and dye were highly correlated. Principal components were selected to represent 95% of the overall variance for analysis with DA. A classification accuracy of 89% was observed for the comprehensive dye set, while external validation using 20 of the dyes resulted in a prediction accuracy of 75%. Significant color loss from successive washing of hair samples was estimated to occur within 3 weeks of dye application.
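The step of retaining enough principal components to represent 95% of the variance before discriminant analysis can be sketched as a simple cumulative-sum rule over the eigenvalues of the covariance (or correlation) matrix. The function name and the example eigenvalues are hypothetical and are not taken from the study.

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading components whose variance share reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)

# Hypothetical eigenvalues; the cumulative shares are 0.50, 0.80, 0.90, 0.96, 1.00.
n_components = components_for_variance([5.0, 3.0, 1.0, 0.6, 0.4])
```

The scores on the retained components would then serve as the inputs to the discriminant analysis.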
Arslan, Hakan; Ayyildiz Turan, Nazlı
2015-08-01
Monitoring of heavy metal concentrations in groundwater potentially used for drinking and irrigation is very important. This study collected groundwater samples from 78 wells in July 2012 and analyzed them for 17 heavy metals (Pb, Zn, Cr, Mn, Fe, Cu, Cd, Co, Ni, Al, As, Mo, Se, B, Ti, V, Ba). Spatial distributions of these elements were identified using three different interpolation methods [inverse distance weighting (IDW), radial basis function (RBF), and ordinary kriging (OK)]. Root mean squared error (RMSE) and mean absolute error (MAE) from cross-validation were used to select the best interpolation method for each parameter. Multivariate statistical analyses [cluster analysis (CA) and factor analysis (FA)] were used to identify similarities among sampling sites and the contribution of variables to groundwater pollution. Fe and Mn levels exceeded World Health Organization (WHO) recommended limits for drinking water in almost all of the study area, and some locations had Fe and Mn levels that exceeded Food and Agriculture Organization (FAO) guidelines for drip irrigation systems. Al, As, and Cd levels also exceeded WHO guidelines for drinking water. Cluster analysis classified groundwater in the study area into three groups, and factor analysis identified five factors that explained 73.39% of the total variation in groundwater, as follows: factor 1: Se, Ti, Cr, Mo; factor 2: Ni, Mn, Co, Ba; factor 3: Pb, Cd; factor 4: B, V, Fe, Cu; and factor 5: As, Zn. The results indicate that interpolation methods and multivariate statistical techniques were very useful for determining the source of the pollution.
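The comparison of interpolation methods by cross-validated RMSE and MAE can be illustrated for the simplest of the three methods, inverse distance weighting. The sketch below assumes leave-one-out cross-validation and a power parameter of 2; the abstract specifies neither, and the function names and well data are hypothetical.

```python
import math

def idw_predict(known, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    known: list of ((x, y), value) pairs for the sampled wells.
    """
    num = den = 0.0
    for point, value in known:
        d = math.dist(point, target)
        if d == 0.0:
            # The target coincides with a sampled well.
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def loo_errors(known, power=2.0):
    """Leave-one-out RMSE and MAE for the IDW interpolator."""
    errors = []
    for i, (point, value) in enumerate(known):
        rest = known[:i] + known[i + 1:]
        errors.append(idw_predict(rest, point, power) - value)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae

# Hypothetical wells: coordinates and a measured concentration.
wells = [((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0), ((0.0, 2.0), 2.0), ((2.0, 2.0), 4.0)]
rmse, mae = loo_errors(wells)
```

Running the same cross-validation loop with each candidate interpolator and keeping the one with the lowest errors for a given element mirrors the selection step described in the abstract.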
Khani, Rouhollah; Ghasemi, Jahan B.; Shemirani, Farzaneh
2014-03-01
A powerful and efficient signal-preprocessing technique that combines local and multiscale properties of the wavelet prism with the global filtering capability of orthogonal signal correction (OSC) is applied for pretreatment of spectroscopic data of parabens as model compounds after their preconcentration by robust ionic liquid-based dispersive liquid-liquid microextraction method (IL-DLLME). In the proposed technique, a mixture of a water-immiscible ionic liquid (as extraction solvent) [Hmim][PF6] and disperser solvent is injected into an aqueous sample solution containing one of the IL's ions, NaPF6, as extraction solvent and common ion source. After preconcentration, the absorbance of the extracted compounds was measured in the wavelength range of 200-700 nm. The wavelet orthogonal signal correction with partial least squares (WOSC-PLS) method was then applied for simultaneous determination of each individual compound. Effective parameters, such as amount of IL, volume of the disperser solvent and amount of NaPF6, were inspected by central composite design to identify the most important parameters and their interactions. The effect of pH on the sensitivity and selectivity was studied according to the net analyte signal (NAS) for each component. Under optimum conditions, enrichment factors of the studied compounds were 75 for methyl paraben (MP) and 71 for propyl paraben (PP). Limits of detection for MP and PP were 4.2 and 4.8 ng mL-1, respectively. The root mean square errors of prediction for MP and PP were 0.1046 and 0.1275 μg mL-1, respectively. The practical applicability of the developed method was examined using hygienic, cosmetic, pharmaceutical and natural water samples.
Gap Shape Classification using Landscape Indices and Multivariate Statistics
Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung
2016-11-01
This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. The results were evaluated using Wilks’ lambda as satisfactory (p ANOVA showed a statistical significance in all patch indices (p = 0.00), except for the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap.
Institute of Scientific and Technical Information of China (English)
王燕飞
2015-01-01
In this paper, principal component analysis is used to reduce the dimensionality of enterprise recruitment indicators, merging 12 indicators into 4 aspects. A decision-making model for enterprise recruitment is then built using the analytic hierarchy process, providing a theoretical basis for selecting outstanding personnel.
Statistical methods in astronomy
Long, James P.; de Souza, Rafael S.
2017-01-01
We present a review of data types and statistical methods often encountered in astronomy. The aim is to provide an introduction to statistical applications in astronomy for statisticians and computer scientists. We highlight the complex, often hierarchical, nature of many astronomy inference problems and advocate for cross-disciplinary collaborations to address these challenges.
Reuter, M; Netter, P
2001-01-01
The present study proposes a hierarchical multivariate statistical prediction model that makes it possible to determine the most prominent variables (physiological, biochemical and personality factors) related to nicotine craving and dopaminergic activation. Based on animal studies reporting a reduction of the rewarding effects of psychotropic drugs after blockade or destruction of the mesolimbic dopamine (DA) system, changes in nicotine craving after pharmacological manipulation by means of a DA agonist (lisuride 0.2 mg) and a DA antagonist (fluphenazine 2 mg) were assessed in 36 healthy male heavy smokers. The major aim was the development of a multivariate prediction model that is applicable in samples lacking variance homogeneity or the prerequisite of a multivariate normal distribution. The proposed model is a combination of multivariate parametric and nonparametric methods that takes advantage of their individual merits. Personality variables in particular, such as sensation seeking, impulsivity, and neuroticism, proved to be important predictors of craving in this responder approach.
Multivariate statistical modelling based on generalized linear models
Fahrmeir, Ludwig
1994-01-01
This book is concerned with the use of generalized linear models for univariate and multivariate regression analysis. Its emphasis is to provide a detailed introductory survey of the subject based on the analysis of real data drawn from a variety of subjects including the biological sciences, economics, and the social sciences. Where possible, technical details and proofs are deferred to an appendix in order to provide an accessible account for non-experts. Topics covered include: models for multi-categorical responses, model checking, time series and longitudinal data, random effects models, and state-space models. Throughout, the authors have taken great pains to discuss the underlying theoretical ideas in ways that relate well to the data at hand. As a result, numerous researchers whose work relies on the use of these models will find this an invaluable account to have on their desks. "The basic aim of the authors is to bring together and review a large part of recent advances in statistical modelling of m...
Multivariate Statistical Inference of Lightning Occurrence, and Using Lightning Observations
Boccippio, Dennis
2004-01-01
Two classes of multivariate statistical inference using TRMM Lightning Imaging Sensor, Precipitation Radar, and Microwave Imager observations are studied, using nonlinear classification neural networks as inferential tools. The very large and globally representative data sample provided by TRMM allows both training and validation (without overfitting) of neural networks with many degrees of freedom. In the first study, the flashing or non-flashing condition of storm complexes is diagnosed using radar, passive microwave and/or environmental observations as neural network inputs. The diagnostic skill of these simple lightning/no-lightning classifiers can be quite high over land (above 80% Probability of Detection; below 20% False Alarm Rate). In the second, passive microwave and lightning observations are used to diagnose radar reflectivity vertical structure. A priori diagnosis of hydrometeor vertical structure is highly important for improved rainfall retrieval from either orbital radars (e.g., the future Global Precipitation Mission "mothership") or radiometers (e.g., operational SSM/I and future Global Precipitation Mission passive microwave constellation platforms); we explore the incremental benefit to such diagnosis provided by lightning observations.
Ordinary chondrites - Multivariate statistical analysis of trace element contents
Lipschutz, Michael E.; Samuels, Stephen M.
1991-01-01
The contents of mobile trace elements (Co, Au, Sb, Ga, Se, Rb, Cs, Te, Bi, Ag, In, Tl, Zn, and Cd) in Antarctic and non-Antarctic populations of H4-6 and L4-6 chondrites were compared using standard multivariate discriminant functions borrowed from linear discriminant analysis and logistic regression. A nonstandard randomization-simulation method was developed, making it possible to carry out probability assignments on a distribution-free basis. Compositional differences were found both between the Antarctic and non-Antarctic H4-6 chondrite populations and between two L4-6 chondrite populations. It is shown that, for various types of meteorites (in particular, for the H4-6 chondrites), the Antarctic/non-Antarctic compositional difference is due to preterrestrial differences in the genesis of their parent materials.
Multivariate meta-analysis: a robust approach based on the theory of U-statistic.
Ma, Yan; Mazumdar, Madhu
2011-10-30
Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously, taking into account the correlation between the outcomes. Likelihood-based approaches, in particular the restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analyses with a small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and has been shown to perform as well as REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistics and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect of a non-normal data distribution on estimates from REML is marginal and that the estimates from the MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods is illustrated by their application to data from two published meta-analyses from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistics for testing the significance of between-study heterogeneity and for extending the work to the meta-regression setting.
STATISTICAL METHODS IN HISTORY
Directory of Open Access Journals (Sweden)
Orlov A. I.
2016-01-01
We have given a critical analysis of statistical models and methods for processing text information in historical records to establish the times at which certain events occurred, i.e., to build a science-based chronology. There are three main kinds of sources of knowledge of ancient history: ancient texts, the remains of material culture, and traditions. In most cases, a specific date for the objects excavated by archaeologists cannot be found. The group of Academician A.T. Fomenko has developed and applied new statistical methods for the analysis of historical texts (chronicles), based on the intensive use of computer technology. Two major scientific results were: the majority of historical records that we know now are duplicated (in particular, chronicles describing the so-called "Ancient Rome" and "Middle Ages" talk about the same events); and the known historical chronicles tell us about real events separated from the present time by no more than 1000 years. It was found that chronicles describing the history of "ancient times" and the "Middle Ages", the chronicles of Chinese history, and the histories of various European countries do not talk about different events, but about the same events. We have an attempt at a new dating of historical events and at restoring the true history of human society based on new data. From the standpoint of statistical methods, historical records and images of their fragments are special cases of objects of non-numerical nature. Therefore, the computer-statistical methods developed by the group of A.T. Fomenko are a part of non-numerical statistics. We have considered some methods of statistical analysis of chronicles applied by the group of A.T. Fomenko: correlation method of maximums; dynasties method; the method of attenuation frequency; questionnaire method codes. New chronology allows us to understand much of the battle of ideas in modern science and mass consciousness. It becomes clear the root cause of cautious
Kotula, Paul G; Keenan, Michael R
2006-12-01
Multivariate statistical analysis methods have been applied to scanning transmission electron microscopy (STEM) energy-dispersive X-ray spectral images. The particular application of the multivariate curve resolution (MCR) technique provides a high spectral contrast view of the raw spectral image. The power of this approach is demonstrated with a microelectronics failure analysis. Specifically, an unexpected component describing a chemical contaminant was found, as well as a component consistent with a foil thickness change associated with the focused ion beam specimen preparation process. The MCR solution is compared with a conventional analysis of the same spectral image data set.
Digital Repository Service at National Institute of Oceanography (India)
Jayalakshmy, K.V.; Rao, K.K.
A study of planktonic foraminiferal assemblages from 19 stations in the neritic and oceanic regions off the Coromandel Coast, Bay of Bengal has been made using a multivariate statistical method termed as factor analysis. On the basis of abundance...
Classification of Malaysia aromatic rice using multivariate statistical analysis
Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A.; Omar, O.
2015-05-01
Aromatic rice (Oryza sativa L.) is considered the best quality premium rice. The varieties are preferred by consumers because of criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than that of ordinary rice because of its special growth requirements, for instance a specific climate and soil. Presently, aromatic rice quality is identified by using its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or human sensory panels. However, human sensory panels have significant drawbacks: they require lengthy training, are prone to fatigue as the number of samples increases, and are inconsistent. GC-MS analysis techniques, on the other hand, require detailed procedures and lengthy analysis and are quite costly. This paper presents the application of an in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties. The e-nose is used to classify the variety of aromatic rice based on the odour of the samples. The samples were taken from the variety of rice. The instrument utilizes multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN), to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to perform recognition and classification of the unspecified samples. Visual observation of the PCA and LDA plots of the rice shows that the instrument was able to separate the samples into different clusters accordingly. The results of LDA and KNN, with low misclassification error, support the above findings, and we may conclude that the e-nose was successfully applied to the classification of the aromatic rice varieties.
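The Leave-One-Out evaluation of a KNN classifier mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the two-feature readings and the variety labels "A" and "B" are invented, Euclidean distance and k = 3 are arbitrary choices, and nothing here reproduces the authors' e-nose data or settings.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_error(x, y, k=3):
    """Leave-one-out misclassification rate: each sample is held out once."""
    wrong = 0
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_x, rest_y, x[i], k) != y[i]:
            wrong += 1
    return wrong / len(x)


# Hypothetical two-feature sensor responses for two rice varieties
x = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.9), (1.1, 1.2),
     (3.0, 3.1), (2.9, 3.2), (3.2, 2.8), (3.1, 3.0)]
y = ["A"] * 4 + ["B"] * 4
error = loo_error(x, y, k=3)
print("LOO misclassification rate:", error)
```

Holding out one sample at a time uses every observation for testing, which suits small sample sets such as a limited number of rice batches.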
Nonparametric statistical methods
Hollander, Myles; Chicken, Eric
2013-01-01
Praise for the Second Edition: "This book should be an essential part of the personal library of every practicing statistician." -Technometrics. Thoroughly revised and updated, the new edition of Nonparametric Statistical Methods includes additional modern topics and procedures, more practical data sets, and new problems from real-life situations. The book continues to emphasize the importance of nonparametric methods as a significant branch of modern statistics and equips readers with the conceptual and technical skills necessary to select and apply the appropriate procedures for any given sit
Darvishzadeh, R.; Skidmore, A. K.; Mirzaie, M.; Atzberger, C.; Schlerf, M.
2014-12-01
Accurate estimation of grassland biomass at peak productivity can provide crucial information regarding the functioning and productivity of rangelands. Hyperspectral remote sensing has proved to be valuable for estimating vegetation biophysical parameters such as biomass using different statistical techniques. However, in statistical analysis of hyperspectral data, multicollinearity is a common problem because of the large number of correlated hyperspectral reflectance measurements. The aim of this study was to examine the prospect of above ground biomass estimation in a heterogeneous Mediterranean rangeland employing multivariate calibration methods. Canopy spectral measurements were made in the field using a GER 3700 spectroradiometer, along with concomitant in situ measurements of above ground biomass for 170 sample plots. Multivariate calibrations including partial least squares regression (PLSR), principal component regression (PCR), and least squares support vector machine (LS-SVM) were used to estimate the above ground biomass. The prediction accuracy of the multivariate calibration methods was assessed using cross-validated R2 and RMSE. The best model performance was obtained using LS-SVM and then PLSR, both calibrated with the first derivative reflectance dataset, with R2cv = 0.88 and 0.86 and RMSEcv = 1.15 and 1.07, respectively. The weakest prediction accuracy was obtained when PCR was used (R2cv = 0.31 and RMSEcv = 2.48). The obtained results highlight the importance of multivariate calibration methods for biomass estimation when hyperspectral data are used.
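As an illustration of how a multivariate calibration can be scored with cross-validated R2 and RMSE, here is a minimal principal component regression sketch in NumPy. The synthetic "spectra", the 5-fold split, and the definition R2cv = 1 - SSres/SStot are assumptions made for the example; the study may have used a different cross-validation scheme or R2 definition.

```python
import numpy as np


def pcr_fit_predict(x_train, y_train, x_test, n_components):
    """Regress y on the leading principal component scores of x."""
    x_mean, y_mean = x_train.mean(axis=0), y_train.mean()
    xc = x_train - x_mean
    # Rows of vt are the principal axes of the centered training data
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    axes = vt[:n_components].T
    coef, *_ = np.linalg.lstsq(xc @ axes, y_train - y_mean, rcond=None)
    return (x_test - x_mean) @ axes @ coef + y_mean


def cross_validate(x, y, n_components, n_folds=5):
    """Return (R2cv, RMSEcv) from out-of-fold predictions."""
    idx = np.arange(len(y))
    pred = np.empty(len(y))
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        pred[fold] = pcr_fit_predict(x[train], y[train], x[fold], n_components)
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, np.sqrt(ss_res / len(y))


# Synthetic, strongly collinear "spectra": 30 bands driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 2))
x = latent @ rng.normal(size=(2, 30)) + 0.05 * rng.normal(size=(60, 30))
y = 2.0 * latent[:, 0] - latent[:, 1] + 0.1 * rng.normal(size=60)
r2_cv, rmse_cv = cross_validate(x, y, n_components=2)
print(f"R2cv={r2_cv:.3f} RMSEcv={rmse_cv:.3f}")
```

Projecting onto a few components before regressing is one way to sidestep the multicollinearity among bands that the abstract describes.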
Methods for Analyzing Multivariate Phenotypes in Genetic Association Studies.
Yang, Qiong; Wang, Yuanjia
2012-05-01
Multivariate phenotypes are frequently encountered in genetic association studies. The purpose of analyzing multivariate phenotypes usually includes the discovery of novel genetic variants with pleiotropic effects, that is, variants affecting multiple phenotypes, and the ultimate goal of uncovering the underlying genetic mechanism. In recent years, new methods have been developed and existing statistical methods have been applied to such phenotypes. In this paper, we provide a review of the available methods for analyzing the association between a single marker and a multivariate phenotype consisting of the same type of components (e.g., all continuous or all categorical) or different types of components (e.g., some continuous and others categorical). We also review causal inference methods designed to test whether the detected association with the multivariate phenotype is truly pleiotropy or whether the genetic marker exerts its effects on some phenotypes through affecting the others.
Liu, Yingchun; Sun, Guoxiang; Wang, Yan; Yang, Lanping; Yang, Fangliang
2015-06-01
Micellar electrokinetic chromatography fingerprinting combined with quantification was successfully developed and applied to monitor the quality consistency of Weibizhi tablets, a classical compound preparation used to treat gastric ulcers. A background electrolyte composed of 57 mmol/L sodium borate, 21 mmol/L sodium dodecylsulfate and 100 mmol/L sodium hydroxide was used to separate compounds. To optimize capillary electrophoresis conditions, multivariate statistical analyses were applied. First, the most important factors influencing sample electrophoretic behavior were identified as background electrolyte concentrations. Then, a Box-Behnken design response surface strategy using the resolution index RF as an integrated response was set up to correlate factors with response. RF reflects the effective signal amount, resolution, and signal homogenization in an electropherogram; thus, it was regarded as an excellent indicator. In fingerprint assessments, a simple quantified ratio fingerprint method was established for comprehensive quality discrimination of traditional Chinese medicines/herbal medicines from qualitative and quantitative perspectives, by which the quality of 27 samples from the same manufacturer was well differentiated. In addition, the fingerprint-efficacy relationship between fingerprints and antioxidant activities was established using partial least squares regression, which provided important medicinal efficacy information for quality control. The present study offered an efficient means for monitoring Weibizhi tablet quality consistency.
Energy Technology Data Exchange (ETDEWEB)
Weathers, J.B. [Shock, Noise, and Vibration Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: James.Weathers@ngc.com; Luck, R. [Department of Mechanical Engineering, Mississippi State University, 210 Carpenter Engineering Building, P.O. Box ME, Mississippi State, MS 39762-5925 (United States)], E-mail: Luck@me.msstate.edu; Weathers, J.W. [Structural Analysis Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: Jeffrey.Weathers@ngc.com
2009-11-15
The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions of the construction of the noise level using multivariate statistics exist in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.
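A minimal Monte Carlo sketch of the covariance estimation described above follows. It assumes two comparison errors that share one systematic error source and have independent random errors; all standard uncertainties are hypothetical, and the paper's actual error model is not reproduced. For two variables the 95% chi-square quantile has the closed form -2 ln(0.05), which defines the constant probability contour.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
sys_sd = 0.5            # hypothetical shared systematic standard uncertainty
rand_sd = (0.3, 0.4)    # hypothetical random standard uncertainties

# One systematic draw per trial affects both outputs, which correlates them
shared = rng.normal(0.0, sys_sd, size=n)
errors = np.column_stack([
    shared + rng.normal(0.0, rand_sd[0], size=n),
    shared + rng.normal(0.0, rand_sd[1], size=n),
])
cov = np.cov(errors, rowvar=False)   # Monte Carlo covariance estimate

# Squared Mahalanobis distance of every trial from the sample mean
centered = errors - errors.mean(axis=0)
d2 = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(cov), centered)

# Points with d2 <= threshold lie inside the 95% ellipse (2 degrees of freedom)
threshold = -2.0 * np.log(0.05)
coverage = np.mean(d2 <= threshold)
print(cov)
print("fraction inside 95% contour:", coverage)
```

The off-diagonal term of the estimated matrix approaches the variance of the shared systematic error, which is why ignoring common error sources would misstate the shape of the contour.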
Joint multivariate statistical model and its applications to the synthetic earthquake prediction
Institute of Scientific and Technical Information of China (English)
韩天锡; 蒋淳; 魏雪丽; 韩梅; 冯德益
2004-01-01
Considering the problems that should be solved in synthetic earthquake prediction at present, a new model is proposed in this paper. It is called the joint multivariate statistical model and combines principal component analysis with discriminant analysis. Principal component analysis and discriminant analysis are very important theories in multivariate statistical analysis, which has developed quickly in the last thirty years. By means of the maximization information method, we choose, from numerous earthquake prediction factors, several factors whose cumulative proportions of total sample variance exceed 90%. The paper applies regression analysis and Mahalanobis discrimination to extrapolate the synthetic prediction. Furthermore, we use this model to characterize and predict earthquakes in North China (30°-42°N, 108°-125°E), and better prediction results are obtained.
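The two ingredients named in the abstract can be sketched independently: choosing the number of principal components from the cumulative proportion of variance, and assigning a sample to the group with the smallest Mahalanobis distance. The helper names, the pooled-covariance assumption, and the synthetic data are illustrative choices, not the authors' procedure.

```python
import numpy as np


def n_components_for(x, target=0.90):
    """Smallest number of components whose cumulative variance share >= target."""
    eigvals = np.linalg.eigvalsh(np.cov(x, rowvar=False))[::-1]  # descending
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, target) + 1), cumulative


def mahalanobis_classify(sample, groups):
    """groups maps label -> (n_i, p) array; a pooled covariance is assumed."""
    total = sum(len(g) for g in groups.values())
    pooled = sum((len(g) - 1) * np.cov(g, rowvar=False) for g in groups.values())
    inv = np.linalg.inv(pooled / (total - len(groups)))
    d2 = {}
    for label, g in groups.items():
        diff = sample - g.mean(axis=0)
        d2[label] = float(diff @ inv @ diff)
    return min(d2, key=d2.get)


rng = np.random.default_rng(5)
# Four hypothetical prediction factors with very unequal variances
factors = rng.normal(size=(500, 4)) * np.array([10.0, 5.0, 1.0, 0.2])
k, cumulative = n_components_for(factors, target=0.90)

# Two hypothetical groups of past cases in a 2-D score space
groups = {"quiet": rng.normal(size=(30, 2)),
          "active": rng.normal(size=(30, 2)) + 5.0}
label = mahalanobis_classify(np.array([4.5, 5.2]), groups)
print(k, np.round(cumulative, 3), label)
```

In a combined workflow the samples would first be projected onto the retained components and the distances computed on those scores.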
Multivariate semiparametric spatial methods for imaging data.
Chen, Huaihou; Cao, Guanqun; Cohen, Ronald A
2017-04-01
Univariate semiparametric methods are often used in modeling nonlinear age trajectories for imaging data, which may result in efficiency loss and lower power for identifying important age-related effects that exist in the data. As observed in multiple neuroimaging studies, age trajectories show similar nonlinear patterns for the left and right corresponding regions and for the different parts of a big organ such as the corpus callosum. To incorporate the spatial similarity information without assuming spatial smoothness, we propose a multivariate semiparametric regression model with a spatial similarity penalty, which constrains the variation of the age trajectories among similar regions. The proposed method is applicable to both cross-sectional and longitudinal region-level imaging data. We show the asymptotic rates for the bias and covariance functions of the proposed estimator and its asymptotic normality. Our simulation studies demonstrate that by borrowing information from similar regions, the proposed spatial similarity method improves the efficiency remarkably. We apply the proposed method to two neuroimaging data examples. The results reveal that accounting for the spatial similarity leads to more accurate estimators and better functional clustering results for visualizing brain atrophy pattern. Keywords: Functional clustering; Longitudinal magnetic resonance imaging (MRI); Penalized B-splines; Region of interest (ROI); Spatial penalty.
Biostatistics Series Module 10: Brief Overview of Multivariate Methods.
Hazra, Avijit; Gogtay, Nithya
2017-01-01
has so far precluded most researchers from using these techniques routinely. The situation is now changing with the wider availability and increasing sophistication of statistical software, and researchers should no longer shy away from exploring the applications of multivariate methods to real-life data sets.
Statistical methods for forecasting
Abraham, Bovas
2009-01-01
The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover volumes, Wiley hopes to extend the lives of these works by making them available to future generations of statisticians, mathematicians, and scientists. "This book, it must be said, lives up to the words on its advertising cover: 'Bridging the gap between introductory, descriptive approaches and highly advanced theoretical treatises, it provides a practical, intermediate level discussion of a variety of forecasting tools, and explains how they relate to one another, both in theory and practice.' It does just that!" -Journal of the Royal Statistical Society. "A well-written work that deals with statistical methods and models that can be used to produce short-term forecasts, this book has wide-ranging applications. It could be used in the context of a study of regression, forecasting, and time series ...
Wang, Yi; Ma, Xiang; Wen, Ya-Dong; Zou, Quan; Wang, Jun; Tu, Jia-Run; Cai, Wen-Sheng; Shao, Xue-Guang
2013-05-01
Near infrared diffuse reflectance spectroscopy has been applied in on-site or on-line analysis because it is fast, non-destructive, and feasible for the analysis of real complex samples. The present work reports a real-time monitoring method for industrial production using the near infrared spectroscopic technique and multivariate statistical process analysis. In the method, the real-time near infrared spectra of the materials are collected on the production line, and the production process is then evaluated by a Hotelling T2 statistic calculated with the established model. In this work, principal component analysis (PCA) is adopted for building the model, and the statistic is calculated by projecting the real-time spectra onto the PCA model. With an application of the method in a practical production setting, it was demonstrated that a real-time evaluation of the variations in production can be realized by investigating the changes in the statistic, and the comparison of the products in different batches can be achieved by further statistical analysis of the statistic. Therefore, the proposed method may provide a practical way for quality assurance of production processes.
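The projection step described above can be sketched in a few lines of NumPy: a PCA model is fitted to reference spectra, a new spectrum is projected onto the loadings, and Hotelling's T2 is the sum of squared scores scaled by the score variances. The synthetic spectra, the choice of two components, and the function names are assumptions for illustration, and no control limit is computed here.

```python
import numpy as np


def fit_pca(x_ref, n_components):
    """Return mean, loadings (p x k) and score variances of a PCA model."""
    mean = x_ref.mean(axis=0)
    _, s, vt = np.linalg.svd(x_ref - mean, full_matrices=False)
    score_var = s[:n_components] ** 2 / (len(x_ref) - 1)
    return mean, vt[:n_components].T, score_var


def hotelling_t2(x_new, mean, loadings, score_var):
    """T2 of one spectrum (1-D) or of each row of a spectra matrix (2-D)."""
    scores = (x_new - mean) @ loadings
    return np.sum(scores ** 2 / score_var, axis=-1)


# Synthetic reference "spectra": 50 channels driven by 2 latent factors
rng = np.random.default_rng(2)
n, k = 200, 2
reference = (rng.normal(size=(n, k)) @ rng.normal(size=(k, 50))
             + 0.01 * rng.normal(size=(n, 50)))
mean, loadings, score_var = fit_pca(reference, k)
t2_reference = hotelling_t2(reference, mean, loadings, score_var)

# A spectrum displaced 10 score standard deviations along the first component
drifted = mean + 10.0 * np.sqrt(score_var[0]) * loadings[:, 0]
t2_drifted = hotelling_t2(drifted, mean, loadings, score_var)
print(t2_reference.mean(), t2_drifted)
```

In monitoring use, each incoming spectrum would be scored this way and compared against a limit derived from the reference data.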
A statistical approach for segregating cognitive task stages from multivariate fMRI BOLD time series
Directory of Open Access Journals (Sweden)
Charmaine Demanuele
2015-10-01
Multivariate pattern analysis can reveal new information from neuroimaging data to illuminate human cognition and its disturbances. Here, we develop a methodological approach, based on multivariate statistical/machine learning and time series analysis, to discern cognitive processing stages from fMRI blood oxygenation level dependent (BOLD) time series. We apply this method to data recorded from a group of healthy adults whilst performing a virtual reality version of the delayed win-shift radial arm maze task. This task has been frequently used to study working memory and decision making in rodents. Using linear classifiers and multivariate test statistics in conjunction with time series bootstraps, we show that different cognitive stages of the task, as defined by the experimenter, namely, the encoding/retrieval, choice, reward and delay stages, can be statistically discriminated from the BOLD time series in brain areas relevant for decision making and working memory. Discrimination of these task stages was significantly reduced during poor behavioral performance in dorsolateral prefrontal cortex (DLPFC), but not in the primary visual cortex (V1). Experimenter-defined dissection of time series into class labels based on task structure was confirmed by an unsupervised, bottom-up approach based on Hidden Markov Models. Furthermore, we show that different groupings of recorded time points into cognitive event classes can be used to test hypotheses about the specific cognitive role of a given brain region during task execution. We found that whilst the DLPFC strongly differentiated between task stages associated with different memory loads, but not between different visual-spatial aspects, the reverse was true for V1. Our methodology illustrates how different aspects of cognitive information processing during one and the same task can be separated and attributed to specific brain regions based on information contained in multivariate patterns of voxel
Analysis/forecast experiments with a multivariate statistical analysis scheme using FGGE data
Baker, W. E.; Bloom, S. C.; Nestler, M. S.
1985-01-01
A three-dimensional, multivariate, statistical analysis method, optimal interpolation (OI) is described for modeling meteorological data from widely dispersed sites. The model was developed to analyze FGGE data at the NASA-Goddard Laboratory of Atmospherics. The model features a multivariate surface analysis over the oceans, including maintenance of the Ekman balance and a geographically dependent correlation function. Preliminary comparisons are made between the OI model and similar schemes employed at the European Center for Medium Range Weather Forecasts and the National Meteorological Center. The OI scheme is used to provide input to a GCM, and model error correlations are calculated for forecasts of 500 mb vertical water mixing ratios and the wind profiles. Comparisons are made between the predictions and measured data. The model is shown to be as accurate as a successive corrections model out to 4.5 days.
Ghanate, A D; Kothiwale, S; Singh, S P; Bertrand, Dominique; Krishna, C Murali
2011-02-01
Cancer is now recognized as one of the major causes of morbidity and mortality. Histopathological diagnosis, the gold standard, has been shown to be subjective, time consuming, and prone to interobserver disagreement, and it often fails to predict prognosis. Optical spectroscopic methods are being contemplated as adjuncts or alternatives to conventional cancer diagnostics. The most important aspect of these approaches is their objectivity, and multivariate statistical tools play a major role in realizing it. However, rigorous evaluation of the robustness of spectral models is a prerequisite. The utility of Raman spectroscopy in the diagnosis of cancers has been well established. Until now, the specificity and applicability of spectral models have been evaluated for specific cancer types. In this study, we have evaluated the utility of spectroscopic models representing normal and malignant tissues of the breast, cervix, colon, larynx, and oral cavity in a broader perspective, using different multivariate tests. The limit test, which was used in our earlier study, gave high sensitivity but suffered from poor specificity. The performance of other methods such as factorial discriminant analysis and partial least squares discriminant analysis is on par with that of more complex nonlinear methods such as decision trees, but they provide very little information about the classification model. This comparative study thus demonstrates not just the efficacy of Raman spectroscopic models but also the applicability and limitations of different multivariate tools for discrimination under complex conditions such as the multicancer scenario.
Kruger, Uwe
2012-01-01
The development and application of multivariate statistical techniques in process monitoring has gained substantial interest over the past two decades in academia and industry alike. Initially developed for monitoring and fault diagnosis in complex systems, such techniques have been refined and applied in various engineering areas, for example mechanical and manufacturing, chemical, electrical and electronic, and power engineering. The recipe for the tremendous interest in multivariate statistical techniques lies in its simplicity and adaptability for developing monitoring applica
Institute of Scientific and Technical Information of China (English)
Bundit Boonkhao; Xue Z. Wang
2012-01-01
Ultrasonic attenuation spectroscopy (UAS) is an attractive process analytical technology (PAT) for on-line real-time characterisation of slurries for particle size distribution (PSD) estimation. It is however only applicable to relatively low solid concentrations since existing instrument process models still cannot fully take into account the phenomena of particle-particle interaction and multiple scattering, leading to errors in PSD estimation. This paper investigates an alternative use of the raw attenuation spectra for direct multivariate statistical process control (MSPC). The UAS raw spectra were processed using principal component analysis. The selected principal components were used to derive two MSPC statistics, the Hotelling's T2 and squared prediction error (SPE). The method is illustrated and demonstrated by reference to a wet milling process for processing nanoparticles.
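The squared prediction error complements T2 by measuring what the retained principal components fail to reconstruct. A minimal NumPy sketch follows; the noise-free rank-2 "spectra", the two retained components, and the helper names are assumptions made so that the expected values are easy to check, and they do not represent the UAS data.

```python
import numpy as np


def fit_loadings(x_ref, n_components):
    """Mean and PCA loadings (p x k) of the reference spectra."""
    mean = x_ref.mean(axis=0)
    _, _, vt = np.linalg.svd(x_ref - mean, full_matrices=False)
    return mean, vt[:n_components].T


def spe(x_new, mean, loadings):
    """Squared norm of the residual left after projection onto the PCA model."""
    centered = x_new - mean
    residual = centered - centered @ loadings @ loadings.T
    return np.sum(residual ** 2, axis=-1)


rng = np.random.default_rng(3)
# Noise-free reference spectra of exact rank 2 (40 channels)
reference = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 40))
mean, loadings = fit_loadings(reference, 2)

# Build a unit direction orthogonal to the model plane
v = rng.normal(size=40)
v_perp = v - loadings @ (loadings.T @ v)
v_perp /= np.linalg.norm(v_perp)

in_model = mean + 1.5 * loadings[:, 0]   # moves only within the model plane
faulty = in_model + 2.0 * v_perp         # adds an off-model disturbance
print(spe(in_model, mean, loadings), spe(faulty, mean, loadings))
```

A change confined to the model plane leaves SPE near zero and would show up in T2 instead, whereas a new kind of variation raises SPE, which is why the two statistics are usually monitored together.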
Sun, Gang; Hoff, Steven J; Zelle, Brian C; Nelson, Minda A
2008-12-01
It is vital to forecast gas and particulate matter concentrations and emission rates (GPCER) from livestock production facilities to assess the impact of airborne pollutants on human health, the ecological environment, and global warming. Modeling source air quality is a complex process because of abundant nonlinear interactions between GPCER and other factors. The objective of this study was to introduce statistical methods and a radial basis function (RBF) neural network to predict daily source air quality in Iowa swine deep-pit finishing buildings. The results show that four variables (outdoor and indoor temperature, animal units, and ventilation rates) were identified as relatively important model inputs using statistical methods. Principal component analysis further demonstrated that only two factors, the environment factor and the animal factor, were capable of explaining more than 94% of the total variability. The introduction of fewer uncorrelated variables to the neural network would reduce the complexity of the model structure, minimize computation cost, and eliminate model overfitting problems. The obtained results of the RBF network prediction were in good agreement with the actual measurements, with values of the correlation coefficient between 0.741 and 0.995 and very low values of systemic performance indexes for all the models. The good results indicated that the RBF network could be trained to model these highly nonlinear relationships. Thus, the RBF neural network technology combined with multivariate statistical methods is a promising tool for air pollutant emissions modeling.
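A radial basis function network with two inputs can be sketched as Gaussian basis functions on fixed centers followed by a linear output layer fitted by regularized least squares. This is one common way to train such a network and is an assumption here; the grid of centers, the width, the small ridge term, and the synthetic target are all hypothetical and unrelated to the swine-building measurements.

```python
import numpy as np


def rbf_design(x, centers, width):
    """Gaussian basis activations plus a bias column."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.column_stack([np.exp(-d2 / (2.0 * width ** 2)), np.ones(len(x))])


def fit_rbf(x, y, centers, width, ridge=1e-6):
    """Output-layer weights from ridge-regularized least squares."""
    phi = rbf_design(x, centers, width)
    a = phi.T @ phi + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(a, phi.T @ y)


def predict_rbf(x, centers, width, weights):
    return rbf_design(x, centers, width) @ weights


# Two hypothetical uncorrelated inputs (e.g. two factor scores) and a
# nonlinear synthetic response with a little noise
rng = np.random.default_rng(4)
x = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2 + 0.05 * rng.normal(size=200)

grid = np.linspace(-1.0, 1.0, 5)
centers = np.array([(a, b) for a in grid for b in grid])   # 5 x 5 grid
width = 0.5

weights = fit_rbf(x[:150], y[:150], centers, width)        # train on 150
pred = predict_rbf(x[150:], centers, width, weights)       # test on 50
r = np.corrcoef(pred, y[150:])[0, 1]
print("correlation on held-out data:", r)
```

Feeding the network a small number of uncorrelated inputs keeps the number of centers, and hence of weights, manageable, which is the complexity argument made in the abstract.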
Oprea, Cristiana; Gustova, Marina V; Oprea, Ioan A; Buzguta, Violeta L
2014-01-01
X-ray fluorescence spectrometry (XRFS) was used as a multielement method of evaluation of individual whole human tooth or tooth tissues for their amounts of trace elements. Measurements were carried out on human enamel, dentine, and dental cementum, and some differences in tooth matrix composition were noted. In addition, the elemental concentrations determined in teeth from subjects of different ages, nutritional states, professions and gender, living under various environmental conditions and dietary habits, were included in a comparison by multivariate statistical analysis (MVSA) methods. By factor analysis it was established that inorganic components of human teeth varied consistently with their source in the tissue, with more in such tissue from females than in that from males, and more in tooth incisor than in tooth molar.
DEFF Research Database (Denmark)
Arenas-Garcia, J.; Petersen, K.; Camps-Valls, G.
2013-01-01
sources become more common. A plethora of feature extraction methods are available in the literature collectively grouped under the field of multivariate analysis (MVA). This article provides a uniform treatment of several methods: principal component analysis (PCA), partial least squares (PLS), canonical...
Overview of multivariate methods and their application to studies of wildlife habitat
Energy Technology Data Exchange (ETDEWEB)
Shugart, Jr., H. H.
1980-01-01
Multivariate statistical techniques as methods of choice in analyzing habitat relations among animals have distinct advantages over competitive methodologies. These considerations, joined with a reduction in the cost of computer time, the increased availability of multivariate statistical packages, and an increased willingness on the part of ecologists to use mathematics and statistics as tools, have created an exponentially increasing interest in multivariate statistical methods over the past decade. It is important to note that the earliest multivariate statistical analyses in ecology did more than introduce a set of appropriate and needed methodologies to ecology. The studies emphasized different spatial and organizational scales from those typically emphasized in habitat studies. The new studies, that used multivariate methods, emphasized individual organisms' responses in a heterogeneous environment. This philosophical (and to some degree, methodological) emphasis on heterogeneity has led to a potential to predict the consequences of disturbances and management on wildlife habitat. One recent development in this regard has been the coupling of forest succession simulators with multivariate analysis of habitat to predict habitat availability under different timber management procedures.
Statistical methods in nonlinear dynamics
Indian Academy of Sciences (India)
K P N Murthy; R Harish; S V M Satyanarayana
2005-03-01
Sensitivity to initial conditions in nonlinear dynamical systems leads to exponential divergence of trajectories that are initially arbitrarily close, and hence to unpredictability. Statistical methods have been found to be helpful in extracting useful information about such systems. In this paper, we review briefly some statistical methods employed in the study of deterministic and stochastic dynamical systems. These include power spectral analysis and aliasing, extreme value statistics and order statistics, recurrence time statistics, the characterization of intermittency in the Sinai disorder problem, random walk analysis of diffusion in the chaotic pendulum, and long-range correlations in stochastic sequences of symbols.
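The exponential divergence of nearby trajectories can be seen in a few lines of code. This sketch uses the logistic map with r = 4 as a stand-in chaotic system; that choice, the starting point, and the function names are assumptions made for illustration and are not taken from the paper.

```python
def logistic_map(x, r=4.0):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def separation_history(x0, delta=1e-10, steps=60):
    """Iterate two trajectories that start `delta` apart and record the
    absolute distance between them after every step."""
    a, b = x0, x0 + delta
    history = []
    for _ in range(steps):
        a, b = logistic_map(a), logistic_map(b)
        history.append(abs(a - b))
    return history


if __name__ == "__main__":
    hist = separation_history(0.2)
    for step in (1, 10, 20, 30, 40):
        print(step, hist[step - 1])
```

The printed separations grow by roughly a constant factor per step until they reach the size of the attractor, after which the two trajectories are effectively unrelated.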
Energy Technology Data Exchange (ETDEWEB)
Mayer, B. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Mew, D. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); DeHope, A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Spackman, P. E. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Williams, A. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2015-09-24
Attribution of the origin of an illicit drug relies on identification of compounds indicative of its clandestine production and is a key component of many modern forensic investigations. The results of these studies can yield detailed information on method of manufacture, starting material source, and final product, all of which are critical forensic evidence. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic fentanyl, N-(1-phenylethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods, all previously published fentanyl synthetic routes or hybrid versions thereof, were studied in an effort to identify and classify route-specific signatures. A total of 160 distinct compounds and inorganic species were identified using gas and liquid chromatographies combined with mass spectrometric methods (GC-MS and LC-MS/MS-TOF) in conjunction with inductively coupled plasma mass spectrometry (ICP-MS). The complexity of the resultant data matrix called for the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 87 route-specific CAS were classified, and a statistical model capable of predicting the method of fentanyl synthesis was validated and tested against CAS profiles from crude fentanyl products deposited on and later extracted from two operationally relevant surfaces: stainless steel and vinyl tile. This work provides the most detailed fentanyl CAS investigation to date by using orthogonal mass spectral data to identify CAS of forensic significance for illicit drug detection, profiling, and attribution.
Wolf, S. F.; Lipschutz, M. E.
1993-01-01
Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools that are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, from 1855 to 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that, by a totally different criterion, the labile trace element contents (hence thermal histories) of 13 Cluster 1 meteorites are distinguishable from those of 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.
Enlow, Elizabeth M; Kennedy, Jennifer L; Nieuwland, Alexander A; Hendrix, James E; Morgan, Stephen L
2005-08-01
Nylons are an important class of synthetic polymers from an industrial as well as a forensic perspective. A spectroscopic method, such as Fourier transform infrared (FT-IR) spectroscopy, is necessary to determine the nylon subclass (e.g., nylon 6 or nylon 6,6). Library searching using absolute difference and absolute derivative difference algorithms gives inconsistent results for identifying nylon subclasses. The objective of this study was to evaluate the usefulness of peak ratio analysis and multivariate statistics for the identification of nylon subclasses using attenuated total reflection (ATR) spectral data. Many nylon subclasses could not be distinguished by the peak ratio of the N-H vibrational stretch to the sp³ C-H₂ vibrational stretch intensities. Linear discriminant analysis, however, provided a graphical visualization of differences between nylon subclasses and was able to correctly classify a set of 270 spectra from eight different subclasses with 98.5% cross-validated accuracy.
Statistical equivalent of the classical TDT for quantitative traits and multivariate phenotypes
Indian Academy of Sciences (India)
Tanushree Haldar; Saurabh Ghosh
2015-12-01
Clinical end-point traits are usually governed by quantitative precursors. Hence, there is active research interest in developing statistical methods for association mapping of quantitative traits. Unlike population-based tests for association, family-based tests for transmission disequilibrium are protected against population stratification. In this study, we propose a logistic regression model to test for association of quantitative traits based on a trio design. We show that the method can be viewed as a direct extension of the classical transmission disequilibrium test for binary traits to quantitative traits. We evaluate the performance of our method using extensive simulations and compare it with an existing method, the family-based association test (FBAT). We found that the two methods yield comparable power if all families are considered. However, unlike FBAT, which yields an inflated rate of false positives when noninformative trios with all three individuals heterozygous are removed, our method maintains the correct size without compromising too much on power. We show that our method can be easily modified to incorporate multivariate phenotypes. Here, we applied this method to analyse a quantitative endophenotype associated with alcoholism.
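For orientation, the classical transmission disequilibrium test for a binary trait, which the abstract says the proposed model extends, reduces to a McNemar-type statistic on transmissions from heterozygous parents: (b - c)² / (b + c), referred to a chi-square distribution with one degree of freedom. The sketch below computes that classical statistic only; it does not implement the authors' logistic regression model, and the function names and counts are illustrative assumptions.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Classical TDT statistic from heterozygous parents: `transmitted` counts
    transmissions of the candidate allele, `untransmitted` counts
    transmissions of the other allele."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    return (transmitted - untransmitted) ** 2 / total


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of
    freedom, using P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    stat = tdt_statistic(60, 40)  # made-up counts, for illustration only
    print(stat, chi2_sf_1df(stat))
```

Because only transmissions within families enter the statistic, population stratification does not inflate it, which is the protection the abstract refers to.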
Methods of statistical model estimation
Hilbe, Joseph
2013-01-01
Methods of Statistical Model Estimation examines the most important and popular methods used to estimate parameters for statistical models and provide informative model summary statistics. Designed for R users, the book is also ideal for anyone wanting to better understand the algorithms used for statistical model fitting. The text presents algorithms for the estimation of a variety of regression procedures using maximum likelihood estimation, iteratively reweighted least squares regression, the EM algorithm, and MCMC sampling. Fully developed, working R code is constructed for each method. Th
[Pathogenesis of temporomandibular dysfunction. II. Statistical method].
Vágó, P
1989-08-01
In epidemiologic assessments of the aetiology of temporomandibular joint dysfunction, statistical analyses have generally examined the variables in their pairwise relationships, and occasionally by multivariable linear regression. In this examination, a new type of statistical method, the LISREL (Linear Structural Relationship) method, was used to establish a linear, empirically tested model of the aetiology of temporomandibular joint dysfunction. An advantage of this approach is that the structural equations may include not only observed variables but also latent variables, which cannot be observed but can be assumed to be factors underlying the observed variables. The method is described in closer detail in the article in connection with the construction of the aetiological model.
Institute of Scientific and Technical Information of China (English)
梁军; 钱积新
2003-01-01
Multivariate statistical process monitoring and control (MSPM&C) methods for chemical process monitoring with statistical projection techniques such as principal component analysis (PCA) and partial least squares (PLS) are surveyed in this paper. The four-step procedure of performing MSPM&C for chemical processes (modeling of processes, detecting abnormal events or faults, identifying the variable(s) responsible for the faults, and diagnosing the source cause of the abnormal behavior) is analyzed. Several main research directions of MSPM&C reported in the literature are discussed, such as multi-way principal component analysis (MPCA) for batch processes, statistical monitoring and control for nonlinear processes, dynamic PCA and dynamic PLS, and on-line quality control by inferential models. Industrial applications of MSPM&C to several typical chemical processes, such as chemical reactors, distillation columns, polymerization processes, and petroleum refinery units, are summarized. Finally, some concluding remarks and future considerations are made.
Set Correlation as a General Multivariate Data-Analytic Method.
Cohen, Jacob
1982-01-01
Set correlation is a multivariate generalization of multiple regression/correlation analysis that features the employment of overall measures of association interpretable as proportions of variance and the use of set-partialled sets of variables. The statistical development of the theory and several examples are presented. (Author/JKS)
Statistical methods for ranking data
Alvo, Mayer
2014-01-01
This book introduces advanced undergraduate, graduate students and practitioners to statistical methods for ranking data. An important aspect of nonparametric statistics is oriented towards the use of ranking data. Rank correlation is defined through the notion of distance functions and the notion of compatibility is introduced to deal with incomplete data. Ranking data are also modeled using a variety of modern tools such as CART, MCMC, EM algorithm and factor analysis. This book deals with statistical methods used for analyzing such data and provides a novel and unifying approach for hypotheses testing. The techniques described in the book are illustrated with examples and the statistical software is provided on the authors’ website.
Multivariate statistical analysis of radioactive variables in two phosphate ores from Sudan.
Adam, Abdel Majid A; Eltayeb, Mohamed Ahmed H
2012-05-01
Multivariate statistical techniques are efficient ways to display complex relationships among many objects. An attempt was made to study the radioactive data in two types of Sudanese phosphate deposits, Kurun and Uro phosphate, using several multivariate statistical methods. The Pearson correlation coefficient revealed that the U-238 distribution in Kurun phosphate is controlled by the variation of K-40 concentration, whereas in Uro phosphate it is controlled by the variation of U-235 and U-234 concentrations. Histograms and normal Q-Q plots clearly show that the radioactive variables did not follow a normal distribution. This non-normality may be attributed to the complicating influence of geological factors. Principal components analysis (PCA) gives a model of five components for representing the data acquired from Kurun phosphate, which explains 89.5% of the total variance. A model of four components was sufficient to represent the data acquired from Uro phosphate, explaining 87.5% of the total variance. Hierarchical cluster analysis (HCA) indicates that U-238 behaves in the same manner in the two types of phosphate: it is associated with a group of four radionuclides, U-234, Po-210, Ra-226, and Th-230, which are the most abundant radionuclides and all belong to the uranium-238 decay series. Two parameters were adopted to differentiate directly between the two phosphates. First, U-238 in Uro phosphate showed a higher degree of mobility (CV% = 82.6) than in Kurun phosphate (CV% = 64.7); second, the Th-230/Th-232 activity ratio in Uro phosphate is nine times that in Kurun phosphate.
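The CV% values quoted in this abstract are coefficients of variation, that is, the standard deviation expressed as a percentage of the mean. A minimal sketch follows; the function name is hypothetical, the sample values are invented for illustration, and whether the authors used the sample or the population standard deviation is not stated, so both are offered.

```python
import statistics


def cv_percent(values, population=False):
    """Coefficient of variation in percent: 100 * standard deviation / mean.
    Uses the sample standard deviation unless `population` is True."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    sd = statistics.pstdev(values) if population else statistics.stdev(values)
    return 100.0 * sd / abs(mean)


if __name__ == "__main__":
    # Invented activity concentrations, not data from the study.
    activities = [120.0, 310.0, 95.0, 480.0, 210.0]
    print(round(cv_percent(activities), 1))
```

A larger CV% means the values are more dispersed relative to their mean, which the abstract interprets as a higher degree of mobility.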
Statistical Methods in Integrative Genomics
Richardson, Sylvia; Tseng, George C.; Sun, Wei
2016-01-01
Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and f...
Bayesian Methods for Statistical Analysis
Puza, Borek
2015-01-01
Bayesian methods for statistical analysis is a book on statistical methods for analysing a wide variety of data. The book consists of 12 chapters, starting with basic concepts and covering numerous topics, including Bayesian estimation, decision theory, prediction, hypothesis testing, hierarchical models, Markov chain Monte Carlo methods, finite population inference, biased sampling and nonignorable nonresponse. The book contains many exercises, all with worked solutions, including complete c...
Nonparametric statistical methods using R
Kloke, John
2014-01-01
A Practical Guide to Implementing Nonparametric and Rank-Based Procedures. Nonparametric Statistical Methods Using R covers traditional nonparametric methods and rank-based analyses, including estimation and inference for models ranging from simple location models to general linear and nonlinear models for uncorrelated and correlated responses. The authors emphasize applications and statistical computation. They illustrate the methods with many real and simulated data examples using R, including the packages Rfit and npsm. The book first gives an overview of the R language and basic statistical c
Application of Multivariable Statistical Techniques in Plant-wide WWTP Control Strategies Analysis
DEFF Research Database (Denmark)
Flores Alsina, Xavier; Comas, J.; Rodríguez-Roda, I.
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant...
Statistical Methods in Psychology Journals.
Wilkinson, Leland
1999-01-01
Proposes guidelines for revising the American Psychological Association (APA) publication manual or other APA materials to clarify the application of statistics in research reports. The guidelines are intended to induce authors and editors to recognize the thoughtless application of statistical methods. Contains 54 references. (SLD)
Abdolmaleki, Azizeh; Ghasemi, Jahan B; Shiri, Fereshteh; Pirhadi, Somayeh
2015-01-01
Data manipulation and maximally efficient extraction of useful information require a range of searching, modeling, mathematical, and statistical approaches. Hence, an adequate multivariate characterization is the first necessary step in an investigation, and the results are interpreted after multivariate analysis. Multivariate data analysis is capable not only of managing large datasets but also of interpreting them reliably and rapidly. The application of chemometrics and cheminformatics methods may be useful for the design and discovery of new drug compounds. In this review, we present a variety of information sources on chemometrics that we consider useful in different fields of drug design. This review describes exploratory analysis (PCA), classification, and multivariate calibration (PCR, PLS) methods for data analysis. It summarizes the main facts of linear and nonlinear multivariate data analysis in drug discovery and provides an introduction to the manipulation of data in this field. It covers the basic concepts of multivariate methods and the principles of projections (PCA and PLS), and introduces the popular modeling and classification techniques. Enough theory behind these methods, particularly concerning the chemometrics tools, is included for those with little experience in multivariate data analysis techniques such as PCA, PLS, SIMCA, etc. We describe each method while avoiding unnecessary equations and details of calculation algorithms. For each method, the review provides a synopsis followed by cases of applications in drug design (i.e., QSAR) and some of the method's features.
Rapid Statistical Methods: Part 1.
Lyon, A. J.
1980-01-01
Discusses some rapid statistical methods which are intended for use by physics teachers. Part one of this article gives some of the simplest and most commonly useful rapid methods. Part two gives references to the relevant theory together with some alternative and additional methods. (HM)
A comparison of multivariate genome-wide association methods
DEFF Research Database (Denmark)
Galesloot, Tessel E; Van Steen, Kristel; Kiemeney, Lambertus A L M
2014-01-01
Joint association analysis of multiple traits in a genome-wide association study (GWAS), i.e. a multivariate GWAS, offers several advantages over analyzing each trait in a separate GWAS. In this study we directly compared a number of multivariate GWAS methods using simulated data. We focused on six...... methods that are implemented in the software packages PLINK, SNPTEST, MultiPhen, BIMBAM, PCHAT and TATES, and also compared them to standard univariate GWAS, analysis of the first principal component of the traits, and meta-analysis of univariate results. We simulated data (N = 1000) for three...... correlation. We compared the power of the methods using empirically fixed significance thresholds (α = 0.05). Our results showed that the multivariate methods implemented in PLINK, SNPTEST, MultiPhen and BIMBAM performed best for the majority of the tested scenarios, with a notable increase in power...
Statistical methods in language processing.
Abney, Steven
2011-05-01
The term statistical methods here refers to a methodology that has been dominant in computational linguistics since about 1990. It is characterized by the use of stochastic models, substantial data sets, machine learning, and rigorous experimental evaluation. The shift to statistical methods in computational linguistics parallels a movement in artificial intelligence more broadly. Statistical methods have so thoroughly permeated computational linguistics that almost all work in the field draws on them in some way. There has, however, been little penetration of the methods into general linguistics. The methods themselves are largely borrowed from machine learning and information theory. We limit attention to that which has direct applicability to language processing, though the methods are quite general and have many nonlinguistic applications. Not every use of statistics in language processing falls under statistical methods as we use the term. Standard hypothesis testing and experimental design, for example, are not covered in this article. WIREs Cogni Sci 2011 2 315-322 DOI: 10.1002/wcs.111
Principal Feature Analysis: A Multivariate Feature Selection Method for fMRI Data
Directory of Open Access Journals (Sweden)
Lijun Wang
2013-01-01
Brain decoding with functional magnetic resonance imaging (fMRI) requires analysis of complex, multivariate data. Multivoxel pattern analysis (MVPA) has been widely used in recent years. MVPA treats the activation of multiple voxels from fMRI data as a pattern and decodes brain states using pattern classification methods. Feature selection is a critical procedure of MVPA because it decides which features will be included in the classification analysis of fMRI data, thereby improving the performance of the classifier. Features can be selected by limiting the analysis to specific anatomical regions or by computing univariate (voxel-wise) or multivariate statistics. However, these methods either discard some informative features or select features with redundant information. This paper introduces principal feature analysis as a novel multivariate feature selection method for fMRI data processing. This multivariate approach aims to remove features with redundant information, thereby selecting fewer features while retaining the most information.
Statistical methods for physical science
Stanford, John L
1994-01-01
This volume of Methods of Experimental Physics provides an extensive introduction to probability and statistics in many areas of the physical sciences, with an emphasis on the emerging area of spatial statistics. The scope of topics covered is wide-ranging: the text discusses a variety of the most commonly used classical methods and addresses newer methods that are applicable or potentially important. The chapter authors motivate readers with their insightful discussions, augmenting their material with
Key Features
* Examines basic probability, including coverage of standard distributions, time s
Hernanz, Dolores; Recamales, Angeles F; Meléndez-Martínez, Antonio J; González-Miret, M Lourdes; Heredia, Francisco J
2008-04-23
Apart from the need to assess the color of foods because of its preponderant role in their acceptability, there is a current trend toward studying the relationships between color and the pigments accounting for it. The color of five strawberry varieties cultivated in two different soilless systems has been studied, and an array of multivariate statistical methods has been applied to single out the color parameters that best discriminate among the samples surveyed and to correlate them with the pigment content. It is concluded that there is no direct relationship between the external and flesh colorations of the berries. Additionally, after discriminant methods were applied, >90% of the cases could be correctly classified by strawberry variety, whereas a noticeably lower percentage of correct classification (around 60%) was obtained when the type of cultivation system was the criterion for discrimination. The best correlations between pigments and color coordinates were found between pelargonidin-3-rutinoside and the external a* (r= -0.87), followed by pelargonidin-3-glucoside and the internal L* (r= -0.72).
Statistical Methods for Evolutionary Trees
Edwards, A. W. F.
2009-01-01
In 1963 and 1964, L. L. Cavalli-Sforza and A. W. F. Edwards introduced novel methods for computing evolutionary trees from genetical data, initially for human populations from blood-group gene frequencies. The most important development was their introduction of statistical methods of estimation applied to stochastic models of evolution.
Statistical methods for evolutionary trees.
Edwards, A W F
2009-09-01
In 1963 and 1964, L. L. Cavalli-Sforza and A. W. F. Edwards introduced novel methods for computing evolutionary trees from genetical data, initially for human populations from blood-group gene frequencies. The most important development was their introduction of statistical methods of estimation applied to stochastic models of evolution.
Malm, Christer B; Khoo, Nelson S; Granlund, Irene; Lindstedt, Emilia; Hult, Andreas
2016-01-01
The discovery of erythropoietin (EPO) simplified blood doping in sports, but improved detection methods for EPO have forced cheating athletes to return to blood transfusion. Autologous blood transfusion with cryopreserved red blood cells (RBCs) is the method of choice, because no valid method exists to accurately detect such an event. In endurance sports, it can be estimated that elite athletes improve performance by up to 3% with blood doping, regardless of method. Valid detection methods for autologous blood doping are important to maintain the credibility of athletic performances. Recreational male (N = 27) and female (N = 11) athletes served as Transfusion (N = 28) and Control (N = 10) subjects in two different transfusion settings. Hematological variables and physical performance were measured before donation of 450 or 900 mL whole blood, and until four weeks after re-infusion of the cryopreserved RBC fraction. Blood was analyzed for transferrin, iron, Hb, EVF, MCV, MCHC, reticulocytes, leucocytes and EPO. Repeated measures multivariate analysis of variance (MANOVA) and pattern recognition using Principal Component Analysis (PCA) and Orthogonal Projections of Latent Structures (OPLS) discriminant analysis (DA) investigated differences between Control and Transfusion groups over time. A significant increase in performance (15 ± 8%) and VO2max (17 ± 10%) (mean ± SD) could be measured 48 h after RBC re-infusion, and remained for up to four weeks in some subjects. In total, 533 blood samples were included in the study (Clean = 220, Transfused = 313). In response to blood transfusion, the largest change in hematological variables occurred 48 h after blood donation, when Control and Transfused groups could be separated with OPLS-DA (R2 = 0.76/Q2 = 0.59). RBC re-infusion resulted in the best model (R2 = 0.40/Q2 = 0.10) at the first sampling point (48 h), predicting one false positive and one false negative. Overall, a 25% and 86% false positives ratio was
Beyond Statistical Methods – Compendium of Statistical Methods for Researchers
Directory of Open Access Journals (Sweden)
Ondřej Vozár
2014-12-01
Book Review: HENDL, J. Přehled statistických metod: Analýza a metaanalýza dat (Overview of Statistical Methods: Data Analysis and Metaanalysis). 4th extended edition. Prague: Portál, 2012. ISBN 978-80-262-0200-4.
Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye
2016-01-13
A framework for establishing a standard reference scale (texture) by multivariate statistical analysis of instrumental measurement and sensory evaluation is proposed. Multivariate statistical analysis is conducted to rapidly select typical reference samples with the characteristics of universality, representativeness, stability, substitutability, and traceability. The reasonableness of the framework is verified by establishing a standard reference scale for a texture attribute (hardness) with well-known Chinese foods. More than 100 food products in 16 categories were tested using instrumental measurement (TPA test), and the results were analyzed with clustering analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were selected to construct the hardness standard reference scale. The results indicate that the regression coefficient between the estimated sensory value and the instrumentally measured value is significant (R² = 0.9765), which fits well with Stevens's theory. The research provides a reliable theoretical basis and practical guide for establishing quantitative standard reference scales for food texture characteristics.
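Stevens's theory, mentioned in this abstract, is commonly stated as a power law relating perceived magnitude ψ to physical stimulus φ: ψ = k·φⁿ. Taking logarithms makes the relation linear, so k and n can be estimated by ordinary least squares on the log-transformed values. The sketch below assumes that log-log fitting approach, which the abstract does not confirm, and uses synthetic numbers rather than the study's measurements; the function name is hypothetical.

```python
import math


def fit_power_law(stimulus, sensation):
    """Fit sensation = k * stimulus**n by least squares on log-log values.
    Returns (k, n, r_squared). All inputs must be positive."""
    xs = [math.log(s) for s in stimulus]
    ys = [math.log(p) for p in sensation]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx                      # slope of the log-log line
    scale = math.exp(mean_y - exponent * mean_x)  # exp(intercept)
    r_squared = sxy ** 2 / (sxx * syy)
    return scale, exponent, r_squared


if __name__ == "__main__":
    # Synthetic hardness readings and sensory scores, for illustration only.
    hardness = [5.0, 12.0, 30.0, 75.0, 180.0]
    scores = [1.9, 3.1, 4.8, 7.9, 12.5]
    print(fit_power_law(hardness, scores))
```

An R² close to 1 on the log-log scale is what "fits well with Stevens's theory" would look like under this reading.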
Robust statistical methods with R
Jureckova, Jana
2005-01-01
Robust statistical methods were developed to supplement the classical procedures when the data violate classical assumptions. They are ideally suited to applied research across a broad spectrum of study, yet most books on the subject are narrowly focused, overly theoretical, or simply outdated. Robust Statistical Methods with R provides a systematic treatment of robust procedures with an emphasis on practical application. The authors work from underlying mathematical tools to implementation, paying special attention to the computational aspects. They cover the whole range of robust methods, including differentiable statistical functions, distance of measures, influence functions, and asymptotic distributions, in a rigorous yet approachable manner. Highlighting hands-on problem solving, many examples and computational algorithms using the R software supplement the discussion. The book examines the characteristics of robustness, estimators of real parameter, large sample properties, and goodness-of-fit tests. It...
Statistical methods for bioimpedance analysis
Directory of Open Access Journals (Sweden)
Christian Tronstad
2014-04-01
This paper gives a basic overview of relevant statistical methods for the analysis of bioimpedance measurements, with an aim to answer questions such as: How do I begin with planning an experiment? How many measurements do I need to take? How do I deal with large amounts of frequency sweep data? Which statistical test should I use, and how do I validate my results? Beginning with the hypothesis and the research design, the methodological framework for making inferences based on measurements and statistical analysis is explained. This is followed by a brief discussion on correlated measurements and data reduction before an overview is given of statistical methods for comparison of groups, factor analysis, association, regression and prediction, explained in the context of bioimpedance research. The last chapter is dedicated to the validation of a new method by different measures of performance. A flowchart is presented for selection of statistical method, and a table is given for an overview of the most important terms of performance when evaluating new measurement technology.
Alvarez, Odalys Quevedo; Tagle, Margarita Edelia Villanueva; Pascual, Jorge L Gómez; Marín, Ma Teresa Larrea; Clemente, Ana Catalina Nuñez; Medina, Miriam Odette Cora; Palau, Raiza Rey; Alfonso, Mario Simeón Pomares
2014-10-01
Spatial and temporal variations of sediment quality in Matanzas Bay (Cuba) were studied by determining a total of 12 variables (Zn, Cu, Pb, As, Ni, Co, Al, Fe, Mn, V, CO₃²⁻, and total hydrocarbons (THC)). Surface sediments were collected annually at eight stations during 2005-2008. Multivariate statistical techniques, such as principal component analysis (PCA), cluster analysis (CA), and linear discriminant analysis (LDA), were applied to identify the most significant variables influencing the environmental quality of sediments. Heavy metals (Zn, Cu, Pb, V, and As) and THC were the most significant species contributing to sediment quality variations during the sampling period. Concentrations of V and As were determined in sediments of this ecosystem for the first time. The variation of sediment environmental quality with the sampling period and the differentiation of samples into three groups along the bay were obtained. The usefulness of the multivariate statistical techniques employed for the environmental interpretation of a limited dataset was confirmed.
Stratigraphic Division and Correlation of the Nihewan Beds by Multivariate Statistical Analysis
Institute of Scientific and Technical Information of China (English)
岳军; 蒋明媚
1992-01-01
This paper describes the principle of the optimal partitioning method for stratigraphic division and correlation. The Nihewan Beds are taken as an example to show how to apply this approach. Semiquantitative spectral analysis data on aggregate trace elements in 324 samples taken from nine sections in the Nihewan Basin are treated with multivariate statistical methods for stratigraphic division and correlation. First, the data from each section are processed separately by the optimal partitioning method to establish the stratigraphic boundaries. The optimal partitioning method has proved to be applicable to stratigraphic division and correlation. In our practice the Nihewan Beds are divided into five zones (I-V). Zone I includes subzones Ia and Ib. Zones Ia, Ib, II and III are considered to correspond to the Pliocene (N2), the early Early Pleistocene, the late Early Pleistocene, and the Middle Pleistocene, respectively. Zones IV and V are probably Late Pleistocene in age. This indicates that sediments deposited contemporaneously in the sections of the same basin are similar in geochemical characteristics, although different in geographical location. However, the sediments also show some variations, with a transitional relationship from one section to another. For example, in Zone II, the sediments of the Xiaodukou section show not only the characteristics of the Nangou-Hongya and Hutouliang sections, but also those of the Xiashagou, Shixiaxi, Shixiadong and Wulitai sections. It can be seen from the above that the zones can be characteristically correlated with one another. In addition, the feasibility of the optimal partitioning method is also described in the present paper.
Statistical Methods for Fuzzy Data
Viertl, Reinhard
2011-01-01
Statistical data are not always precise numbers, or vectors, or categories. Real data are frequently what is called fuzzy. Examples where this fuzziness is obvious are quality of life data, environmental, biological, medical, sociological and economics data. Also the results of measurements can be best described by using fuzzy numbers and fuzzy vectors respectively. Statistical analysis methods have to be adapted for the analysis of fuzzy data. In this book, the foundations of the description of fuzzy data are explained, including methods on how to obtain the characterizing function of fuzzy m
Ruiz Ordóñez, Magda Liliana
2008-01-01
This thesis focuses on the monitoring, fault detection and diagnosis of Wastewater Treatment Plants (WWTP), which are important fields of research for a wide range of engineering disciplines. The main objective is to evaluate and apply a novel artificial intelligence methodology based on situation assessment for monitoring and diagnosis of Sequencing Batch Reactor (SBR) operation. To this end, Multivariate Statistical Process Control (MSPC) in combination with Case-Based Reasoning (CBR)...
Institute of Scientific and Technical Information of China (English)
Amin Manouchehrian; Mostafa Sharifzadeh; Rasoul Hamidzadeh Moghadam
2012-01-01
Before any rock engineering project, mechanical parameters of rocks such as the uniaxial compressive strength (UCS) and Young's modulus (E) of intact rock are measured using laboratory or in-situ tests, but in some situations preparing the required specimens is impossible. To date, several models have been established to evaluate UCS and E from the substantial properties of rock. Artificial neural networks (ANN) are powerful tools for establishing predictive models, and results have shown the superiority of this technique over classic statistical techniques. In this paper, ANN and multivariate statistical models considering rock textural characteristics were established to estimate the UCS of rock, and the responses of the established models were validated against laboratory results. For this purpose a data set for 44 samples of sandstone was prepared, and for each sample some textural characteristics such as void, mineral content and grain size, as well as UCS, were determined. To select the best predictors as inputs of the UCS models, this data set was subjected to statistical analyses comprising basic descriptive statistics, bivariate correlation, curve fitting and principal component analyses. The results of these analyses showed that void, ferroan calcitic cement, argillaceous cement and mica percentage have the greatest effect on UCS. Two predictive models for UCS were developed using these variables by ANN and linear multivariate regression. The results showed that by using simple textural characteristics such as mineral content, cement type and void, the strength of the studied sandstone can be estimated with acceptable accuracy. The ANN and multivariate statistical UCS models gave responses with regression values of 0.87 and 0.76, respectively, which indicates the higher potential of the ANN model for predicting UCS compared to classic statistical models.
Directory of Open Access Journals (Sweden)
Qing Gu
2016-03-01
Qiandao Lake (Xin’an Jiang reservoir) plays a significant role in drinking water supply for eastern China, and it is an attractive tourist destination. Three multivariate statistical methods were comprehensively applied to assess the spatial and temporal variations in water quality as well as potential pollution sources in Qiandao Lake. Data sets of nine parameters from 12 monitoring sites during 2010–2013 were obtained for analysis. Cluster analysis (CA) was applied to classify the 12 sampling sites into three groups (Groups A, B and C) and the 12 monitoring months into two clusters (April-July, and the remaining months). Discriminant analysis (DA) identified Secchi disc depth, dissolved oxygen, permanganate index and total phosphorus as the significant variables for distinguishing variations of different years, with 79.9% correct assignments. Dissolved oxygen, pH and chlorophyll-a were determined to discriminate between the two sampling periods classified by CA, with 87.8% correct assignments. For spatial variation, DA identified Secchi disc depth and ammonia nitrogen as the significant discriminating parameters, with 81.6% correct assignments. Principal component analysis (PCA) identified organic pollution, nutrient pollution, domestic sewage, and agricultural and surface runoff as the primary pollution sources, explaining 84.58%, 81.61% and 78.68% of the total variance in Groups A, B and C, respectively. These results demonstrate the effectiveness of integrated use of CA, DA and PCA for reservoir water quality evaluation and could assist managers in improving water resources management.
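Illustrative note: the "percentage of total variance explained" figures quoted for PCA in the abstract above can be computed as in the following minimal sketch. It is not the authors' code; the function name `pca_explained_variance` and the synthetic data are assumptions made purely for illustration, and standardizing the variables before the decomposition is one common choice among several.

```python
import numpy as np

def pca_explained_variance(X):
    """Return (explained_variance_ratio, loadings) for the columns of X."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable so that differing units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: the rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / (Z.shape[0] - 1)
    return eigvals / eigvals.sum(), Vt.T

# Hypothetical data: 48 samples of 4 synthetic variables, three of which
# share one latent factor (a stand-in for a common pollution source).
rng = np.random.default_rng(0)
latent = rng.normal(size=(48, 1))
shared = [latent + 0.3 * rng.normal(size=(48, 1)) for _ in range(3)]
X = np.hstack(shared + [rng.normal(size=(48, 1))])

ratio, loadings = pca_explained_variance(X)
print("explained variance ratio:", np.round(ratio, 3))
```

Variables that load strongly on the same component are the ones usually interpreted together as a single source, which is how a component comes to be labelled, for example, "nutrient pollution".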
Kaown, D.; Hyun, Y.; Lee, K.
2004-12-01
The characterization of groundwater contamination at a hydrologically complex agricultural site in Youpori, Chooncheon (Korea) was undertaken by analyzing hydro-chemical data of groundwater within a statistical framework. The data show that high and correlated concentrations of Ca, Mg, and NO3 reflected the polluted nature of groundwater at the site. More than 39% of samples showed nitrate concentrations above the human affected value (3mg/L as NO3-N ), while about 25% samples exceeded the maximum acceptable level (10mg/L as NO3-N ) according to the EPA regulation. Multivariate analyses (factor and cluster analyses) were used to identify contaminant pathway, source and geochemical process. The geostatistical method was applied in order to delineate the spatial extent and variation of nitrate contamination. Factor and cluster analyses indicate that hydrochemical data can clearly characterize the non-point contamination over the area by agrochemical fertilizer as well as point-source pollution like manure spreading near barn or pigpen on groundwater. Nitrate-N, the critical species in the study area, was used to delineate the spatial spread of the contaminants using kriging in the study area.
Blake, Sarah; Henry, Tiernan; Murray, John; Flood, Rory; Muller, Mark R.; Jones, Alan G.; Rath, Volker
2016-04-01
The geothermal energy of thermal groundwater is currently being exploited for district-scale heating in many locations world-wide. The chemical compositions of these thermal waters reflect the provenance and hydrothermal circulation patterns of the groundwater, which are controlled by recharge, rock type and geological structure. Exploring the provenance of these waters using multivariate statistical analysis (MSA) techniques increases our understanding of the hydrothermal circulation systems, and provides a reliable tool for assessing these resources. Hydrochemical data from thermal springs situated in the Carboniferous Dublin Basin in east-central Ireland were explored using MSA, including hierarchical cluster analysis (HCA) and principal component analysis (PCA), to investigate the source aquifers of the thermal groundwaters. To take into account the compositional nature of the hydrochemical data, compositional data analysis (CoDa) techniques were used to process the data prior to the MSA. The results of the MSA were examined alongside detailed time-lapse temperature measurements from several of the springs, and indicate the influence of three important hydrogeological processes on the hydrochemistry of the thermal waters: 1) increased salinity due to evaporite dissolution and increased water-rock-interaction; 2) dissolution of carbonates; and 3) dissolution of metal sulfides and oxides associated with mineral deposits. The use of MSA within the CoDa framework identified subtle temporal variations in the hydrochemistry of the thermal springs, which could not be identified with more traditional graphing methods (e.g., Piper diagrams), or with a standard statistical approach. The MSA was successful in distinguishing different geological settings and different annual behaviours within the group of springs. This study demonstrates the usefulness of the application of MSA within the CoDa framework in order to better understand the underlying controlling processes
Modern nonparametric, robust and multivariate methods festschrift in honour of Hannu Oja
Taskinen, Sara
2015-01-01
Written by leading experts in the field, this edited volume brings together the latest findings in the area of nonparametric, robust and multivariate statistical methods. The individual contributions cover a wide variety of topics ranging from univariate nonparametric methods to robust methods for complex data structures. Some examples from statistical signal processing are also given. The volume is dedicated to Hannu Oja on the occasion of his 65th birthday and is intended for researchers as well as PhD students with a good knowledge of statistics.
Statistical methods in spatial genetics
DEFF Research Database (Denmark)
Guillot, Gilles; Leblois, Raphael; Coulon, Aurelie
2009-01-01
The joint analysis of spatial and genetic data is rapidly becoming the norm in population genetics. More and more studies explicitly describe and quantify the spatial organization of genetic variation and try to relate it to underlying ecological processes. As it has become increasingly difficult...... to keep abreast with the latest methodological developments, we review the statistical toolbox available to analyse population genetic data in a spatially explicit framework. We mostly focus on statistical concepts but also discuss practical aspects of the analytical methods, highlighting not only...
Institute of Scientific and Technical Information of China (English)
WANG Pei; ZHANG Dinghua; LI Shan; CHEN Bing
2012-01-01
For aircraft manufacturing industries, the analysis and prediction of part machining error during the machining process are very important for controlling and improving part machining quality. In order to control machining error effectively, a method integrating multivariate statistical process control (MSPC) and stream of variations (SoV) is proposed. Firstly, machining error is modeled by multi-operation approaches for the part machining process. SoV is adopted to establish the mathematical model of the relationship between the error of upstream operations and the error of downstream operations. Here the error sources include not only the influence of upstream operations but also many other error sources. A standard model and a predicted model based on SoV are built, depending on whether the operation has been performed or not, to satisfy different requirements during the part machining process. Secondly, the one-step ahead forecast error (OSFE) method is used to eliminate autocorrelation in the sample data from the SoV model, and the T2 control chart in MSPC is built to detect machining error according to the data characteristics of the above error model, which makes it possible to judge whether the operation is out of control or not. If it is, feedback is sent to the operations. The error model is modified by adjusting the out-of-control operation, and it continues to be used to monitor operations. Finally, a machining instance containing two operations demonstrates the effectiveness of the machining error control method presented in this paper.
Homogeneity and change-point detection tests for multivariate data using rank statistics
Lung-Yut-Fong, Alexandre; Cappé, Olivier
2011-01-01
Detecting and locating changes in highly multivariate data is a major concern in several current statistical applications. In this context, the first contribution of the paper is a novel non-parametric two-sample homogeneity test for multivariate data based on the well-known Wilcoxon rank statistic. The proposed two-sample homogeneity test statistic can be extended to deal with ordinal or censored data as well as to test for the homogeneity of more than two samples. The second contribution of the paper concerns the use of the proposed test statistic to perform retrospective change-point analysis. It is first shown that the approach is computationally feasible even when looking for a large number of change-points thanks to the use of dynamic programming. Computable asymptotic $p$-values for the test are then provided in the case where a single potential change-point is to be detected. Compared to available alternatives, the proposed approach appears to be very reliable and robust. This is particularly true in ...
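Illustrative note: the idea of locating a single change-point with rank statistics can be sketched as below. This is a simplified stand-in and not the test statistic of the paper: it uses component-wise ranks, a quadratic form in the cumulative sum of centred ranks, and picks the split that maximizes it. The function name `rank_changepoint`, the normalization, and the synthetic data are assumptions; ties and p-values are not handled.

```python
import numpy as np

def rank_changepoint(X):
    """Estimate one change-point from component-wise ranks.

    Returns (t_hat, stats), where t_hat is the estimated number of
    observations before the change and stats[t - 1] is the statistic
    for a split after the first t observations.
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    # Component-wise ranks 1..n (assumes no ties), centred at (n + 1) / 2.
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    U = ranks - (n + 1) / 2.0
    sigma_inv = np.linalg.inv(U.T @ U / n)
    cums = np.cumsum(U, axis=0)  # cums[t - 1] = sum of the first t centred ranks
    stats = np.empty(n - 1)
    for t in range(1, n):
        s = cums[t - 1]
        # Scale by the finite-population variance factor of a sum of t ranks.
        stats[t - 1] = (n - 1) / (t * (n - t)) * (s @ sigma_inv @ s)
    return int(np.argmax(stats)) + 1, stats

# Hypothetical data: 100 bivariate observations with a mean shift after 60.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
X[60:] += 3.0

t_hat, stats = rank_changepoint(X)
print("estimated change after observation", t_hat)
```

Because only ranks enter the statistic, the estimate is unaffected by monotone transformations of each variable, which is the main appeal of rank-based approaches in this setting.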
Badran, M; Morsy, R; Soliman, H; Elnimr, T
2016-01-01
Trace element metabolism has been reported to play specific roles in the pathogenesis and progression of diabetes mellitus. Given the continuous increase in the number of patients with Type 2 diabetes (T2D), this study aims to assess the levels and inter-relationships of fasting blood glucose (FBG) and serum trace elements in Type 2 diabetic patients. The study was conducted on 40 Egyptian Type 2 diabetic patients and 36 healthy volunteers (Hospital of Tanta University, Tanta, Egypt). The blood serum was digested and then used to determine the levels of 24 trace elements by inductively coupled plasma mass spectrometry (ICP-MS). Multivariate statistical analysis, based on correlation coefficients, cluster analysis (CA) and principal component analysis (PCA), was used to analyze the data. The results showed significant changes in the levels of FBG and of eight trace elements (Zn, Cu, Se, Fe, Mn, Cr, Mg, and As) in the blood serum of Type 2 diabetic patients relative to healthy controls. The multivariate statistical techniques were effective in reducing the experimental variables and in grouping the trace elements in patients into three clusters. The application of PCA revealed a distinct difference in the associations of trace elements and their clustering patterns between the control and patient groups, in particular for Mg, Fe, Cu, and Zn, which appeared to be the most crucial factors related to Type 2 diabetes. Therefore, on the basis of this study, the contribution of trace element content in Type 2 diabetic patients can be determined and specified with correlation and multivariate statistical analysis, which confirms that the alteration of some essential trace metals may play a role in the development of diabetes mellitus.
Multivariate Statistical Analysis of Cigarette Design Feature Influence on ISO TNCO Yields.
Agnew-Heard, Kimberly A; Lancaster, Vicki A; Bravo, Roberto; Watson, Clifford; Walters, Matthew J; Holman, Matthew R
2016-06-20
The aim of this study is to explore how differences in cigarette physical design parameters influence tar, nicotine, and carbon monoxide (TNCO) yields in mainstream smoke (MSS) using the International Organization of Standardization (ISO) smoking regimen. Standardized smoking methods were used to evaluate 50 U.S. domestic brand cigarettes and a reference cigarette representing a range of TNCO yields in MSS collected from linear smoking machines using a nonintense smoking regimen. Multivariate statistical methods were used to form clusters of cigarettes based on their ISO TNCO yields and then to explore the relationship between the ISO generated TNCO yields and the nine cigarette physical design parameters between and within each cluster simultaneously. The ISO generated TNCO yields in MSS are 1.1-17.0 mg tar/cigarette, 0.1-2.2 mg nicotine/cigarette, and 1.6-17.3 mg CO/cigarette. Cluster analysis divided the 51 cigarettes into five discrete clusters based on their ISO TNCO yields. No one physical parameter dominated across all clusters. Predicting ISO machine generated TNCO yields based on these nine physical design parameters is complex due to the correlation among and between the nine physical design parameters and TNCO yields. From these analyses, it is estimated that approximately 20% of the variability in the ISO generated TNCO yields comes from other parameters (e.g., filter material, filter type, inclusion of expanded or reconstituted tobacco, and tobacco blend composition, along with differences in tobacco leaf origin and stalk positions and added ingredients). A future article will examine the influence of these physical design parameters on TNCO yields under a Canadian Intense (CI) smoking regimen. Together, these papers will provide a more robust picture of the design features that contribute to TNCO exposure across the range of real world smoking patterns.
Identifying the controls of wildfire activity in Namibia using multivariate statistics
Mayr, Manuel; Le Roux, Johan; Samimi, Cyrus
2015-04-01
data mining techniques to select a conceivable set of variables by their explanatory value and to remove redundancy. We will then apply two multivariate statistical methods suitable to a large variety of data types and frequently used for (non-linear) causative factor identification: Non-metric Multidimensional Scaling (NMDS) and Regression Trees. The assumed value of these analyses is i) to determine the most important predictor variables of fire activity in Namibia, ii) to decipher their complex interactions in driving fire variability in Namibia, and iii) to compare the performance of two state-of-the-art statistical methods. References: Le Roux, J. (2011): The effect of land use practices on the spatial and temporal characteristics of savanna fires in Namibia. Doctoral thesis at the University of Erlangen-Nuremberg/Germany - 155 pages.
An Improvement of the Hotelling T2 Statistic in Monitoring Multivariate Quality Characteristics
Directory of Open Access Journals (Sweden)
Ashkan Shabbak
2012-01-01
The Hotelling T2 statistic is the most popular statistic used in multivariate control charts to monitor multiple qualities. However, this statistic is easily affected by the existence of more than one outlier in the data set. To rectify this problem, robust control charts, which are based on the minimum volume ellipsoid and the minimum covariance determinant, have been proposed. Most researchers assess the performance of multivariate control charts based on the number of signals without paying much attention to whether those signals are really outliers. With due respect, we propose to evaluate control charts not only based on the number of detected outliers but also with respect to their correct positions. In this paper, an Upper Control Limit based on the median and the median absolute deviation is also proposed. The results of this study signify that the proposed Upper Control Limit improves the detection of correct outliers but that it suffers from a swamping effect when the positions of outliers are not taken into consideration. Finally, a robust control chart based on the diagnostic robust generalised potential procedure is introduced to remedy this drawback.
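Illustrative note: a minimal sketch of the classical Hotelling T2 statistic for individual observations, together with a cut-off built from the median and the median absolute deviation (MAD) of the T2 values. The abstract does not give the exact form of the proposed limit, so the rule `median + k * MAD` with `k = 3`, the function names, and the synthetic data are all assumptions for illustration only.

```python
import numpy as np

def hotelling_t2(X):
    """Classical T2 of each row of X against the sample mean and covariance."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # Quadratic form diff_i' S^{-1} diff_i for every row i.
    return np.einsum("ij,jk,ik->i", diff, s_inv, diff)

def median_mad_limit(t2, k=3.0):
    """Assumed upper control limit: median + k * MAD of the T2 values."""
    med = np.median(t2)
    mad = np.median(np.abs(t2 - med))
    return med + k * mad

# Hypothetical data: 50 trivariate observations with one planted outlier.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
X[10] += 8.0

t2 = hotelling_t2(X)
ucl = median_mad_limit(t2)
flagged = np.flatnonzero(t2 > ucl)
print("flagged observations:", flagged)
```

A limit of this kind may flag some clean observations along with the planted one, which is the swamping effect the abstract mentions; checking which positions are flagged, and not only how many, makes that visible.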
Djorgovski, S. G.
1994-01-01
We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complex database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of the SKICAT system, and of some of the scientific results achieved to date. We also developed a user-friendly package for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications and has
Tay, C. K.; Hayford, E. K.; Hodgson, I. O. A.
2017-02-01
Multivariate statistical techniques and a hydrogeochemical approach were employed for groundwater assessment within the Lower Pra Basin. The main objective was to delineate the main processes that are responsible for the water chemistry and pollution of groundwater within the basin. Fifty-four (54) boreholes were sampled in January 2012 for quality assessment. PCA, using Varimax rotation with Kaiser normalization for both the rotated space and the component matrix, was applied to the data. Spearman's correlation matrix of major ions revealed the expected process-based relationships derived mainly from geochemical processes, such as ion-exchange and silicate/aluminosilicate weathering within the aquifer. Three main principal components influence the water chemistry and pollution of groundwater within the basin. The three principal components account for approximately 79% of the total variance in the hydrochemical data. Component 1 delineates the main natural processes (water-soil-rock interactions) through which groundwater within the basin acquires its chemical characteristics, Component 2 delineates the incongruent dissolution of silicates/aluminosilicates, while Component 3 delineates the prevalence of pollution principally from agricultural input as well as trace metal mobilization in groundwater within the basin. The loadings and score plots of the first two PCs show a grouping pattern which indicates the strength of the mutual relation among the hydrochemical variables. In terms of proper management and development of groundwater within the basin, communities where intense agriculture is taking place should be monitored and protected from agricultural activities by creating buffer zones, especially where inorganic fertilizers are used. Monitoring of the water quality, especially the water pH, is recommended to ensure the acid neutralizing potential of groundwater within the basin, thereby curtailing further trace metal
Tay, C. K.; Hayford, E. K.; Hodgson, I. O. A.
2017-06-01
Multivariate statistical techniques and a hydrogeochemical approach were employed for groundwater assessment within the Lower Pra Basin. The main objective was to delineate the main processes that are responsible for the water chemistry and pollution of groundwater within the basin. Fifty-four (54) boreholes were sampled in January 2012 for quality assessment. PCA, using Varimax rotation with Kaiser normalization for both the rotated space and the component matrix, was applied to the data. Spearman's correlation matrix of major ions revealed the expected process-based relationships derived mainly from geochemical processes, such as ion-exchange and silicate/aluminosilicate weathering within the aquifer. Three main principal components influence the water chemistry and pollution of groundwater within the basin. The three principal components account for approximately 79% of the total variance in the hydrochemical data. Component 1 delineates the main natural processes (water-soil-rock interactions) through which groundwater within the basin acquires its chemical characteristics, Component 2 delineates the incongruent dissolution of silicates/aluminosilicates, while Component 3 delineates the prevalence of pollution principally from agricultural input as well as trace metal mobilization in groundwater within the basin. The loadings and score plots of the first two PCs show a grouping pattern which indicates the strength of the mutual relation among the hydrochemical variables. In terms of proper management and development of groundwater within the basin, communities where intense agriculture is taking place should be monitored and protected from agricultural activities by creating buffer zones, especially where inorganic fertilizers are used. Monitoring of the water quality, especially the water pH, is recommended to ensure the acid neutralizing potential of groundwater within the basin, thereby curtailing further trace metal
Djorgovski, S. George
1994-01-01
We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complete database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful, and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications, and has produced real, published results.
A brief introduction to multivariate methods in grape and wine analysis
Directory of Open Access Journals (Sweden)
D Cozzolino
2009-03-01
D Cozzolino (1), W U Cynkar (1), N Shah (1), R G Dambergs (2), P A Smith (1)
(1) The Australian Wine Research Institute, Urrbrae, Glen Osmond, SA, Australia; (2) The Australian Wine Research Institute, Tasmanian Institute of Agricultural Research, University of Tasmania, Hobart, Tasmania, Australia
Abstract: Real-world systems are usually multivariate and hence usually cannot be adequately described by one selected variable without the risk of serious misrepresentation. Analyzing the effect of one variable at a time by analysis of variance techniques can give useful descriptive information, but this will not give specific information about relationships among variables and other important relationships in the entire matrix. Multivariate data analysis was developed in the late 1960s, and used by a number of research groups in analytical and physical organic chemistry due to the introduction of instrumentation giving multivariate responses for each sample analyzed. Development of such methods was also made possible by the availability of computers. Multivariate data analysis involves the use of mathematical and statistical techniques to extract information from complex data sets. The objective of this paper is to briefly describe and illustrate some multivariate data analysis methods used for grape and wine analysis.
Keywords: multivariate analysis, data mining, wine, grape
Intelligent numerical methods II applications to multivariate fractional calculus
Anastassiou, George A
2016-01-01
In this short monograph, Newton-like and other similar numerical methods are developed with applications to solving multivariate equations that involve Caputo-type fractional mixed partial derivatives and multivariate fractional Riemann-Liouville integral operators. These are studied for the first time in the literature. The chapters are self-contained and can be read independently. An extensive list of references is given per chapter. The book’s results are expected to find applications in many areas of applied mathematics, stochastics, computer science and engineering. As such, this short monograph is suitable for researchers and graduate students, for use in graduate classes and seminars on the above subjects, and for all science and engineering libraries.
A Symplectic Method to Generate Multivariate Normal Distributions
Baumgarten, Christian
2012-01-01
The AMAS group at the Paul Scherrer Institute developed an object-oriented library for high-performance simulation of high-intensity ion beam transport with space charge. Such particle-in-cell (PIC) simulations require a method to generate multivariate particle distributions as starting conditions. In a preceding publication it has been shown that the generators of symplectic transformations in two dimensions are a subset of the real Dirac matrices (RDMs) and that few symplectic transformations are required to transform a quadratic Hamiltonian into diagonal form. Here we argue that the use of RDMs is well suited for the generation of multivariate normal distributions with arbitrary covariances. A direct and simple argument supporting this claim is that this is the "natural" way in which such distributions are formed. The transport of charged particle beams may serve as an example: An uncorrelated Gaussian distribution of particles starting at some initial position of the accelerator is subject to linear deformat...
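Illustrative note: the task addressed in the abstract above, generating multivariate normal samples with a prescribed covariance by applying a linear map to uncorrelated Gaussian samples, can be sketched with a Cholesky factor. This is the standard textbook construction and is swapped in here for illustration; it is not the symplectic, RDM-based method of the paper. The function name `sample_mvn` and the target covariance are assumptions.

```python
import numpy as np

def sample_mvn(mean, cov, n, rng):
    """Draw n samples from N(mean, cov) by linearly deforming white noise."""
    mean = np.asarray(mean, dtype=float)
    # Lower-triangular factor with cov = L @ L.T (cov must be positive definite).
    L = np.linalg.cholesky(np.asarray(cov, dtype=float))
    z = rng.standard_normal((n, mean.size))  # uncorrelated unit Gaussians
    return mean + z @ L.T

# Hypothetical target covariance for a two-dimensional example.
rng = np.random.default_rng(3)
target_cov = np.array([[2.0, 0.6], [0.6, 1.0]])
samples = sample_mvn([0.0, 1.0], target_cov, 100_000, rng)
sample_cov = np.cov(samples, rowvar=False)
print(np.round(sample_cov, 2))
```

Any matrix A with A Aᵀ equal to the target covariance would serve in place of the Cholesky factor; the choice of A is where a symplectic construction such as the one discussed in the abstract would differ.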
Order statistics & inference estimation methods
Balakrishnan, N
1991-01-01
The literature on order statistics and inference is quite extensive and covers a large number of fields, but most of it is dispersed throughout numerous publications. This volume is the consolidation of the most important results and places an emphasis on estimation. Both theoretical and computational procedures are presented to meet the needs of researchers, professionals, and students. The methods of estimation discussed are well-illustrated with numerous practical examples from both the physical and life sciences, including sociology, psychology, and electrical and chemical engineering. A co
Bevacqua, Emanuele; Maraun, Douglas; Hobæk Haff, Ingrid; Widmann, Martin; Vrac, Mathieu
2017-04-01
Compound events are multivariate extreme events in which the individual contributing variables may not be extreme themselves, but their joint - dependent - occurrence causes an extreme impact. The conventional univariate statistical analysis cannot give accurate information regarding the multivariate nature of these events. We develop a conceptual model, implemented via pair-copula constructions, which allows for the quantification of the risk associated with compound events in present day and future climate, as well as the uncertainty estimates around such risk. The model includes meteorological predictors which provide insight into both the involved physical processes, and the temporal variability of CEs. Moreover, this model provides multivariate statistical downscaling of compound events. Downscaling of compound events is required to extend their risk assessment to the past or future climate, where climate models either do not simulate realistic values of the local variables driving the events, or do not simulate them at all. Based on the developed model, we study compound floods, i.e. joint storm surge and high river runoff, in Ravenna (Italy). To explicitly quantify the risk, we define the impact of compound floods as a function of sea and river levels. We use meteorological predictors to extend the analysis to the past, and get a more robust risk analysis. We quantify the uncertainties of the risk analysis observing that they are very large due to the shortness of the available data, though this may also be the case in other studies where they have not been estimated. Ignoring the dependence between sea and river levels would result in an underestimation of risk, in particular the expected return period of the highest compound flood observed increases from about 20 to 32 years when switching from the dependent to the independent case.
Singh, Elangbam J K; Gupta, Abhik; Singh, N R
2013-04-01
The aim of this paper was to analyze the groundwater quality of Imphal West district, Manipur, India, and assess its suitability for drinking, domestic, and agricultural use. Eighteen physico-chemical variables were analyzed in groundwater from 30 different hand-operated tube wells in urban, suburban, and rural areas in two seasons. The data were subjected to uni-, bi-, and multivariate statistical analysis, the latter comprising cluster analysis (CA), principal component analysis (PCA), and factor analysis (FA). Arsenic concentrations exceed the Indian standard in 23.3% and the WHO limit in 73.3% of the groundwater sources with only 26.7% in the acceptable range. Several variables like iron, chloride, sodium, sulfate, total dissolved solids, and turbidity are also beyond their desirable limits for drinking water in a number of sites. Sodium concentrations and sodium absorption ratio (SAR) are both high enough to render the water from the majority of the sources unsuitable for agricultural use. Multivariate statistical techniques, especially varimax rotation of PCA data, helped to bring into focus the hidden yet important variables and to understand their roles in influencing groundwater quality. Widespread arsenic contamination and high sodium concentration of groundwater pose formidable constraints on its exploitation for drinking and other domestic and agricultural use in the study area, although urban anthropogenic impacts are not yet pronounced.
Multivariate statistical analysis of surface water chemistry: A case study of Gharasoo River, Iran
Directory of Open Access Journals (Sweden)
MH Sayadi
2014-09-01
Regional water quality is a topic of active interest in the environmental sciences because of the inconsistency of pollutants. In this paper, the surface water quality of the Gharasoo River in western Iran is assessed using multivariate statistical techniques. Parameters such as EC, TDS, pH, HCO3-, Cl-, SO4 2-, Ca2+, Mg2+ and Na+ were analyzed. Principal component and factor analysis showed that the parameters generated 3 significant factors, which explained 73.06% of the variance in the data sets. Factor 1 may be derived from agricultural activities and the subsequent release of EC, TDS, SO4 2- and Na+ to the water. Factor 2 could be influenced by domestic pollution and explained the delivery of HCO3-, Cl- and Mg2+ into the water. Factor 3 contains the hydro-geochemical variables Ca2+ and pH, originating from mineralization of the geological components of the bed sediments and soils of the watershed area. Likewise, the clustering analysis generated 3 groups of stations, with the stations in each group sharing similar characteristic features. Pearson correlation analysis showed significant correlations between HCO3- and Mg2+ (0.775) and Ca2+ (0.552), as well as between TDS and Na+ (0.726). With reference to the multivariate statistical analyses, it can be concluded that agricultural, domestic and hydro-geochemical sources are releasing pollutants into the Gharasoo River water.
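The kind of figure quoted above (a few factors explaining a stated share of the variance) can be illustrated with a minimal PCA on the correlation matrix. This is a sketch under stated assumptions: the data are synthetic stand-ins, not the Gharasoo River measurements; the function name `pca_on_correlation` is hypothetical; and retaining components with eigenvalue above 1 (the Kaiser criterion) is an assumed rule, since the abstract does not say how "significant" factors were chosen.

```python
import numpy as np

def pca_on_correlation(X):
    """PCA of standardized variables via the correlation matrix.
    Returns eigenvalues (descending), eigenvectors, explained-variance ratios."""
    R = np.corrcoef(X, rowvar=False)          # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    explained = eigvals / eigvals.sum()       # share of total variance
    return eigvals, eigvecs, explained

# Synthetic data: 200 samples of 9 variables driven by 3 latent factors
rng = np.random.default_rng(1)
latent = rng.standard_normal((200, 3))
mixing = rng.standard_normal((3, 9))
X = latent @ mixing + 0.5 * rng.standard_normal((200, 9))

eigvals, eigvecs, explained = pca_on_correlation(X)
n_retained = int((eigvals > 1.0).sum())       # assumed Kaiser criterion
cumulative = explained.cumsum()               # cumulative explained variance
```

Reading `cumulative[n_retained - 1]` gives the share of variance explained by the retained components, the analogue of the percentage reported in the abstract.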
Ajorlo, Majid; Abdullah, Ramdzani B; Yusoff, Mohd Kamil; Halim, Ridzwan Abd; Hanif, Ahmad Husni Mohd; Willms, Walter D; Ebrahimian, Mahboubeh
2013-10-01
This study investigates the applicability of multivariate statistical techniques including cluster analysis (CA), discriminant analysis (DA), and factor analysis (FA) for the assessment of seasonal variations in the surface water quality of tropical pastures. The study was carried out in the TPU catchment, Kuala Lumpur, Malaysia. The dataset consisted of 1-year monitoring of 14 parameters at six sampling sites. The CA yielded two groups of similarity between the sampling sites, i.e., less polluted (LP) and moderately polluted (MP) at temporal scale. Fecal coliform (FC), NO3, DO, and pH were significantly related to the stream grouping in the dry season, whereas NH3, BOD, Escherichia coli, and FC were significantly related to the stream grouping in the rainy season. The best predictors for distinguishing clusters in temporal scale were FC, NH3, and E. coli, respectively. FC, E. coli, and BOD with strong positive loadings were introduced as the first varifactors in the dry season which indicates the biological source of variability. EC with a strong positive loading and DO with a strong negative loading were introduced as the first varifactors in the rainy season, which represents the physiochemical source of variability. Multivariate statistical techniques were effective analytical techniques for classification and processing of large datasets of water quality and the identification of major sources of water pollution in tropical pastures.
Bayes linear statistics, theory & methods
Goldstein, Michael
2007-01-01
Bayesian methods combine information available from data with any prior information available from expert knowledge. The Bayes linear approach follows this path, offering a quantitative structure for expressing beliefs, and systematic methods for adjusting these beliefs, given observational data. The methodology differs from the full Bayesian methodology in that it establishes simpler approaches to belief specification and analysis based around expectation judgements. Bayes Linear Statistics presents an authoritative account of this approach, explaining the foundations, theory, methodology, and practicalities of this important field. The text provides a thorough coverage of Bayes linear analysis, from the development of the basic language to the collection of algebraic results needed for efficient implementation, with detailed practical examples. The book covers: The importance of partial prior specifications for complex problems where it is difficult to supply a meaningful full prior probability specification...
THE GROWTH POINTS OF STATISTICAL METHODS
Directory of Open Access Journals (Sweden)
Orlov A. I.
2014-11-01
On the basis of a new paradigm of applied mathematical statistics, data analysis and economic-mathematical methods are identified; we have also discussed five topical areas in which modern applied statistics is developing as well as the other statistical methods, i.e. five "growth points" – nonparametric statistics, robustness, computer-statistical methods, statistics of interval data, statistics of non-numeric data.
An Effective Method to Identify Heritable Components from Multivariate Phenotypes.
Directory of Open Access Journals (Sweden)
Jiangwen Sun
Multivariate phenotypes may be characterized collectively by a variety of low level traits, such as in the diagnosis of a disease that relies on multiple disease indicators. Such multivariate phenotypes are often used in genetic association studies. If highly heritable components of a multivariate phenotype can be identified, it can maximize the likelihood of finding genetic associations. Existing methods for phenotype refinement perform unsupervised cluster analysis on low-level traits and hence do not assess heritability. Existing heritable component analytics either cannot utilize general pedigrees or have to estimate the entire covariance matrix of low-level traits from limited samples, which leads to inaccurate estimates and is often computationally prohibitive. It is also difficult for these methods to exclude fixed effects from other covariates such as age, sex and race, in order to identify truly heritable components. We propose to search for a combination of low-level traits and directly maximize the heritability of this combined trait. A quadratic optimization problem is thus derived where the objective function is formulated by decomposing the traditional maximum likelihood method for estimating the heritability of a quantitative trait. The proposed approach can generate linearly-combined traits of high heritability that have been corrected for the fixed effects of covariates. The effectiveness of the proposed approach is demonstrated in simulations and by a case study of cocaine dependence. Our approach was computationally efficient and derived traits of higher heritability than those by other methods. Additional association analysis with the derived cocaine-use trait identified genetic markers that were replicated in an independent sample, further confirming the utility and advantage of the proposed approach.
Multivariate Statistical Analysis of Water Quality data in Indian River Lagoon, Florida
Sayemuzzaman, M.; Ye, M.
2015-12-01
The Indian River Lagoon, part of the longest barrier island complex in the United States, is a region of particular concern to environmental scientists because of the rapid rate of human development throughout the region and its geographical position between the colder temperate zone and the warmer sub-tropical zone. Thus, surface water quality analysis in this region continually yields new information. In the present study, multivariate statistical procedures were applied to analyze the spatial and temporal water quality in the Indian River Lagoon over the period 1998-2013. Twelve parameters were analyzed at twelve key water monitoring stations in and beside the lagoon using monthly datasets (a total of 27,648 observations). The dataset was treated using cluster analysis (CA), principal component analysis (PCA) and non-parametric trend analysis. The CA was used to cluster the twelve monitoring stations into four groups, with stations having similar surrounding characteristics placed in the same group. The PCA was then applied to the similar groups to find the important water quality parameters. The principal components (PCs) PC1 to PC5 were considered based on explained cumulative variances of 75% to 85% in each cluster group. Nutrient species (phosphorus and nitrogen), salinity, specific conductivity and erosion factors (TSS, turbidity) were the major variables involved in the construction of the PCs. Statistically significant positive or negative trends and abrupt trend shifts were detected by applying the Mann-Kendall trend test and the Sequential Mann-Kendall (SQMK) test to each individual station for the important water quality parameters. Land use and land cover change patterns, local anthropogenic activities and extreme climate events such as drought might be associated with these trends. This study presents the multivariate statistical assessment in order to get better information about the quality of surface water. Thus, effective pollution control/management of the surface
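The Mann-Kendall test named in this abstract can be sketched in a few lines. The version below is a minimal illustration that assumes no tied values (the tie correction to the variance is omitted) and uses the normal approximation with a continuity correction; the function name `mann_kendall` and the example series are assumptions, not material from the study.

```python
import math

def mann_kendall(x):
    """Mann-Kendall trend test without tie correction.
    Returns (S, Z, two-sided p-value) under the normal approximation."""
    n = len(x)
    s = 0
    # S counts concordant minus discordant pairs in time order
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = x[j] - x[i]
            s += (d > 0) - (d < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0   # variance of S with no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)          # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal p-value
    return s, z, p

# Hypothetical monotonically increasing series of 10 observations
s_up, z_up, p_up = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
s_down, z_down, p_down = mann_kendall([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
```

A positive Z with a small p-value points to an upward trend and a negative Z to a downward one. Series with many ties or strong seasonality would need the corrected or seasonal variants of the test.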
Singularly Perturbation Method Applied To Multivariable PID Controller Design
Directory of Open Access Journals (Sweden)
Mashitah Che Razali
2015-01-01
Proportional integral derivative (PID) controllers are commonly used in process industries due to their simple structure and high reliability. Efficient tuning is one of the relevant issues for PID controllers. The tuning process is always challenging, especially for multivariable systems and when seeking the best control tuning for systems with different time scales. This motivates the use of the singularly perturbation method in multivariable PID (MPID) controller designs. In this work, a wastewater treatment plant and the Newell and Lee evaporator were considered as system case studies. Four MPID control strategies, the Davison, Penttinen-Koivo, Maciejowski, and Combined methods, were applied to the systems. The singularly perturbation method based on the Naidu and Jian Niu algorithms was applied to the MPID control design. It was found that the singularly perturbed system obtained by the Naidu method was able to maintain the system characteristics and hence was applied to the design of the MPID controllers. The closed loop performance and process interactions were analyzed. It is observed that less computation time is required for the singularly perturbed MPID controller compared to the conventional MPID controller. The closed loop performance shows good transient responses, low steady state error, and less process interaction when using the singularly perturbed MPID controller.
Energy Technology Data Exchange (ETDEWEB)
Mayer, B. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Valdez, C. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); DeHope, A. J. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Spackman, P. E. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Sanner, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Martinez, H. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Williams, A. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-11-28
Critical to many modern forensic investigations is the chemical attribution of the origin of an illegal drug. This process greatly relies on identification of compounds indicative of its clandestine or commercial production. The results of these studies can yield detailed information on method of manufacture, sophistication of the synthesis operation, starting material source, and final product. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic 3-methylfentanyl, N-(3-methyl-1-phenethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods were studied in an effort to identify and classify route-specific signatures. These methods were chosen to minimize the use of scheduled precursors, complicated laboratory equipment, number of overall steps, and demanding reaction conditions. Using gas and liquid chromatographies combined with mass spectrometric methods (GC-QTOF and LC-QTOF) in conjunction with inductively coupled plasma mass spectrometry (ICP-MS), over 240 distinct compounds and elements were monitored. As seen in our previous work with CAS of fentanyl synthesis, the complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 62 statistically significant, route-specific CAS were identified. Statistical classification models using a variety of machine learning techniques were then developed with the ability to predict the method of 3-methylfentanyl synthesis from three blind crude samples generated by synthetic chemists without prior experience with these methods.
Buttigieg, Pier Luigi; Ramette, Alban
2014-12-01
The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.
Ben Alaya, M. A.; Chebana, F.; Ouarda, T. B. M. J.
2016-09-01
Statistical downscaling techniques are required to refine atmosphere-ocean global climate data and provide reliable meteorological information such as a realistic temporal variability and relationships between sites and variables in a changing climate. To this end, the present paper introduces a modular structure combining two statistical tools of increasing interest during the last years: (1) Gaussian copula and (2) quantile regression. The quantile regression tool is employed to specify the entire conditional distribution of downscaled variables and to address the limitations of traditional regression-based approaches whereas the Gaussian copula is performed to describe and preserve the dependence between both variables and sites. A case study based on precipitation and maximum and minimum temperatures from the province of Quebec, Canada, is used to evaluate the performance of the proposed model. Obtained results suggest that this approach is capable of generating series with realistic correlation structures and temporal variability. Furthermore, the proposed model performed better than a classical multisite multivariate statistical downscaling model for most evaluation criteria.
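The dependence-preserving step described above can be illustrated with a bare Gaussian copula. The sketch covers only the copula part (correlated uniforms mapped through marginal quantile functions) and leaves out the quantile regression component entirely. The correlation value, the exponential and empirical marginals, and the names `gaussian_copula_sample`, `precip` and `temp` are illustrative assumptions, not the configuration used in the paper.

```python
import math
import numpy as np

# Standard normal CDF, vectorized over NumPy arrays
_phi = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))

def gaussian_copula_sample(corr, n, seed=0):
    """Draw n points with uniform(0, 1) marginals whose dependence
    follows a Gaussian copula with correlation matrix corr."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(np.asarray(corr, dtype=float))
    z = rng.standard_normal((n, L.shape[0])) @ L.T   # correlated normals
    return _phi(z)                                   # map to uniforms

# Hypothetical dependence between two downscaled variables
corr = np.array([[1.0, 0.8],
                 [0.8, 1.0]])
u = gaussian_copula_sample(corr, 5000)

# Marginal 1: exponential with mean 5 (stand-in for precipitation amounts)
precip = -5.0 * np.log1p(-u[:, 0])

# Marginal 2: empirical quantiles of a stand-in "observed" temperature sample
temp_obs = np.random.default_rng(1).normal(15.0, 8.0, 1000)
temp = np.quantile(temp_obs, u[:, 1])
```

Because both marginal transforms are monotone, the rank dependence between `precip` and `temp` is the one set by the copula, whatever marginal distributions are plugged in.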
Directory of Open Access Journals (Sweden)
M.A. Delavar
2016-02-01
Introduction: The accumulation of heavy metals (HMs) in the soil is of increasing concern due to food safety issues, potential health risks, and the detrimental effects on soil ecosystems. HMs may be considered the most important soil pollutants, because they are not biodegradable and their physical movement through the soil profile is relatively limited. Therefore, the root uptake process may provide a major pathway for these pollutants to transfer from the surface soil to natural and cultivated plants, which may eventually carry them into human bodies. The general behavior of HMs in the environment, especially their bioavailability in the soil, is influenced by their origin. Hence, source apportionment of HMs may provide essential information for better management of polluted soils to restrict the entrance of HMs into the human food chain. This paper explores the applicability of multivariate statistical techniques in the identification of probable sources that can control the concentration and distribution of selected HMs in the soils surrounding the Zanjan Zinc Specialized Industrial Town (briefly Zinc Town). Materials and Methods: The area under investigation has a size of approximately 4000 ha. It is located around the Zinc Town, Zanjan province. A regular grid sampling pattern with an interval of 500 meters was applied to identify the sample locations, and 184 topsoil samples (0-10 cm) were collected. The soil samples were air-dried, sieved through a 2 mm polyethylene sieve, and then digested using HNO3. The total concentrations of zinc (Zn), lead (Pb), cadmium (Cd), nickel (Ni) and copper (Cu) in the soil solutions were determined via Atomic Absorption Spectroscopy (AAS). Data were statistically analyzed using the SPSS software version 17.0 for Windows. Correlation Matrix (CM), Principal Component Analyses (PCA) and Factor Analyses (FA) techniques were performed in order to identify the probable sources of HMs in the studied soils. Results and
Benson, Nsikak U.; Asuquo, Francis E.; Williams, Akan B.; Essien, Joseph P.; Ekong, Cyril I.; Akpabio, Otobong; Olajire, Abaas A.
2016-01-01
Trace metals (Cd, Cr, Cu, Ni and Pb) concentrations in benthic sediments were analyzed through multi-step fractionation scheme to assess the levels and sources of contamination in estuarine, riverine and freshwater ecosystems in Niger Delta (Nigeria). The degree of contamination was assessed using the individual contamination factors (ICF) and global contamination factor (GCF). Multivariate statistical approaches including principal component analysis (PCA), cluster analysis and correlation test were employed to evaluate the interrelationships and associated sources of contamination. The spatial distribution of metal concentrations followed the pattern Pb>Cu>Cr>Cd>Ni. Ecological risk index by ICF showed significant potential mobility and bioavailability for Cu, Cu and Ni. The ICF contamination trend in the benthic sediments at all studied sites was Cu>Cr>Ni>Cd>Pb. The principal component and agglomerative clustering analyses indicate that trace metals contamination in the ecosystems was influenced by multiple pollution sources. PMID:27257934
Multivariate statistical analysis for the surface water quality of the Luan River, China
Institute of Scientific and Technical Information of China (English)
Zhi-wei ZHAO; Fu-yi CUI
2009-01-01
In order to analyze the characteristics of surface water resource quality for the reconstruction of an old water treatment plant, multivariate statistical techniques such as cluster analysis and factor analysis were applied to the data of Yuqiao Reservoir--surface water resource of the Luan River, China. The results of cluster analysis demonstrate that the months of one year were divided into 3 groups and that the characteristics of the clusters agreed with the seasonal characteristics in North China. Three factors were derived from the complicated set using factor analysis. Factor 1 included turbidity and chlorophyll, which seemed to be related to the anthropogenic activities; factor 2 included alkaline and hardness, which were related to the natural characteristic of surface water; and factor 3 included Cl and NO-N affected by mineral and agricultural activities. The sinusoidal shape of the score plots of the three factors shows that the temporal variations caused by natural and human factors are linked to seasonality.
Energy Technology Data Exchange (ETDEWEB)
Berman, E F; Kulp, K S; Knize, M G; Wu, L; Nelson, E J; Nelson, D O; Wu, K J
2006-05-04
Time-of-Flight Secondary Ion Mass Spectrometry (ToF-SIMS) is utilized to examine the mass spectra and fragmentation patterns of seven isomeric monosaccharides. Multivariate statistical analysis techniques, including principal component analysis (PCA), allow discrimination of the extremely similar mass spectra of stereoisomers. Furthermore, PCA identifies those fragment peaks which vary significantly between spectra. Heavy isotope studies confirm that these peaks are indeed sugar fragments, allow identification of the fragments, and provide clues to the fragmentation pathways. Excellent reproducibility is shown by multiple experiments performed over time and on separate samples. This study demonstrates the combined selectivity and discrimination power of ToF-SIMS and PCA, and suggests new applications of the technique including differentiation of subtle chemical changes in biological samples that may provide insights into cellular processes, disease progress, and disease diagnosis.
Multivariate regional frequency analysis: Two new methods to increase the accuracy of measures
Abdi, Amin; Hassanzadeh, Yousef; Talatahari, Siamak; Fakheri-Fard, Ahmad; Mirabbasi, Rasoul; Ouarda, Taha B. M. J.
2017-09-01
The accurate detection of discordant sites in a heterogeneous region and the estimation of the regional parameters of a statistical distribution are two important issues in multivariate regional frequency analysis. In this study, two new methods are proposed for increasing the accuracy of the multivariate L-moment approach. The first one, the optimization-based method (OBM) is utilized to estimate the best distribution parameters. The second one is the rank-based method (RBM), which is used in the robust discordancy measure for identifying discordant sites. In order to assess the performance of the proposed approaches on the heterogeneity measure, real and simulated regions of drought characteristics are considered. The results confirm the usefulness of the new methods in comparison with some well-established techniques.
Fujiki, Yuya; Kumada, Yoichi; Kishimoto, Michimasa
2015-08-01
The proteomics technique, which consists of two-dimensional gel electrophoresis (2-DE), peptide mass fingerprinting (PMF), gel image analysis, and multivariate statistics, was applied to the phase analysis of a fed-batch culture for the production of a single-chain variable fragment (scFv) of an anti-C-reactive protein (CRP) antibody by Pichia pastoris. The time courses of the fed-batch culture were separated into three distinct phases: the growth phase of the batch process, the growth phase of the fed-batch process, and the production phase of the fed-batch process. Multivariate statistical analysis using 2-DE gel image analysis data clearly showed the change in the culture phase and provided information concerning the protein expression, which suggested a metabolic change related to cell growth and production during the fed-batch culture. Furthermore, specific proteins, such as alcohol oxidase, which is strongly related to scFv expression, and proteinase A, which could biodegrade scFv in the latter phases of production, were identified via the PMF method. The proteomics technique provided valuable information about the effect of the methanol concentration on scFv production.
Dynamic system multivariate calibration by system identification methods
Directory of Open Access Journals (Sweden)
Rolf Ergon
1998-04-01
In the first part of the paper, the optimal estimator for normally nonmeasured primary outputs from a linear and time invariant dynamic system is developed. The estimator is based on an underlying Kalman filter, utilizing all available information in known inputs and measured secondary outputs. Assuming sufficient experimental data, the optimal estimator can be identified by specifying an output error model in a standard prediction error identification method. It is further shown that static estimators found by the ordinary least squares method or multivariate calibration by means of principal component regression (PCR) or partial least squares regression (PLSR) can be seen as special cases of the optimal dynamic estimator. Finally, it is shown that dynamic system PCR and PLSR solutions can be developed as special cases of the general estimator for dynamic systems.
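The static PCR estimator mentioned in this abstract can be sketched as ordinary least squares on the leading principal-component scores. The code below covers only that static special case, not the Kalman-filter-based dynamic estimator of the paper; the function name `pcr_fit` and the synthetic calibration data are assumptions made for illustration.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression with k components.
    Returns (coefficients in the original X space, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean                 # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                                  # loadings of first k PCs
    T = Xc @ V_k                                    # scores, T.T @ T = diag(s[:k]**2)
    gamma = (T.T @ yc) / (s[:k] ** 2)               # least squares on the scores
    beta = V_k @ gamma                              # map back to X variables
    intercept = y_mean - x_mean @ beta
    return beta, intercept

# Synthetic calibration set: 50 samples, 4 secondary measurements
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(50)

beta_2, b0_2 = pcr_fit(X, y, 2)          # reduced-rank estimator
beta_full, b0_full = pcr_fit(X, y, 4)    # all components retained

# With all components retained, PCR coincides with ordinary least squares
A = np.column_stack([np.ones(len(y)), X])
ols = np.linalg.lstsq(A, y, rcond=None)[0]
```

Keeping fewer components trades some bias for lower variance when the measured variables are strongly collinear, which is the usual motivation for PCR in multivariate calibration.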
Prats-Montalbán, José M.; López, Fernando; Valiente, José M.; Ferrer, Alberto
2007-01-01
In this paper we present an innovative way to simultaneously perform feature extraction and classification for the quality control issue of surface grading by applying two well known multivariate statistical projection tools (SIMCA and PLS-DA). These tools have been applied to compress the color texture data describing the visual appearance of surfaces (soft color texture descriptors) and to directly perform classification using statistics and predictions computed from the extracted projection models. Experiments have been carried out using an extensive image database of ceramic tiles (VxC TSG). This image database comprises 14 different models, 42 surface classes and 960 pieces. A factorial experimental design has been carried out to evaluate all the combinations of several factors affecting the accuracy rate. Factors include tile model, color representation scheme (CIE Lab, CIE Luv and RGB) and compression/classification approach (SIMCA and PLS-DA). In addition, a logistic regression model is fitted from the experiments to compute accuracy estimates and study the effects of the factors. The results show that PLS-DA performs better than SIMCA, achieving a mean accuracy rate of 98.95%. These results outperform those obtained in a previous work where the soft color texture descriptors in combination with the CIE Lab color space and the k-NN classifier achieved an accuracy of 97.36%.
Energy Technology Data Exchange (ETDEWEB)
Belianinov, Alex, E-mail: belianinova@ornl.gov; Ganesh, Panchapakesan; Lin, Wenzhi; Jesse, Stephen; Pan, Minghu; Kalinin, Sergei V. [Oak Ridge National Laboratory, Institute for Functional Imaging of Materials, Center for Nanophase Material Science, Oak Ridge, Tennessee 37922 (United States); Sales, Brian C.; Sefat, Athena S. [Oak Ridge National Laboratory, Materials Science and Technology Division, Oak Ridge, Tennessee 37922 (United States)
2014-12-01
Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe$_{0.55}$Se$_{0.45}$ ($T_c$ = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe$_{1-x}$Se$_{x}$ structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.
Directory of Open Access Journals (Sweden)
Md. Bodrud-Doza
2016-04-01
This study investigates the groundwater quality in the Faridpur district of central Bangladesh based on 60 preselected sample points. Water evaluation indices and a number of statistical approaches such as multivariate statistics and geostatistics are applied to characterize water quality, which is a major factor in determining the suitability of groundwater for drinking purposes. The study reveals that EC, TDS, Ca2+, total As and Fe values of groundwater samples exceeded Bangladesh and international standards. The groundwater quality index (GWQI) showed that about 47% of the samples were of good quality for drinking purposes. The heavy metal pollution index (HPI), degree of contamination (Cd) and heavy metal evaluation index (HEI) reveal that most of the samples belong to a low level of pollution. However, Cd provides a better alternative than the other indices. Principal component analysis (PCA) suggests that groundwater quality is mainly related to geogenic (rock–water interaction) and anthropogenic (agrogenic and domestic sewage) sources in the study area. The findings of cluster analysis (CA) and the correlation matrix (CM) are also consistent with the PCA results. The spatial distributions of groundwater quality parameters are determined by geostatistical modeling. The exponential semivariogram model is validated as the best-fitting model for most of the index values. It is expected that outcomes of the study will provide insights for decision makers taking proper measures for groundwater quality management in central Bangladesh.
González Gallero, Francisco Javier; Galán Vallejo, Manuel; Umbría, Arturo; Gervilla Baena, Juan
2006-08-01
A complete statistical analysis of meteorological and air pollution data was carried out in the 'Campo de Gibraltar' region (in the South of Spain) from 1999 to 2002. This is a heavily industrialized area where, to date, very few air pollution studies have been made. The main objectives of the study presented here have been the characterization of the meteorological and (gaseous and particulate) air pollution conditions in the area, and the relations between them. Multivariate statistical techniques, such as Principal Component Analysis (PCA), have been applied to the data. The results show that air quality in the area is highly dependent on meteorological conditions such as wind persistence and direction, dispersion capability of the atmosphere, and humidity content. On average, sulphur dioxide and nitrogen oxide air pollution, mainly caused by fuel-oil combustion and traffic, respectively, is not very high. However, a considerable number of exceedances of the limits established by the EU Directive 1999 for PM10 (particulate matter with a diameter less than 10 µm) have been observed at some points of the area. A significant percentage of these exceedances (about 22% on average) are likely caused by African dust intrusions, which usually take place from May to August. From gaseous and particulate air correlations, it seems that anthropogenic activities contribute about 19% on average.
One approach in using multivariate statistical process control in analyzing cheese quality
Directory of Open Access Journals (Sweden)
Ilija Djekic
2015-05-01
The objective of this paper was to investigate the possibility of using multivariate statistical process control in analysing cheese quality parameters. Two cheese types (white brined cheeses and soft cheese from ultra-filtered milk) were selected and analysed for several quality parameters such as dry matter, milk fat, protein contents, pH, NaCl, fat in dry matter and moisture in non-fat solids. The obtained results showed significant variations between the two types of cheese for most of the quality characteristics examined. The only stable parameter in both types of cheese was moisture in non-fat solids. All of the other cheese quality characteristics fell above or below control limits for most of the samples. Such results indicated high instability and variation within cheese production. Although the use of statistical process control is not mandatory in the dairy industry, it might provide benefits to organizations in improving quality control of dairy products.
Energy Technology Data Exchange (ETDEWEB)
Fouque, A.L.; Ciuciu, Ph.; Risser, L. [NeuroSpin/CEA, F-91191 Gif-sur-Yvette (France); Fouque, A.L.; Ciuciu, Ph.; Risser, L. [IFR 49, Institut d' Imagerie Neurofonctionnelle, Paris (France)
2009-07-01
In this paper, a novel statistical parcellation of intra-subject functional MRI (fMRI) data is proposed. The key idea is to identify functionally homogenous regions of interest from their hemodynamic parameters. To this end, a non-parametric voxel-based estimation of hemodynamic response function is performed as a prerequisite. Then, the extracted hemodynamic features are entered as the input data of a Multivariate Spatial Gaussian Mixture Model (MSGMM) to be fitted. The goal of the spatial aspect is to favor the recovery of connected components in the mixture. Our statistical clustering approach is original in the sense that it extends existing works done on univariate spatially regularized Gaussian mixtures. A specific Gibbs sampler is derived to account for different covariance structures in the feature space. On realistic artificial fMRI datasets, it is shown that our algorithm is helpful for identifying a parsimonious functional parcellation required in the context of joint detection estimation of brain activity. This allows us to overcome the classical assumption of spatial stationarity of the BOLD signal model. (authors)
Statistical methods in radiation physics
Turner, James E; Bogard, James S
2012-01-01
This statistics textbook, with particular emphasis on radiation protection and dosimetry, deals with statistical solutions to problems inherent in health physics measurements and decision making. The authors begin with a description of our current understanding of the statistical nature of physical processes at the atomic level, including radioactive decay and interactions of radiation with matter. Examples are taken from problems encountered in health physics, and the material is presented such that health physicists and most other nuclear professionals will more readily understand the application of statistical principles in the familiar context of the examples. Problems are presented at the end of each chapter, with solutions to selected problems provided online. In addition, numerous worked examples are included throughout the text.
Statistical inference via fiducial methods
Salomé, Diemer
1998-01-01
In this thesis the attention is restricted to inductive reasoning using a mathematical probability model. A statistical procedure prescribes, for every theoretically possible set of data, the inference about the unknown of interest. ... See: Summary
Abbas Alkarkhi, F M; Ismail, Norli; Easa, Azhar Mat
2008-02-11
Cockle (Anadara granosa) samples obtained from two rivers in the Penang State of Malaysia were analyzed for the content of arsenic (As) and heavy metals (Cr, Cd, Zn, Cu, Pb, and Hg) using a graphite flame atomic absorption spectrometer (GF-AAS) for Cr, Cd, Zn, Cu, Pb, As and cold vapor atomic absorption spectrometer (CV-AAS) for Hg. The two locations of interest, with 20 sampling points at each location, were Kuala Juru (Juru River) and Bukit Tambun (Jejawi River). Multivariate statistical techniques such as multivariate analysis of variance (MANOVA) and discriminant analysis (DA) were applied for analyzing the data. MANOVA showed a strongly significant difference between the two rivers in terms of As and heavy metal contents in cockles. DA gave the best result for identifying the relative contribution of all parameters in discriminating (distinguishing) the two rivers. It provided an important data reduction as it used only two parameters (Zn and Cd), affording more than 72% correct assignations. Results indicated that the two rivers were different in terms of As and heavy metal contents in cockles, and the major difference was due to the contribution of Zn and Cd. A positive correlation was found between the discriminant functions (DF) and Zn, Cd and Cr, whereas a negative correlation was exhibited with the other heavy metals. Therefore, DA allowed a reduction in the dimensionality of the data set, delineating a few indicator parameters responsible for large variations in heavy metal and arsenic content. Taking these results into account, it can be suggested that continuous monitoring of As and heavy metals in cockles be performed in these two rivers.
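As an illustration of the two-group discriminant analysis mentioned in this abstract, the following is a minimal sketch of Fisher's linear discriminant, assuming two groups measured on two variables. The function names and the toy numbers are hypothetical and are not taken from the study; they only show how a discriminant direction and a cut-off can be derived from group means and a pooled covariance matrix.

```python
import numpy as np

def fisher_discriminant(group_a, group_b):
    """Return (weights, cutoff) of Fisher's linear discriminant for two groups."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    # Pooled within-group covariance (assumes both groups share one covariance).
    pooled = ((len(a) - 1) * np.cov(a, rowvar=False)
              + (len(b) - 1) * np.cov(b, rowvar=False)) / (len(a) + len(b) - 2)
    weights = np.linalg.solve(pooled, mean_a - mean_b)
    # Cut-off halfway between the projected group means.
    cutoff = weights @ (mean_a + mean_b) / 2.0
    return weights, cutoff

def classify(sample, weights, cutoff):
    """Assign a sample to group 'A' or 'B' by its discriminant score."""
    return "A" if np.asarray(sample, dtype=float) @ weights > cutoff else "B"

# Hypothetical toy data: rows are samples, columns are two measured variables.
river_a = [[1, 2], [2, 1], [2, 3], [3, 2]]
river_b = [[6, 7], [7, 6], [7, 8], [8, 7]]
weights, cutoff = fisher_discriminant(river_a, river_b)
print(classify([2, 2], weights, cutoff), classify([7, 7], weights, cutoff))
```

The sign and relative size of each weight indicate how strongly a variable contributes to separating the groups, which is the kind of information the abstract reports for Zn and Cd.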
Wolf, S. F.; Lipschutz, M. E.
1992-07-01
logistic regression statistical techniques as tools for discriminant analysis. A randomization-simulation technique can also be used to make distribution-independent comparisons and to verify that any observed differences are not due to insufficient samples or too many independent variables (Lipschutz and Samuels, 1991). These methods allow us to test for the existence of distinct compositional subpopulations in what is supposedly a single meteorite population. At the time of writing this abstract our database consists of 55 H4-6 chondrites (Lingner et al, 1987 and this work). Nine of these meteorites are members of the proposed "cluster 1" co-orbital meteoroid stream. For these 9 samples, linear discriminant analysis based on the concentrations of 10 labile trace elements reveals a difference between the "cluster 1" subpopulation of H chondrite falls and all other H chondrite falls at the reveals a difference at the Steele, D. (1988) Icarus 75, 64-96. Wetherill, G. W. (1986) Nature 319, 357-358. Wolf, S. F. and Lipschutz, M. E. (1992) Lunar Planet. Sci. (abstract) 23, 1545-1546.
Statistical methods in translational medicine.
Chow, Shein-Chung; Tse, Siu-Keung; Lin, Min
2008-12-01
This study focuses on strategies and statistical considerations for assessment of translation in language (e.g. translation of case report forms in multinational clinical trials), information (e.g. translation of basic discoveries to the clinic) and technology (e.g. translation of Chinese diagnostic techniques to well-established clinical study endpoints) in pharmaceutical/clinical research and development. However, most of our efforts will be directed to statistical considerations for translation in information. Translational medicine has been defined as bench-to-bedside research, where a basic laboratory discovery becomes applicable to the diagnosis, treatment or prevention of a specific disease, and is brought forth by either a physician-scientist who works at the interface between the research laboratory and patient care, or by a team of basic and clinical science investigators. Statistics plays an important role in translational medicine to ensure that the translational process is accurate and reliable with certain statistical assurance. Statistical inference for the applicability of an animal model to a human model is also discussed. Strategies for selection of clinical study endpoints (e.g. absolute changes, relative changes, or responder-defined, based on either absolute or relative change) are reviewed.
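The three endpoint types named at the end of this abstract can be written out as a short worked example. This is a sketch under stated assumptions: the function names are hypothetical, the measurement is assumed to be a score for which a decrease means improvement, and the responder threshold is an illustrative value, not one from the paper.

```python
def absolute_change(baseline, follow_up):
    """Absolute change from baseline."""
    return follow_up - baseline

def relative_change(baseline, follow_up):
    """Change from baseline as a fraction of the baseline value."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (follow_up - baseline) / baseline

def is_responder(baseline, follow_up, threshold, relative=False):
    """Responder-defined endpoint: True if the score fell by at least `threshold`.

    Assumes a decrease is an improvement; `threshold` is a positive number,
    read as a fraction of baseline when `relative` is True.
    """
    if relative:
        change = relative_change(baseline, follow_up)
    else:
        change = absolute_change(baseline, follow_up)
    return change <= -threshold

# Hypothetical patient: score 100 at baseline, 80 at follow-up.
print(absolute_change(100, 80))                        # -20
print(relative_change(100, 80))                        # -0.2
print(is_responder(100, 80, 0.25, relative=True))      # False
```

The same patient can count as a responder under an absolute criterion and a non-responder under a relative one, which is why the choice of endpoint needs to be fixed in advance.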
A review of multivariate methods in brain imaging data fusion
Sui, Jing; Adali, Tülay; Li, Yi-Ou; Yang, Honghui; Calhoun, Vince D.
2010-03-01
On joint analysis of multi-task brain imaging data sets, a variety of multivariate methods have shown their strengths and been applied to achieve different purposes based on their respective assumptions. In this paper, we provide a comprehensive review on optimization assumptions of six data fusion models, including 1) four blind methods: joint independent component analysis (jICA), multimodal canonical correlation analysis (mCCA), CCA on blind source separation (sCCA) and partial least squares (PLS); 2) two semi-blind methods: parallel ICA and coefficient-constrained ICA (CC-ICA). We also propose a novel model for joint blind source separation (BSS) of two datasets using a combination of sCCA and jICA, i.e., 'CCA+ICA', which, compared with other joint BSS methods, can achieve higher decomposition accuracy as well as the correct automatic source link. Applications of the proposed model to real multitask fMRI data are compared to joint ICA and mCCA; CCA+ICA further shows its advantages in capturing both shared and distinct information, differentiating groups, and interpreting duration of illness in schizophrenia patients, hence promising applicability to a wide variety of medical imaging problems.
Register-based statistics statistical methods for administrative data
Wallgren, Anders
2014-01-01
This book provides a comprehensive and up to date treatment of theory and practical implementation in Register-based statistics. It begins by defining the area, before explaining how to structure such systems, as well as detailing alternative approaches. It explains how to create statistical registers, how to implement quality assurance, and the use of IT systems for register-based statistics. Further to this, clear details are given about the practicalities of implementing such statistical methods, such as protection of privacy and the coordination and coherence of such an undertaking. Thi
Almeida, Tiago P.; Chu, Gavin S.; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H.; Stafford, Peter J.; Schlindwein, Fernando S.; Ng, G. André
2017-01-01
Purpose: Complex fractionated atrial electrograms (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI respectively, from corresponding LA regions in 18 persAF patients. Twelve attributes were measured from the AEGs, before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) have been used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%) and; (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination (P PVI, while others are affected by it. Although, traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information. PMID:28883795
Use of multivariate statistics to identify unreliable data obtained using CASA.
Martínez, Luis Becerril; Crispín, Rubén Huerta; Mendoza, Maximino Méndez; Gallegos, Oswaldo Hernández; Martínez, Andrés Aragón
2013-06-01
In order to identify unreliable data in a dataset of motility parameters obtained from a pilot study acquired by a veterinarian with experience in boar semen handling, but without experience in the operation of a computer assisted sperm analysis (CASA) system, a multivariate graphical and statistical analysis was performed. Sixteen boar semen samples were aliquoted, then incubated with varying concentrations of progesterone from 0 to 3.33 µg/ml and analyzed in a CASA system. After standardization of the data, Chernoff faces were drawn for each measurement, and a principal component analysis (PCA) was used to reduce the dimensionality and pre-process the data before hierarchical clustering. The first twelve individual measurements showed abnormal features when Chernoff faces were drawn. PCA revealed that principal components 1 and 2 explained 63.08% of the variance in the dataset. Values of principal components for each individual measurement of semen samples were mapped to identify differences among treatments or among boars. Twelve individual measurements presented low values of principal component 1. Confidence ellipses on the map of principal components showed no statistically significant effects for treatment or boar. Hierarchical clustering performed on the first two principal components produced three clusters. Cluster 1 contained evaluations of the first two samples in each treatment, each one from a different boar. With the exception of one individual measurement, all other measurements in cluster 1 were the same as those observed in abnormal Chernoff faces. Unreliable data in cluster 1 are probably related to the operator's inexperience with a CASA system. These findings could be used to objectively evaluate the skill level of an operator of a CASA system. This may be particularly useful in the quality control of semen analysis using CASA systems.
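The "standardize, then report the variance explained by the leading components" step described in this abstract can be sketched as follows. This is a minimal illustration assuming a samples-by-variables matrix; the function name and the random stand-in data are hypothetical and unrelated to the CASA measurements.

```python
import numpy as np

def explained_variance_ratio(data):
    """Share of total variance carried by each principal component.

    Columns are standardized first, so the PCA is effectively performed
    on the correlation matrix. Assumes no column is constant.
    """
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]
    return eigenvalues / eigenvalues.sum()

# Hypothetical stand-in data: 40 measurements of 5 motility-like parameters.
rng = np.random.default_rng(0)
measurements = rng.normal(size=(40, 5))
ratio = explained_variance_ratio(measurements)
print("Variance explained by PC1 + PC2:", ratio[:2].sum())
```

Summing the first two entries gives the kind of figure quoted in the abstract for principal components 1 and 2.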
Feyissa, Daniel D.; Aher, Yogesh D.; Engidawork, Ephrem; Höger, Harald; Lubec, Gert; Korz, Volker
2017-01-01
Animal models for anxiety, depressive-like and cognitive diseases or aging often involve testing of subjects in behavioral test batteries. The large number of test variables with different mean variations and within- and between-test correlations often constitutes a significant problem in determining essential variables to assess behavioral patterns and their variation in individual animals, as well as appropriate statistical treatment. Therefore, we applied a multivariate approach (principal component analysis) to analyse the behavioral data of 162 male adult Sprague-Dawley rats that underwent a behavioral test battery including commonly used tests for spatial learning and memory (holeboard) and different behavioral patterns (open field, elevated plus maze, forced swim test) as well as for motor abilities (Rota rod). The high dimensional behavioral results were reduced to fewer components associated with spatial cognition, general activity, anxiety-, and depression-like behavior and motor ability. The loading scores of individual rats on these different components allow an assessment of individual features and their distribution in a population of animals. The reduced number of components can also be used for statistical calculations such as appropriate sample sizes for valid discriminations between experimental groups, which otherwise have to be done on each variable. Because the animals were intact, untreated and experimentally naïve, the results reflect trait patterns of behavior and thus individuality. The distribution of animals with high or low levels of anxiety, depressive-like behavior, general activity and cognitive features in a local population provides information on the probability of their appearance in experimental samples and thus may help to avoid biases. However, such an analysis initially requires a large cohort of animals in order to gain a valid assessment.
Zamani, Abbas Ali; Yaftian, Mohammad Reza; Parizanganeh, Abdolhossein
2012-12-17
The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study, groundwater contamination in Bonab Industrial Estate (Zanjan-Iran) for iron, cobalt, nickel, copper, zinc, cadmium and lead content was investigated using differential pulse polarography (DPP). Although cobalt, copper and zinc were found in 47.8%, 100.0%, and 100.0% of the samples, respectively, the samples did not contain these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron, with 8.7% and 13.0% of the samples, respectively, above their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied for interpreting the experimental data and giving a description of the sources. The data analysis showed correlations and similarities between the investigated heavy metals and helped to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals. Cluster 1 consisted of Pb and Cu, and cluster 3 included Cd and Fe; each of the elements Zn, Co and Ni formed a single-member group. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors, notably the lead and zinc plant, and pedo-geochemical pollution sources are influencing water quality in the studied area.
Multivariate statistical process control in product quality review assessment - A case study.
Kharbach, M; Cherrah, Y; Vander Heyden, Y; Bouklouze, A
2017-08-07
According to the Food and Drug Administration and the European Good Manufacturing Practices (GMP) guidelines, Annual Product Review (APR) is a mandatory requirement in GMP. It consists of evaluating a large collection of qualitative or quantitative data in order to verify the consistency of an existing process. According to the Code of Federal Regulations (21 CFR 211.180), all finished products should be reviewed annually against the quality standards to determine the need for any change in specification or manufacturing of drug products. Conventional Statistical Process Control (SPC) evaluates the pharmaceutical production process by examining only the effect of a single factor at a time using a Shewhart chart. It does not take into account the interaction between the variables. In order to overcome this issue, Multivariate Statistical Process Control (MSPC) can be used. Our case study concerns an APR assessment, where 164 historical batches containing six active ingredients, manufactured in Morocco, were collected during one year. Each batch has been checked by assaying the six active ingredients by High Performance Liquid Chromatography according to European Pharmacopoeia monographs. The data matrix was evaluated both by SPC and MSPC. The SPC indicated that all batches are under control, while the MSPC, based on Principal Component Analysis (PCA), for the data being either autoscaled or robust scaled, showed four and seven batches, respectively, outside the Hotelling T² 95% ellipse. Also, an improvement of the capability of the process is observed without the most extreme batches. The MSPC can be used for monitoring subtle changes in the manufacturing process during an APR assessment.
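To make the idea of a Hotelling T² limit on PCA scores concrete, here is a minimal sketch. It assumes autoscaled data, two retained components, and a large-sample chi-square approximation for the 95% limit; the abstract does not state how its limit was computed, so these choices, the function names, and the random stand-in data are assumptions for illustration only.

```python
import math
import numpy as np

def hotelling_t2_on_pca(data, n_components=2):
    """Hotelling T² of each row, computed from its leading PCA scores."""
    x = np.asarray(data, dtype=float)
    # Autoscaling: mean-center and divide by the column standard deviation.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    score_variance = s[:n_components] ** 2 / (z.shape[0] - 1)
    # T² is the sum of squared scores, each scaled by its component variance.
    return np.sum(scores ** 2 / score_variance, axis=1)

def chi2_limit_2dof(confidence=0.95):
    """Chi-square quantile for 2 degrees of freedom (closed form: -2 ln(1 - p)).

    Used here only as a large-sample approximation to the T² limit
    when exactly two components are retained.
    """
    return -2.0 * math.log(1.0 - confidence)

# Hypothetical stand-in data: 30 batches, each with 6 assay results.
rng = np.random.default_rng(0)
batches = rng.normal(size=(30, 6))
t2 = hotelling_t2_on_pca(batches)
flagged = np.where(t2 > chi2_limit_2dof(0.95))[0]
print("Batches outside the approximate 95% ellipse:", flagged)
```

Unlike six separate Shewhart charts, a single T² value per batch reflects the joint behavior of all assays, so a batch can be flagged even when every individual assay is within its own limits.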
Martin, David; Boyle, Fergal
2015-09-01
Several clinical studies have identified a strong correlation between neointimal hyperplasia following coronary stent deployment and both stent-induced arterial injury and altered vessel hemodynamics. As such, the sequential structural and fluid dynamics analysis of balloon-expandable stent deployment should provide a comprehensive indication of stent performance. Despite this observation, very few numerical studies of balloon-expandable coronary stents have considered both the mechanical and hemodynamic impact of stent deployment. Furthermore, in the few studies that have considered both phenomena, only a small number of stents have been considered. In this study, a sequential structural and fluid dynamics analysis methodology was employed to compare both the mechanical and hemodynamic impact of six balloon-expandable coronary stents. To investigate the relationship between stent design and performance, several common stent design properties were then identified and the dependence between these properties and both the mechanical and hemodynamic variables of interest was evaluated using statistical measures of correlation. Following the completion of the numerical analyses, stent strut thickness was identified as the only common design property that demonstrated a strong dependence with either the mean equivalent stress predicted in the artery wall or the mean relative residence time predicted on the luminal surface of the artery. These results corroborate the findings of the large-scale ISAR-STEREO clinical studies and highlight the crucial role of strut thickness in coronary stent design. The sequential structural and fluid dynamics analysis methodology and the multivariable statistical treatment of the results described in this study should prove useful in the design of future balloon-expandable coronary stents.
Permutation statistical methods an integrated approach
Berry, Kenneth J; Johnston, Janis E
2016-01-01
This research monograph provides a synthesis of a number of statistical tests and measures, which, at first consideration, appear disjoint and unrelated. Numerous comparisons of permutation and classical statistical methods are presented, and the two methods are compared via probability values and, where appropriate, measures of effect size. Permutation statistical methods, compared to classical statistical methods, do not rely on theoretical distributions, avoid the usual assumptions of normality and homogeneity of variance, and depend only on the data at hand. This text takes a unique approach to explaining statistics by integrating a large variety of statistical methods, and establishing the rigor of a topic that to many may seem to be a nascent field in statistics. This topic is new in that it took modern computing power to make permutation methods available to people working in the mainstream of research. This research monograph addresses a statistically-informed audience, and can also easily serve as a ...
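The statement that permutation methods depend only on the data at hand can be illustrated with an exact two-sample permutation test. This is a minimal sketch, not an example from the monograph: the function name is hypothetical, the test statistic is assumed to be the absolute difference in means, and full enumeration is only practical for small samples.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Enumerates every way of splitting the pooled data into groups of the
    original sizes and counts splits at least as extreme as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, size_a):
        chosen_set = set(chosen)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if abs(mean(first) - mean(second)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Of the 20 possible splits, only {1,2,3} and {4,5,6} are as extreme as observed.
print(exact_permutation_p_value([1, 2, 3], [4, 5, 6]))  # 0.1
```

No normality or equal-variance assumption enters the calculation; the reference distribution is built entirely from rearrangements of the observed values.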
Xu, Peng; Rizzoni, Elizabeth Anne; Sul, Se-Yeong; Stephanopoulos, Gregory
2017-01-20
Metabolic engineering entails target modification of cell metabolism to maximize the production of a specific compound. For empowering combinatorial optimization in strain engineering, tools and algorithms are needed to efficiently sample the multidimensional gene expression space and locate the desirable overproduction phenotype. We addressed this challenge by employing design of experiment (DoE) models to quantitatively correlate gene expression with strain performance. By fractionally sampling the gene expression landscape, we statistically screened the dominant enzyme targets that determine metabolic pathway efficiency. An empirical quadratic regression model was subsequently used to identify the optimal gene expression patterns of the investigated pathway. As a proof of concept, our approach yielded the natural product violacein at 525.4 mg/L in shake flasks, a 3.2-fold increase from the baseline strain. Violacein production was further increased to 1.31 g/L in a controlled benchtop bioreactor. We found that formulating discretized gene expression levels into logarithmic variables (Linlog transformation) was essential for implementing this DoE-based optimization procedure. The reported methodology can aid multivariate combinatorial pathway engineering and may be generalized as a standard procedure for accelerating strain engineering and improving metabolic pathway efficiency.
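The combination of a logarithmic transformation of expression levels with an empirical quadratic regression can be sketched for a single gene. This is a deliberately simplified, one-variable illustration: the study works with several genes at once, and the function names and synthetic numbers below are hypothetical.

```python
import numpy as np

def fit_quadratic_in_log(levels, response):
    """Fit response = c0 + c1*ln(level) + c2*ln(level)^2 by least squares."""
    x = np.log(np.asarray(levels, dtype=float))
    c2, c1, c0 = np.polyfit(x, np.asarray(response, dtype=float), deg=2)
    return c0, c1, c2

def optimal_level(c1, c2):
    """Expression level at the vertex of the fitted parabola.

    Only meaningful when the parabola opens downward (c2 < 0).
    """
    if c2 >= 0:
        raise ValueError("fitted curve has no interior maximum")
    return float(np.exp(-c1 / (2.0 * c2)))

# Synthetic data constructed so that the best expression level is 4.
levels = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
titer = 10.0 - (np.log(levels) - np.log(4.0)) ** 2
c0, c1, c2 = fit_quadratic_in_log(levels, titer)
print("Estimated optimal expression level:", optimal_level(c1, c2))
```

Working in log space spaces the discretized levels evenly, which is one plausible reason the transformation helps a low-order polynomial describe the response.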
Ielpo, Pierina; Leardi, Riccardo; Pappagallo, Giuseppe; Uricchio, Vito Felice
2017-06-01
In this paper, the results obtained from multivariate statistical techniques such as PCA (principal component analysis) and LDA (linear discriminant analysis) applied to a wide soil data set are presented. The results have been compared with those obtained on a groundwater data set, whose samples were collected together with the soil samples, within the project "Improvement of the Regional Agro-meteorological Monitoring Network (2004-2007)". LDA, applied to the soil data, made it possible to assign the geographical origin of a sample to one of two macroareas, Bari and Foggia provinces vs. Brindisi, Lecce and Taranto provinces, with 87% correct predictions in cross-validation. In the case of the groundwater data set, the best classification was obtained when the samples were grouped into three macroareas (Foggia province, Bari province, and Brindisi, Lecce and Taranto provinces), reaching 84% correct predictions in cross-validation. The information obtained can be very useful in supporting soil and water resource management, such as the reduction of water consumption and of energy and chemical (nutrient and pesticide) inputs in agriculture.
Multivariate statistical approach for the assessment of groundwater quality in Ujjain City, India.
Vishwakarma, Vikas; Thakur, Lokendra Singh
2012-10-01
Groundwater quality assessment is an essential study that plays an important role in the rational development and utilization of groundwater. Groundwater quality greatly influences the health of local people. Variations in water quality are essentially the combination of anthropogenic and natural contributions. To understand the underlying physical and chemical processes, this study uses multivariate statistical techniques to analyze 8 chemical and physical-chemical water quality parameters, viz. pH, turbidity, electrical conductivity (EC), total dissolved solids (TDS), total alkalinity, total hardness, chloride and fluoride, recorded at 54 sampling stations during the summer season of 2011. Hierarchical cluster analysis (CA) is first applied to distinguish groundwater quality patterns among the stations, followed by principal component analysis (PCA) and factor analysis (FA) to extract and recognize the major underlying factors contributing to the variations among the water quality measures. The first three components, which account for 72.502% of the total variance in the data set, were chosen for interpretation. The first component was characterized by the largest number of variables (turbidity, EC, TDS and chloride), while the second and third were characterized by total alkalinity, total hardness, fluoride and pH, respectively. This shows that the hydrochemical constituents of the groundwater are mainly controlled by EC, TDS, and fluoride. The findings of the cluster analysis are presented as dendrograms of the sampling stations (cases) and of the hydrochemical variables; the four major groupings produced suggest that groundwater monitoring can be consolidated.
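The share of total variance attributed to the leading components in a study like this one can be computed along the following lines. The sketch runs PCA on the correlation matrix of a synthetic 54 x 8 array whose shape mirrors the station and parameter counts above. The data, the three-factor generating model, and the 70% cut-off are assumptions for illustration and are not the study's values.

```python
import numpy as np

def pca_explained_variance(data):
    """Eigenvalues (descending) and cumulative % variance for PCA on the
    correlation matrix of `data` (rows = samples, columns = variables)."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = (z.T @ z) / (len(z) - 1)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    cumulative = 100.0 * np.cumsum(eigvals) / eigvals.sum()
    return eigvals, cumulative

rng = np.random.default_rng(0)
# Synthetic stand-in: 54 stations x 8 parameters driven by 3 latent factors.
latent = rng.normal(size=(54, 3))
mixing = rng.normal(size=(3, 8))
data = latent @ mixing + 0.5 * rng.normal(size=(54, 8))

eigvals, cumulative = pca_explained_variance(data)
# Number of components needed to reach the (assumed) 70 % threshold.
n_keep = int(np.searchsorted(cumulative, 70.0) + 1)
```

Working from the correlation matrix rather than the covariance matrix puts parameters measured in different units (pH, turbidity, conductivity) on a common scale.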
A Multivariate Computational Method to Analyze High-Content RNAi Screening Data.
Rameseder, Jonathan; Krismer, Konstantin; Dayma, Yogesh; Ehrenberger, Tobias; Hwang, Mun Kyung; Airoldi, Edoardo M; Floyd, Scott R; Yaffe, Michael B
2015-09-01
High-content screening (HCS) using RNA interference (RNAi) in combination with automated microscopy is a powerful investigative tool to explore complex biological processes. However, despite the plethora of data generated from these screens, little progress has been made in analyzing HC data using multivariate methods that exploit the full richness of multidimensional data. We developed a novel multivariate method for HCS, multivariate robust analysis method (M-RAM), integrating image feature selection with ranking of perturbations for hit identification, and applied this method to an HC RNAi screen to discover novel components of the DNA damage response in an osteosarcoma cell line. M-RAM automatically selects the most informative phenotypic readouts and time points to facilitate the more efficient design of follow-up experiments and enhance biological understanding. Our method outperforms univariate hit identification and identifies relevant genes that these approaches would have missed. We found that statistical cell-to-cell variation in phenotypic responses is an important predictor of hits in RNAi-directed image-based screens. Genes that we identified as modulators of DNA damage signaling in U2OS cells include B-Raf, a cancer driver gene in multiple tumor types, whose role in DNA damage signaling we confirm experimentally, and multiple subunits of protein kinase A. © 2015 Society for Laboratory Automation and Screening.
Climate Prediction through Statistical Methods
Akgun, Bora; Tuter, Levent; Kurnaz, Mehmet Levent
2008-01-01
Climate change is a reality of today. Paleoclimatic proxies and climate predictions based on coupled atmosphere-ocean general circulation models provide us with temperature data. Using Detrended Fluctuation Analysis, we are investigating the statistical connection between the climate types of the present and these local temperatures. We are relating this issue to some well-known historic climate shifts. Our main result is that the temperature fluctuations with or without a temperature scale attached to them, can be used to classify climates in the absence of other indicators such as pan evaporation and precipitation.
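For readers unfamiliar with Detrended Fluctuation Analysis, a minimal Python sketch of the first-order variant is given below. It integrates the series, removes a straight-line trend in non-overlapping windows, and estimates the scaling exponent from the log-log slope of fluctuation against window size. The window sizes and the synthetic test series are assumptions. The abstract does not state which DFA order or scales the authors used.

```python
import numpy as np

def dfa_exponent(series, scales):
    """Estimate the DFA scaling exponent using linear (order-1) detrending."""
    profile = np.cumsum(series - np.mean(series))  # integrated series
    fluctuations = []
    for n in scales:
        n_windows = len(profile) // n
        windows = profile[: n_windows * n].reshape(n_windows, n)
        t = np.arange(n)
        # Fit a straight line to every window at once (one column per window).
        coeffs = np.polyfit(t, windows.T, 1)        # shape (2, n_windows)
        trend = np.outer(t, coeffs[0]) + coeffs[1]  # shape (n, n_windows)
        residual = windows.T - trend
        fluctuations.append(np.sqrt(np.mean(residual**2)))
    slope, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return slope

rng = np.random.default_rng(0)
scales = [16, 32, 64, 128, 256]
white = rng.normal(size=4096)   # uncorrelated noise, exponent near 0.5
brown = np.cumsum(white)        # random walk, exponent near 1.5
alpha_white = dfa_exponent(white, scales)
alpha_brown = dfa_exponent(brown, scales)
```

An exponent near 0.5 indicates uncorrelated fluctuations, while larger values indicate persistent long-range correlations of the kind such studies look for in temperature records.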
Almeida, Tiago P; Chu, Gavin S; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H; Stafford, Peter J; Schlindwein, Fernando S; Ng, G André
2017-01-01
Purpose: Complex fractionated atrial electrograms (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI respectively, from corresponding LA regions in 18 persAF patients. Twelve attributes were measured from the AEGs, before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) have been used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P multivariate statistical models were effective in their discrimination (P multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information.
An Application of Multivariate Statistical Analysis for Query-Driven Visualization
Energy Technology Data Exchange (ETDEWEB)
Gosink, Luke J.; Garth, Christoph; Anderson, John C.; Bethel, E. Wes; Joy, Kenneth I.
2010-03-01
Driven by the ability to generate ever-larger, increasingly complex data, there is an urgent need in the scientific community for scalable analysis methods that can rapidly identify salient trends in scientific data. Query-Driven Visualization (QDV) strategies are among the small subset of techniques that can address both large and highly complex datasets. This paper extends the utility of QDV strategies with a statistics-based framework that integrates non-parametric distribution estimation techniques with a new segmentation strategy to visually identify statistically significant trends and features within the solution space of a query. In this framework, query distribution estimates help users to interactively explore their query's solution and visually identify the regions where the combined behavior of constrained variables is most important, statistically, to their inquiry. Our new segmentation strategy extends the distribution estimation analysis by visually conveying the individual importance of each variable to these regions of high statistical significance. We demonstrate the analysis benefits these two strategies provide and show how they may be used to facilitate the refinement of constraints over variables expressed in a user's query. We apply our method to datasets from two different scientific domains to demonstrate its broad applicability.
The research of railway freight statistics system and statistical methods
Directory of Open Access Journals (Sweden)
Wu Hua-Wen
2013-01-01
EXT is a JavaScript framework for developing Web interfaces. This paper describes the Ext framework and its application in a railway freight statistics and analysis system and its statistical methods. The paper also analyzes the design, function, and implementation of the system in detail. As information technology and the requirements of railway transportation organization and operation continue to advance, the railway freight statistics and analysis system has improved markedly in its index system, decision analysis and other aspects, and better meets work requirements. It will play a more important role in railway transport organization, management, and passenger and freight marketing.
Jiang, Miaomiao; Jiao, Yujiao; Wang, Yuefei; Xu, Lei; Wang, Meng; Zhao, Buchang; Jia, Lifu; Pan, Hao; Zhu, Yan; Gao, Xiumei
2014-01-01
Botanical primary metabolites are widespread in herbal medicine injections (HMIs) but are often neglected in quality control. Because of its bias with respect to hydrophilic substances, the routinely applied reversed-phase chromatographic fingerprint technology usually has difficulty detecting strongly polar primary metabolites such as saccharides, amino acids and organic acids. In this study, a proton nuclear magnetic resonance (1H NMR) profiling method was developed for efficient identification and quantification of small polar molecules, mostly primary metabolites, in HMIs. A commonly used medicine, Danhong injection (DHI), was employed as a model. With the developed method, 23 primary metabolites together with 7 polyphenolic acids were simultaneously identified, of which 13 metabolites with fully separated proton signals were quantified and employed in a further multivariate quality control assay. The quantitative 1H NMR method was validated with good linearity, precision, repeatability, stability and accuracy. Based on independence principal component analysis (IPCA), the contents of the 13 metabolites were characterized and dimensionally reduced into the first two independence principal components (IPCs). IPC1 and IPC2 were then used to calculate the upper control limits (with 99% confidence ellipsoids) of χ2 and Hotelling T2 control charts. Using the constructed upper control limits, the proposed method was successfully applied to 36 batches of DHI to detect out-of-control samples with perturbed levels of succinate, malonate, glucose, fructose, salvianic acid and protocatechuic aldehyde. The integrated strategy provides a reliable approach to identifying and quantifying multiple polar metabolites of DHI in one fingerprinting spectrum, and it has also assisted in the establishment of IPCA models for the multivariate statistical evaluation of HMIs.
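The control-chart step can be illustrated with a short sketch. Note the substitution: ordinary PCA is used here in place of IPCA, and the 36 x 13 data are synthetic, so this shows only the general mechanics of a T2 statistic on two component scores with a 99% chi-square upper control limit. All names and values are assumptions for illustration.

```python
import math
import numpy as np

def t2_chart(reference, new_batches, alpha=0.01):
    """T^2 on the first two PCA scores with a chi-square upper control limit.

    Ordinary PCA stands in for IPCA here; that substitution is an assumption.
    """
    mean = reference.mean(axis=0)
    std = reference.std(axis=0, ddof=1)
    z_ref = (reference - mean) / std
    # Principal axes from the SVD of the standardized reference batches.
    _, s, vt = np.linalg.svd(z_ref, full_matrices=False)
    axes = vt[:2].T
    score_var = s[:2] ** 2 / (len(reference) - 1)
    scores = ((new_batches - mean) / std) @ axes
    t2 = np.sum(scores**2 / score_var, axis=1)
    # Chi-square quantile with 2 degrees of freedom has the closed form -2 ln(alpha).
    ucl = -2.0 * math.log(alpha)
    return t2, ucl

rng = np.random.default_rng(1)
# Synthetic stand-in: 36 batches x 13 metabolite contents from 2 latent factors.
latent = rng.normal(size=(36, 2))
mixing = rng.normal(size=(2, 13))
reference = latent @ mixing + 0.1 * rng.normal(size=(36, 13))
# A hypothetical batch shifted strongly along the first latent direction.
perturbed = reference.mean(axis=0) + 8.0 * mixing[0]

t2_ref, ucl = t2_chart(reference, reference)
t2_new, _ = t2_chart(reference, perturbed[None, :])
out_of_control = t2_new > ucl
```

A batch whose statistic exceeds the limit would be flagged for follow-up, after which the individual metabolite contributions can be inspected.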
Statistical Methods for Unusual Count Data
DEFF Research Database (Denmark)
Guthrie, Katherine A.; Gammill, Hilary S.; Kamper-Jørgensen, Mads
2016-01-01
microchimerism data present challenges for statistical analysis, including a skewed distribution, excess zero values, and occasional large values. Methods for comparing microchimerism levels across groups while controlling for covariates are not well established. We compared statistical models for quantitative...
Kamal, Ghulam Mustafa; Wang, Xiaohua; Bin Yuan; Wang, Jie; Sun, Peng; Zhang, Xu; Liu, Maili
2016-09-01
Soy sauce, a well-known seasoning all over the world, especially in Asia, is available on the global market in a wide range of types based on its purpose and processing methods. Its composition varies with the fermentation process and the addition of additives, preservatives and flavor enhancers. Comprehensive 1H NMR based study of the metabonomic variations that differentiate the types of soy sauce available on the global market has been limited by the complexity of the mixture. In the present study, 13C NMR spectroscopy coupled with multivariate statistical data analysis, namely principal component analysis (PCA) and orthogonal partial least squares-discriminant analysis (OPLS-DA), was applied to investigate metabonomic variations among different types of soy sauce: super light, super dark, red cooking and mushroom soy sauce. The main additives in soy sauce, such as glutamate, sucrose and glucose, were easily distinguished and quantified using 13C NMR spectroscopy; they are otherwise difficult to assign and quantify because of serious signal overlap in 1H NMR spectra. The significantly higher concentration of sucrose in dark, red cooking and mushroom flavored soy sauce can be linked directly to the addition of caramel. Similarly, the significantly higher level of glutamate in super light compared with super dark and mushroom flavored soy sauce may come from the addition of monosodium glutamate. The study highlights the potential of 13C NMR based metabonomics coupled with multivariate statistical data analysis for differentiating between types of soy sauce on the basis of the levels of additives, raw materials and fermentation procedures.
Statistical methods in physical mapping
Energy Technology Data Exchange (ETDEWEB)
Nelson, David O. [Univ. of California, Berkeley, CA (United States)
1995-05-01
One of the great success stories of modern molecular genetics has been the ability of biologists to isolate and characterize the genes responsible for serious inherited diseases like fragile X syndrome, cystic fibrosis and myotonic muscular dystrophy. This dissertation concentrates on constructing high-resolution physical maps. It demonstrates how probabilistic modeling and statistical analysis can aid molecular geneticists in the tasks of planning, execution, and evaluation of physical maps of chromosomes and large chromosomal regions. The dissertation is divided into six chapters. Chapter 1 provides an introduction to the field of physical mapping, describing the role of physical mapping in gene isolation and in past efforts at mapping chromosomal regions. The next two chapters review and extend known results on predicting progress in large mapping projects. Such predictions help project planners decide between various approaches and tactics for mapping large regions of the human genome. Chapter 2 shows how probability models have been used in the past to predict progress in mapping projects. Chapter 3 presents new results, based on stationary point process theory, for progress measures for mapping projects based on directed mapping strategies. Chapter 4 describes in detail the construction of an initial high-resolution physical map for human chromosome 19. This chapter introduces the probability and statistical models involved in map construction in the context of a large, ongoing physical mapping project. Chapter 5 concentrates on one such model, the trinomial model. This chapter contains new results on the large-sample behavior of this model, including distributional results, asymptotic moments, and detection error rates. In addition, it contains an optimality result concerning experimental procedures based on the trinomial model. The last chapter explores unsolved problems and describes future work.
Statistical methods for longitudinal data with agricultural applications
DEFF Research Database (Denmark)
Anantharama Ankinakatte, Smitha
The PhD study focuses on modeling two kinds of longitudinal data arising in agricultural applications: continuous time series data and discrete longitudinal data. Firstly, two statistical methods, neural networks and generalized additive models, are applied to predict mastitis using multivariate...... algorithm. This was found to compare favourably with the algorithm implemented in the well-known Beagle software. Finally, an R package to apply APFA models developed as part of the PhD project is described...
Directory of Open Access Journals (Sweden)
Vujović Svetlana R.
2013-01-01
This paper illustrates the utility of multivariate statistical techniques for the analysis and interpretation of water quality data sets and the identification of pollution sources/factors, with a view to obtaining better information about water quality and designing a monitoring network for effective management of water resources. Multivariate statistical techniques, such as factor analysis (FA)/principal component analysis (PCA) and cluster analysis (CA), were applied to evaluate variations in, and to interpret, a water quality data set for natural water bodies obtained during 2010 by monitoring 13 parameters at 33 different sites. FA/PCA attempts to explain the correlations between the observations in terms of underlying factors, which are not directly observable. Factor analysis is applied to physico-chemical parameters of natural water bodies with the aim of classification and data summation as well as segmentation of heterogeneous data sets into smaller homogeneous subsets. Factor loadings were categorized as strong and moderate, corresponding to absolute loading values of >0.75 and 0.75-0.50, respectively. Four principal factors were obtained with eigenvalues >1, summing to more than 78% of the total variance in the water data sets, which is adequate to give good prior information regarding data structure. Each factor that is significantly related to specific variables represents a different dimension of water quality. The first factor, F1, accounts for 28% of the total variance and represents the hydrochemical dimension of water quality. The second factor, F2, accounts for 18% of the total variance and may be taken as a factor of water eutrophication. The third factor, F3, accounts for 17% of the total variance and represents the influence of point sources of pollution on water quality. The fourth factor, F4, accounts for 13% of the total variance and may be taken as an ecological dimension of water quality. Cluster analysis (CA) is an
Li, Jia; Zhang, Haibo; Chen, Yongshan; Luo, Yongming; Zhang, Hua
2016-07-01
To quantify the extent of antibiotic contamination and to identify the dominant pollutant sources in the Tiaoxi River Watershed, surface water samples were collected at eight locations and analyzed for four tetracyclines and three sulfonamides using ultra-performance liquid chromatography tandem mass spectrometry (UPLC-MS/MS). The observed maximum concentrations of tetracycline (623 ng/L), oxytetracycline (19,810 ng/L), and sulfamethoxazole (112 ng/L) exceeded their corresponding Predicted No Effect Concentration (PNEC) values. In particular, high concentrations of antibiotics were observed in the wet summer season with heavy rainfall. The maximum concentrations of antibiotics appeared in the vicinity of intensive aquaculture areas. High-resolution land use data were used to identify diffuse sources of antibiotic pollution in the watershed. Significant correlations were observed between tetracycline and developed land (r = 0.93), tetracycline and barren land (r = 0.87), oxytetracycline and barren land (r = 0.82), and sulfadiazine and agricultural facilities (r = 0.71). In addition, the density of aquaculture correlated significantly with doxycycline (r = 0.74) and oxytetracycline (r = 0.76), while the density of livestock correlated significantly with sulfadiazine (r = 0.71). Principal component analysis (PCA) indicated that doxycycline, tetracycline, oxytetracycline, and sulfamethoxazole came from aquaculture and domestic sources, whereas sulfadiazine and sulfamethazine came from livestock wastewater. Flooding or drainage from aquaculture ponds was identified as a major source of antibiotics in the Tiaoxi watershed. A hot-spot map was created from the results of the land use analysis and multivariate statistics, providing an effective management tool for source identification in watersheds with multiple diffuse sources of antibiotic pollution.
Brauchler, R.; Cheng, J.; Dietrich, P.; Everett, M.; Johnson, B.; Sauter, M.
2005-12-01
Knowledge about the spatial variations in hydraulic properties plays an important role in controlling solute movement in saturated flow systems. Traditional hydrogeological approaches appear to have difficulties providing high-resolution parameter estimates. Thus, we have decided to develop an approach coupling the two existing hydraulic tomographic approaches: a) inversion of the drawdown as a function of time (amplitude inversion) and b) inversion of the travel times of the pressure disturbance. The advantages of hydraulic travel time tomography are its high structural resolution and computational efficiency. However, travel times are primarily controlled by the aquifer diffusivity, making it difficult to determine hydraulic conductivity and storage. Amplitude inversion, on the other hand, is able to determine hydraulic conductivity and storage separately, but its heavy computational burden is often a shortcoming, especially for larger data sets. Our coupled inversion approach was developed and tested using synthetic data sets. The database of the inversion comprises simulated slug tests, in which the position of the sources (injection ports), isolated with packers, is varied between the tests. The first step was the inversion of several characteristic travel times (e.g. early, intermediate and late travel times) in order to determine the diffusivity distribution. Secondly, the resulting diffusivity distributions were classified into homogeneous groups in order to differentiate between hydrogeological units characterized by a significant diffusivity contrast. The classification was performed using multivariate statistics. In a final step, the amplitude inversion was performed with a numerical flow model and an automatic parameter estimator. The classified diffusivity distribution is an excellent starting model for the amplitude inversion and allows the calculation time to be reduced substantially. The final amplitude inversion overcomes
Baez-Cazull, S. E.; McGuire, J.T.; Cozzarelli, I.M.; Voytek, M.A.
2008-01-01
Determining the processes governing aqueous biogeochemistry in a wetland hydrologically linked to an underlying contaminated aquifer is challenging due to the complex exchange between the systems and their distinct responses to changes in precipitation, recharge, and biological activities. To evaluate temporal and spatial processes in the wetland-aquifer system, water samples were collected using cm-scale multichambered passive diffusion samplers (peepers) to span the wetland-aquifer interface over a period of 3 yr. Samples were analyzed for major cations and anions, methane, and a suite of organic acids resulting in a large dataset of over 8000 points, which was evaluated using multivariate statistics. Principal component analysis (PCA) was chosen with the purpose of exploring the sources of variation in the dataset to expose related variables and provide insight into the biogeochemical processes that control the water chemistry of the system. Factor scores computed from PCA were mapped by date and depth. Patterns observed suggest that (i) fermentation is the process controlling the greatest variability in the dataset and it peaks in May; (ii) iron and sulfate reduction were the dominant terminal electron-accepting processes in the system and were associated with fermentation but had more complex seasonal variability than fermentation; (iii) methanogenesis was also important and associated with bacterial utilization of minerals as a source of electron acceptors (e.g., barite BaSO4); and (iv) seasonal hydrological patterns (wet and dry periods) control the availability of electron acceptors through the reoxidation of reduced iron-sulfur species enhancing iron and sulfate reduction. Copyright © 2008 by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America. All rights reserved.
Marinović Ruždjak, Andrea; Ruždjak, Domagoj
2015-04-01
For the evaluation of seasonal and spatial variations and the interpretation of a large and complex water quality dataset obtained during a 7-year monitoring program of the Sava River in Croatia, different multivariate statistical techniques were applied in this study. Basic statistical properties and correlations of 18 water quality parameters (variables) measured at 18 sampling sites (a total of 56,952 values) were examined. Correlations between air temperature and some water quality parameters were found in agreement with the previous studies of relationship between climatic and hydrological parameters. Principal component analysis (PCA) was used to explore the most important factors determining the spatiotemporal dynamics of the Sava River. PCA has determined a reduced number of seven principal components that explain over 75 % of the data set variance. The results revealed that parameters related to temperature and organic pollutants (CODMn and TSS) were the most important parameters contributing to water quality variation. PCA analysis of seasonal subsets confirmed this result and showed that the importance of parameters is changing from season to season. PCA of the four seasonal data subsets yielded six PCs with eigenvalues greater than one explaining 73.6 % (spring), 71.4 % (summer), 70.3 % (autumn), and 71.3 % (winter) of the total variance. To check the influence of the outliers in the data set whose distribution strongly deviates from the normal one, in addition to standard principal component analysis algorithm, two robust estimates of covariance matrix were calculated and subjected to PCA. PCA in both cases yielded seven principal components explaining 75 % of the total variance, and the results do not differ significantly from the results obtained by the standard PCA algorithm. With the implementation of robust PCA algorithm, it is demonstrated that the usage of standard algorithm is justified for data sets with small numbers of missing data
Konukoglu, Ender; Coutu, Jean-Philippe; Salat, David H; Fischl, Bruce
2016-07-01
Diffusion magnetic resonance imaging (dMRI) is a unique technology that allows the noninvasive quantification of microstructural tissue properties of the human brain in healthy subjects as well as the probing of disease-induced variations. Population studies of dMRI data have been essential in identifying pathological structural changes in various conditions, such as Alzheimer's and Huntington's diseases (Salat et al., 2010; Rosas et al., 2006). The most common form of dMRI involves fitting a tensor to the underlying imaging data (known as diffusion tensor imaging, or DTI), then deriving parametric maps, each quantifying a different aspect of the underlying microstructure, e.g. fractional anisotropy and mean diffusivity. To date, the statistical methods utilized in most DTI population studies either analyzed only one such map or analyzed several of them, each in isolation. However, it is most likely that variations in the microstructure due to pathology or normal variability would affect several parameters simultaneously, with differing variations modulating the various parameters to differing degrees. Therefore, joint analysis of the available diffusion maps can be more powerful in characterizing histopathology and distinguishing between conditions than the widely used univariate analysis. In this article, we propose a multivariate approach for statistical analysis of diffusion parameters that uses partial least squares correlation (PLSC) analysis and permutation testing as building blocks in a voxel-wise fashion. Stemming from the common formulation, we present three different multivariate procedures for group analysis, regressing-out nuisance parameters and comparing effects of different conditions. We used the proposed procedures to study the effects of non-demented aging, Alzheimer's disease and mild cognitive impairment on the white matter. Here, we present results demonstrating that the proposed PLSC-based approach can differentiate between effects of
Energy Technology Data Exchange (ETDEWEB)
Mattingly, J.K.
2001-03-08
The development of high order statistical analyses applied to measurements of the temporal evolution of fission chain-reactions is described. These statistics are derived via application of Bayes' rule to conditional probabilities describing a sequence of events in a fissile system beginning with the initiation of a chain-reaction by source neutrons and ending with counting events in a collection of neutron-sensitive detectors. Two types of initiating neutron sources are considered: (1) a directly observable source introduced by the experimenter (active initiation), and (2) a source that is intrinsic to the system and is not directly observable (passive initiation). The resulting statistics describe the temporal distribution of the population of prompt neutrons in terms of the time-delays between members of a collection (an n-tuplet) of correlated detector counts that, in turn, may be collectively correlated with a detected active source neutron emission. These developments are a unification and extension of Rossi-alpha, pulsed neutron, and neutron noise methods, each of which measures the temporal distribution of pairs of correlated events, to produce a method that measures the temporal distribution of n-tuplets of correlated counts of arbitrary dimension n. In general the technique should expand present capabilities in the analysis of neutron counting measurements.
Equilibrium Statistics: Monte Carlo Methods
Kröger, Martin
Monte Carlo methods use random numbers, or ‘random’ sequences, to sample from a known shape of a distribution, or to extract a distribution by other means, and, in the context of this book, to (i) generate representative equilibrated samples prior to being subjected to external fields, or (ii) evaluate high-dimensional integrals. Recipes for both topics, and some more general methods, are summarized in this chapter. It is important to realize that Monte Carlo should be as artificial as possible to be efficient and elegant. Advanced Monte Carlo ‘moves’, required to optimize the speed of algorithms for a particular problem at hand, are outside the scope of this brief introduction. One particular modern example is the wavelet-accelerated MC sampling of polymer chains [406].
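As an illustration of use (ii) above, the following is a minimal sketch of plain Monte Carlo integration over a unit hypercube. It is not taken from the book: the function names, the test integrand (a sum of squares, whose exact integral over [0, 1]^d is d/3), the sample count and the seed are all assumptions chosen for the example.

```python
import random

def mc_integrate(f, dim, n_samples, seed=0):
    """Estimate the integral of f over the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)  # seeded generator so the run is reproducible
    total = 0.0
    for _ in range(n_samples):
        x = [rng.random() for _ in range(dim)]  # one uniform point in the cube
        total += f(x)
    # The hypercube has volume 1, so the sample mean is the integral estimate.
    return total / n_samples

def sum_of_squares(x):
    """Hypothetical test integrand: x_1^2 + ... + x_d^2."""
    return sum(xi * xi for xi in x)

estimate = mc_integrate(sum_of_squares, dim=10, n_samples=20000)
exact = 10 / 3  # each coordinate contributes the integral of x^2 on [0, 1], i.e. 1/3
print(f"estimate = {estimate:.4f}, exact = {exact:.4f}")
```

The statistical error of such an estimate shrinks roughly as one over the square root of the number of samples, independently of the dimension, which is why the approach is attractive for high-dimensional integrals.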
Statistical methods for nuclear material management
Energy Technology Data Exchange (ETDEWEB)
Bowen, W.M.; Bennett, C.A. (eds.)
1988-12-01
This book is intended as a reference manual of statistical methodology for nuclear material management practitioners. It describes statistical methods currently or potentially important in nuclear material management, explains the choice of methods for specific applications, and provides examples of practical applications to nuclear material management problems. Together with the accompanying training manual, which contains fully worked out problems keyed to each chapter, this book can also be used as a textbook for courses in statistical methods for nuclear material management. It should provide increased understanding and guidance to help improve the application of statistical methods to nuclear material management problems.
Statistical Methods for Material Characterization and Qualification
Energy Technology Data Exchange (ETDEWEB)
Kercher, A.K.
2005-04-01
This document describes a suite of statistical methods that can be used to infer lot parameters from the data obtained from inspection/testing of random samples taken from that lot. Some of these methods will be needed to perform the statistical acceptance tests required by the Advanced Gas Reactor Fuel Development and Qualification (AGR) Program. Special focus has been placed on proper interpretation of acceptance criteria and unambiguous methods of reporting the statistical results. In addition, modified statistical methods are described that can provide valuable measures of quality for different lots of material. This document has been written for use as a reference and a guide for performing these statistical calculations. Examples of each method are provided. Uncertainty analysis (e.g., measurement uncertainty due to instrumental bias) is not included in this document, but should be considered when reporting statistical results.
Statistical methods for material characterization and qualification
Energy Technology Data Exchange (ETDEWEB)
Hunn, John D [ORNL; Kercher, Andrew K [ORNL
2005-01-01
This document describes a suite of statistical methods that can be used to infer lot parameters from the data obtained from inspection/testing of random samples taken from that lot. Some of these methods will be needed to perform the statistical acceptance tests required by the Advanced Gas Reactor Fuel Development and Qualification (AGR) Program. Special focus has been placed on proper interpretation of acceptance criteria and unambiguous methods of reporting the statistical results. In addition, modified statistical methods are described that can provide valuable measures of quality for different lots of material. This document has been written for use as a reference and a guide for performing these statistical calculations. Examples of each method are provided. Uncertainty analysis (e.g., measurement uncertainty due to instrumental bias) is not included in this document, but should be considered when reporting statistical results.
Publishing nutrition research: a review of multivariate techniques--part 3: data reduction methods.
Gleason, Philip M; Boushey, Carol J; Harris, Jeffrey E; Zoellner, Jamie
2015-07-01
This is the ninth in a series of monographs on research design and analysis, and the third in a set of these monographs devoted to multivariate methods. The purpose of this article is to provide an overview of data reduction methods, including principal components analysis, factor analysis, reduced rank regression, and cluster analysis. In the field of nutrition, data reduction methods can be used for three general purposes: for descriptive analysis in which large sets of variables are efficiently summarized, to create variables to be used in subsequent analysis and hypothesis testing, and in questionnaire development. The article describes the situations in which these data reduction methods can be most useful, briefly describes how the underlying statistical analyses are performed, and summarizes how the results of these data reduction methods should be interpreted.
Zero Assignment in Multivariable System Using Pole Assignment Method
Smagina, Ye.
2002-01-01
In the paper we consider the invariant zero assignment problem in a linear multivariable system with several inputs/outputs by constructing a system output matrix. The problem is reduced to the pole assignment problem by a state feedback (modal control) in a descriptor system or a regular one. It is shown that the zero assignment and pole assignment are mathematically equivalent problems.
Directory of Open Access Journals (Sweden)
Jan Bocianowski
2013-03-01
The paper presents a multivariable approach to the estimation of variability for quantitative traits after using seven methods of magnesium application on a “stay-green” type of maize (Zea mays L.) hybrid. Thirteen characteristics of the LG 2244 hybrid were considered over three years (2006-2008): grain yield, moisture of grain, 1000 grain yield, dry matter of a single plant, dry matter yield, uptake of N, uptake of P, uptake of K, uptake of Mg, uptake of Ca, chlorophyll a, chlorophyll b and chlorophyll a+b content. The results were analyzed with multivariate statistical methods. Canonical variable analysis (carried out independently for each year) proved to be an effective tool for clearly assessing differences among the studied methods of magnesium application. The most diverse methods were: C and E (in 2006), A and F (in 2007), and B and G (in 2008). The most similar methods (in respect of 13 traits simultaneously) were: B and C (in 2006), D and E (in 2007), and C and E (in 2008). Mahalanobis’ distances between methods of magnesium application in individual years of the study were not significantly correlated.
Lapuyade-Lahorgue, Jerome; Xue, Jing-Hao; Ruan, Su
2017-03-21
Nowadays, multi-source image acquisition attracts an increasing interest in many fields such as multi-modal medical image segmentation. Such acquisition aims at considering complementary information to perform image segmentation since the same scene has been observed by various types of images. However, strong dependency often exists between multi-source images. This dependency should be taken into account when we try to extract joint information for precisely making a decision. In order to statistically model this dependency between multiple sources, we propose a novel multi-source fusion method based on the Gaussian copula. The proposed fusion model is integrated in a statistical framework with the hidden Markov field inference in order to delineate a target volume from multi-source images. Estimation of parameters of the models and segmentation of the images are jointly performed by an iterative algorithm based on Gibbs sampling. Experiments are performed on multi-sequence MRI to segment tumors. The results show that the proposed method based on the Gaussian copula is effective to accomplish multi-source image segmentation.
Jacobson, Dan; Monforte, Ana Rita; Silva Ferreira, António César
2013-03-13
Chromatography separates the different components of complex mixtures and generates a fingerprint representing the chemical composition of the sample. The resulting data structure depends on the characteristics of the detector used, univariate for devices such as a flame ionization detector (FID) or multivariate for mass spectroscopy (MS). This study addresses the potential use of a univariate signal for a nontargeted approach to (i) classify samples according to a given process or perturbation, (ii) evaluate the feasibility of developing a screening procedure to select candidates related to the process, and (iii) provide insight into the chemical mechanisms that are affected by the perturbation. To achieve this, it was necessary to use and develop methods for data preprocessing and visualization tools to assist an analytical chemist to view and interpret complex multidimensional data sets. Dichloromethane Port wine extracts were collected using GC-FID; the chromatograms were then aligned with correlation optimized warping (COW) and subsequently analyzed with multivariate statistics (MVA) by principal component analysis (PCA) and partial least-squares regression (PLS-R). Furthermore, wavelets were used for peak calling and alignment refinement, and the resulting matrix was used to perform kinetic network reconstruction via correlation networks and maximum spanning trees. Network-target correlation projections were used to screen for potential chromatographic regions/peaks related to aging mechanisms. Results from PLS between aligned chromatograms and target molecules showed high X to Y correlations of 0.91, 0.92, and 0.89 with 5-hydroxymethylfurfural (HMF) (Maillard), acetaldehyde (oxidation), and 4,5-dimethyl-(5H)-3-hydroxy-2-furanone, respectively. The context of the correlation (and therefore likely kinetic) relationships among compounds detected by GC-FID and the relationships between target compounds within different regions of the network can be clearly seen.
Multivariate analysis with LISREL
Jöreskog, Karl G; Y Wallentin, Fan
2016-01-01
This book traces the theory and methodology of multivariate statistical analysis and shows how it can be conducted in practice using the LISREL computer program. It presents not only the typical uses of LISREL, such as confirmatory factor analysis and structural equation models, but also several other multivariate analysis topics, including regression (univariate, multivariate, censored, logistic, and probit), generalized linear models, multilevel analysis, and principal component analysis. It provides numerous examples from several disciplines and discusses and interprets the results, illustrated with sections of output from the LISREL program, in the context of the example. The book is intended for masters and PhD students and researchers in the social, behavioral, economic and many other sciences who require a basic understanding of multivariate statistical theory and methods for their analysis of multivariate data. It can also be used as a textbook on various topics of multivariate statistical analysis.
Directory of Open Access Journals (Sweden)
Said Nawar
2015-01-01
Modeling and mapping of soil properties has been identified as key for effective land degradation management and mitigation. The ability to model and map soil properties at sufficient accuracy for a large agriculture area is demonstrated using Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) imagery. Soil samples were collected in the El-Tina Plain, Sinai, Egypt, concurrently with the acquisition of ASTER imagery, and measured for soil electrical conductivity (ECe), clay content and soil organic matter (OM). An ASTER image covering the study area was preprocessed, and two predictive models, multivariate adaptive regression splines (MARS) and partial least squares regression (PLSR), were constructed based on the ASTER spectra. For all three soil properties, the results of the MARS models were better than those of the respective PLSR models, with cross-validation estimated R² of 0.85 and 0.80 for ECe, 0.94 and 0.90 for clay content and 0.79 and 0.73 for OM. Independent validation of ECe, clay content and OM maps with 32 soil samples showed the better performance of the MARS models, with R² = 0.81, 0.89 and 0.73, respectively, compared to R² = 0.78, 0.87 and 0.71 for the PLSR models. The results indicated that MARS is a more suitable and superior modeling technique than PLSR for the estimation and mapping of soil salinity (ECe), clay content and OM. The method developed in this paper was found to be reliable and accurate for digital soil mapping in arid and semi-arid environments.
Statistical Models and Methods for Lifetime Data
Lawless, Jerald F
2011-01-01
Praise for the First Edition: "An indispensable addition to any serious collection on lifetime data analysis and . . . a valuable contribution to the statistical literature. Highly recommended . . ." -Choice. "This is an important book, which will appeal to statisticians working on survival analysis problems." -Biometrics. "A thorough, unified treatment of statistical models and methods used in the analysis of lifetime data . . . this is a highly competent and agreeable statistical textbook." -Statistics in Medicine. The statistical analysis of lifetime or response time data is a key tool in engineering,
Malm, Christer B.; Khoo, Nelson S.; Granlund, Irene; Lindstedt, Emilia; Hult, Andreas
2016-01-01
The discovery of erythropoietin (EPO) simplified blood doping in sports, but improved detection methods for EPO have forced cheating athletes to return to blood transfusion. Autologous blood transfusion with cryopreserved red blood cells (RBCs) is the method of choice, because no valid method exists to accurately detect such an event. In endurance sports, it can be estimated that elite athletes improve performance by up to 3% with blood doping, regardless of method. Valid detection methods for autologous blood doping are important to maintain the credibility of athletic performances. Recreational male (N = 27) and female (N = 11) athletes served as Transfusion (N = 28) and Control (N = 10) subjects in two different transfusion settings. Hematological variables and physical performance were measured before donation of 450 or 900 mL whole blood, and until four weeks after re-infusion of the cryopreserved RBC fraction. Blood was analyzed for transferrin, iron, Hb, EVF, MCV, MCHC, reticulocytes, leucocytes and EPO. Repeated measures multivariate analysis of variance (MANOVA) and pattern recognition using Principal Component Analysis (PCA) and Orthogonal Projections of Latent Structures (OPLS) discriminant analysis (DA) investigated differences between Control and Transfusion groups over time. A significant increase in performance (15 ± 8%) and VO2max (17 ± 10%) (mean ± SD) could be measured 48 h after RBC re-infusion, and remained increased for up to four weeks in some subjects. In total, 533 blood samples were included in the study (Clean = 220, Transfused = 313). In response to blood transfusion, the largest change in hematological variables occurred 48 h after blood donation, when Control and Transfused groups could be separated with OPLS-DA (R2 = 0.76/Q2 = 0.59). RBC re-infusion resulted in the best model (R2 = 0.40/Q2 = 0.10) at the first sampling point (48 h), predicting one false positive and one false negative. Overall, a 25% and 86% false positives ratio was
Meskaldji, Djalel Eddine; Hagmann, Patric; Meuli, Reto; Thiran, Jean Philippe; Morgenthaler, Stephan
2010-01-01
In neuroimaging, a large number of correlated tests are routinely performed to detect active voxels in single-subject experiments or to detect regions that differ between individuals belonging to different groups. In order to bound the probability of a false discovery of pair-wise differences, a Bonferroni or other correction for multiplicity is necessary. These corrections greatly reduce the power of the comparisons, which means that small signals (differences) remain hidden, and they have therefore been more or less successful depending on the application. We introduce a method that improves the power of a family of correlated statistical tests by reducing their number in an orderly fashion using our a-priori understanding of the problem. The tests are grouped by blocks that respect the data structure and only one or a few tests per group are performed. For each block we construct an appropriate summary statistic that characterizes a meaningful feature of the block. The comparisons are based on these summary stat...
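The power argument in this abstract can be made concrete with the standard Bonferroni rule, in which each of m tests is compared against alpha / m. The sketch below shows only that textbook correction, not the authors' block-wise summary statistics; the function name and the p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return, for each p-value, whether it is rejected under Bonferroni."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # per-test level that keeps the family-wise error <= alpha
    return [p <= threshold for p in p_values]

# Hypothetical p-values: with three tests the per-test threshold is 0.05 / 3,
# so only the first p-value is rejected.
many_tests = bonferroni_reject([0.001, 0.02, 0.04])

# If the family is reduced to a single summary test, the threshold returns to
# 0.05 and a p-value of 0.02 is now rejected.
one_test = bonferroni_reject([0.02])
print(many_tests, one_test)
```

Reducing the number of tests raises the per-test threshold, which is the mechanism by which grouping tests into blocks can recover power.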
SOME STATISTICAL SOFTWARE APPLICATIONS FOR TAGUCHI METHODS
Directory of Open Access Journals (Sweden)
Adrian Stere PARIS
2016-05-01
The paper details the variety of Taguchi methods as an important contribution to quality improvement. The extended use of these methods requires increasingly complex calculations for practical application and optimization. It is therefore necessary to take advantage of new software developments, supported by advanced statistical methods. The paper presents a few particular applications of statistical software to Taguchi methods for quality enhancement, focusing on quality loss functions, the design of experiments and new developments in statistical process control.
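For readers unfamiliar with quality loss functions, the following is a minimal sketch of the commonly used "nominal-the-best" quadratic form, L(y) = k (y - m)^2, where m is the target value and k a cost coefficient. It is not drawn from the paper itself; the function names and all numerical values are illustrative assumptions.

```python
def quality_loss(y, target, k):
    """Quadratic (nominal-the-best) loss for a single measured value y."""
    return k * (y - target) ** 2

def average_loss(values, target, k):
    """Average quadratic loss of a sample, split into spread and offset.

    Uses the identity mean((y - m)^2) = variance + (mean - m)^2,
    with the population (divide-by-n) variance.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return k * (variance + (mean - target) ** 2)

# Hypothetical example: target 10.0, cost coefficient 2.0 per squared unit.
single = quality_loss(10.5, target=10.0, k=4.0)
batch = average_loss([9.0, 11.0], target=10.0, k=2.0)
print(single, batch)
```

The decomposition in `average_loss` shows that loss can be reduced either by centring the process on the target or by reducing its variability.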
Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan
2017-09-19
Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis
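Inverse distance weighted interpolation, named above as one of the most widely used spatial interpolators, can be sketched in a few lines. This is a generic textbook form, not code from the reviewed studies; the function name, the default power of 2 and the sample coordinates and concentrations are assumptions for illustration.

```python
import math

def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples is a list of (xi, yi, value) tuples, e.g. measured
    concentrations at known sampling locations.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in samples:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            return value  # query point coincides with a sample location
        weight = 1.0 / distance ** power  # closer samples weigh more
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical samples: two locations 2 km apart with values 0 and 10.
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 10.0)]
midpoint = idw(points, 1.0, 0.0)    # equidistant, so the plain average
near_first = idw(points, 0.5, 0.0)  # pulled toward the nearer sample
print(midpoint, near_first)
```

Unlike ordinary kriging, this estimator uses no model of spatial covariance, so it provides no estimate of interpolation uncertainty.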
Advanced statistical methods in data science
Chen, Jiahua; Lu, Xuewen; Yi, Grace; Yu, Hao
2016-01-01
This book gathers invited presentations from the 2nd Symposium of the ICSA-CANADA Chapter held at the University of Calgary from August 4-6, 2015. The aim of this Symposium was to promote advanced statistical methods in big-data sciences and to allow researchers to exchange ideas on statistics and data science and to embrace the challenges and opportunities of statistics and data science in the modern world. It addresses diverse themes in advanced statistical analysis in big-data sciences, including methods for administrative data analysis, survival data analysis, missing data analysis, high-dimensional and genetic data analysis, longitudinal and functional data analysis, the design and analysis of studies with response-dependent and multi-phase designs, time series and robust statistics, statistical inference based on likelihood, empirical likelihood and estimating functions. The editorial group selected 14 high-quality presentations from this successful symposium and invited the presenters to prepare a fu...
Poucheret, Patrick; Fons, Françoise; Doré, Jean Christophe; Michelot, Didier; Rapior, Sylvie
2010-06-15
Ninety percent of fatal higher fungus poisoning is due to amatoxin-containing mushroom species. In addition to the absence of an antidote, no chemotherapeutic consensus has been reported. The aim of the present study is to perform a retrospective multidimensional multivariate statistical analysis of 2110 amatoxin poisoning clinical cases, in order to optimize therapeutic decision-making. Our results allowed us to classify drugs as a function of their influence on one major parameter: patient survival. Active principles were classified as first intention, second intention, adjuvant or controversial pharmaco-therapeutic clinical intervention. We conclude that (1) retrospective multidimensional multivariate statistical analysis of complex clinical datasets might help future therapeutic decision-making and (2) drugs such as silybin, N-acetylcysteine and putatively ceftazidime are clearly associated, in the amatoxin poisoning context, with a higher level of patient survival.
Directory of Open Access Journals (Sweden)
Tiago P. Almeida
2017-08-01
Purpose: Complex fractionated atrial electrogram (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI, respectively, from corresponding LA regions in 18 persAF patients. Twelve attributes were measured from the AEGs, before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) have been used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P < 0.0001). Four types of LA regions were identified, based on the AEGs characteristics: (i) fractionated before PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%); and (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination (P < 0.0001). Conclusion: Our results have unveiled that there are LA regions resistant to PVI, while others are affected by it. Although traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information.
Energy Technology Data Exchange (ETDEWEB)
Carrasquilla, Abel [Universidade Estadual do Norte Fluminense Darcy Ribeiro (UENF), Macae, RJ (Brazil). Lab. de Engenharia e Exploracao de Petroleo]. E-mail: abel@lenep.uenf.br; Silva, Jadir da [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil). Dept. de Geologia; Flexa, Roosevelt [Baker Hughes do Brasil Ltda, Macae, RJ (Brazil)
2008-07-01
In this article, we present a new approach to the automatic identification of lithologies using only well log data, which combines fuzzy logic, neural networks and multivariable statistical methods. First, we chose well log data that represent lithological types, such as gamma ray (GR) and density (RHOB) logs, and then applied a fuzzy logic algorithm to determine the optimal number of clusters. In the following step, a competitive neural network is developed, based on Kohonen's learning rule, where the input layer is composed of two neurons, matching the number of logs used. The competitive layer is composed of several neurons, equal in number to the clusters determined by the fuzzy logic algorithm. Finally, some data bank elements of the lithological types are selected at random to be the discriminant variables, which correspond to the input data of the multigroup discriminant analysis program. With the application of this methodology, the lithological types were automatically identified throughout a well of the Namorado Oil Field, Campos Basin, although the results presented some difficulty, mainly because of the geological complexity of this field.
Cao, Yingjie; Tang, Changyuan; Song, Xianfang; Liu, Changming; Zhang, Yinghua
2016-06-01
Two multivariate statistical techniques, factor analysis (FA) and discriminant analysis (DA), are applied to study the river and groundwater hydrochemistry and its controlling processes in the Sanjiang Plain of northeast China. Factor analysis identifies five factors which account for 79.65% of the total variance in the dataset. Four factors bearing specific meanings as the river and groundwater hydrochemistry controlling processes are divided into two groups, the "natural hydrochemistry evolution" group and the "pollution" group. The "natural hydrochemistry evolution" group includes the salinity factor (factor 1) caused by rock weathering and the residence time factor (factor 2) reflecting the groundwater traveling time. The "pollution" group represents the groundwater quality deterioration due to geogenic pollution caused by elevated Fe and Mn (factor 3) and elevated nitrate (NO3−) introduced by human activities such as agricultural exploitation (factor 5). The hydrochemical difference and hydraulic connection among the river (surface water, SW), shallow groundwater (SG) and deep groundwater (DG) groups are evaluated by the factor scores obtained from FA and DA (Fisher's method). It is shown that the river water is characterized by low salinity and slight pollution, and the shallow groundwater has the highest salinity and severe pollution. The SW is well separated from SG and DG by Fisher's discriminant function, but the SG and DG cannot be well separated, showing their hydrochemical similarities and emphasizing the hydraulic connections between SG and DG.
Ebqa'ai, Mohammad; Ibrahim, Bashar
2017-03-10
This study aims to analyse the heavy metal pollutants in Jeddah, the second largest city in the Gulf Cooperation Council with a population exceeding 3.5 million, and many vehicles. Ninety-eight street dust samples were collected seasonally from the six major roads as well as the Jeddah Beach, and subsequently digested using modified Leeds Public Analyst method. The heavy metals (Fe, Zn, Mn, Cu, Cd, and Pb) were extracted from the ash using methyl isobutyl ketone as solvent extraction and eventually analysed by atomic absorption spectroscopy. Multivariate statistical techniques, principal component analysis (PCA), and hierarchical cluster analysis were applied to these data. Heavy metal concentrations were ranked according to the following descending order: Fe > Zn > Mn > Cu > Pb > Cd. In order to study the pollution and health risk from these heavy metals as well as estimating their effect on the environment, pollution indices, integrated pollution index, enrichment factor, daily dose average, hazard quotient, and hazard index were all analysed. The PCA showed high levels of Zn, Fe, and Cd in Al Kurnish road, while these elements were consistently detected on King Abdulaziz and Al Madina roads. The study indicates that high levels of Zn and Pb pollution were recorded for major roads in Jeddah. Six out of seven roads had high pollution indices. This study is the first step towards further investigations into current health problems in Jeddah, such as anaemia and asthma.
Venkatapathi, Murugesan; Rajwa, Bartek; Ragheb, Kathy; Banada, Padmapriya P.; Lary, Todd; Robinson, J. Paul; Hirleman, E. Daniel
2008-02-01
We describe a model-based instrument design combined with a statistical classification approach for the development and realization of high speed cell classification systems based on light scatter. In our work, angular light scatter from cells of four bacterial species of interest, Bacillus subtilis, Escherichia coli, Listeria innocua, and Enterococcus faecalis, was modeled using the discrete dipole approximation. We then optimized a scattering detector array design subject to some hardware constraints, configured the instrument, and gathered experimental data from the relevant bacterial cells. Using these models and experiments, it is shown that optimization using a nominal bacteria model (i.e., using a representative size and refractive index) is insufficient for classification of most bacteria in realistic applications. Hence the computational predictions were constituted in the form of scattering-data-vector distributions that accounted for expected variability in the physical properties between individual bacteria within the four species. After the detectors were optimized using the numerical results, they were used to measure scatter from both the known control samples and unknown bacterial cells. A multivariate statistical method based on a support vector machine (SVM) was used to classify the bacteria species based on light scatter signatures. In our final instrument, we realized correct classification of B. subtilis in the presence of E. coli, L. innocua, and E. faecalis using SVM at 99.1%, 99.6%, and 98.5%, respectively, in the optimal detector array configuration. For comparison, the corresponding values for another set of angles were only 69.9%, 71.7%, and 70.2% using SVM, and more importantly, this improved performance is consistent with classification predictions.
Statistical methods for environmental pollution monitoring
Energy Technology Data Exchange (ETDEWEB)
Gilbert, R.O.
1987-01-01
The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedure techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Some statistical techniques given here are not commonly seen in statistics books. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.
ABOUT THE METHODOLOGY OF STATISTICAL METHODS
Orlov A. I.
2014-01-01
The purpose of the article is to justify the need to develop the methodology of statistical methods as an independent scientific direction. The models of mathematician and applied specialist are presented. We have obtained the conclusions on teaching and research and discussed five major unsolved problems of statistical methods: the effect of deviations from the traditional prerequisites; the use of asymptotic results for finite sample sizes; selecting one of the many specific tests for the hypothesi...
Modern statistical methods in respiratory medicine.
Wolfe, Rory; Abramson, Michael J
2014-01-01
Statistics sits right at the heart of scientific endeavour in respiratory medicine and many other disciplines. In this introductory article, some key epidemiological concepts such as representativeness, random sampling, association and causation, and confounding are reviewed. A brief introduction to basic statistics covering topics such as frequentist methods, confidence intervals, hypothesis testing, P values and Type II error is provided. Subsequent articles in this series will cover some modern statistical methods including regression models, analysis of repeated measures, causal diagrams, propensity scores, multiple imputation, accounting for measurement error, survival analysis, risk prediction, latent class analysis and meta-analysis.
Khan, Firdos; Pilz, Jürgen
2016-04-01
South Asia is under the severe impacts of changing climate and global warming. The last two decades showed that climate change or global warming is happening, and the first decade of the 21st century is considered the warmest decade over Pakistan in recorded history, with temperature reaching 53 °C in 2010. Consequently, the spatio-temporal distribution and intensity of precipitation is badly affected and causes floods, cyclones and hurricanes in the region, which further have impacts on agriculture, water, health etc. To cope with the situation, it is important to conduct impact assessment studies and take adaptation and mitigation remedies. For impact assessment studies, we need climate variables at higher resolution. Downscaling techniques are used to produce climate variables at higher resolution; these techniques are broadly divided into two types, statistical downscaling and dynamical downscaling. The target location of this study is the monsoon dominated region of Pakistan. One reason for choosing this area is that the contribution of monsoon rains in this area is more than 80% of the total rainfall. This study evaluates a statistical downscaling technique which can then be used for downscaling climatic variables. Two statistical techniques, i.e. quantile regression and copula modeling, are combined in order to produce realistic results for climate variables in the area under study. To reduce the dimension of input data and deal with multicollinearity problems, empirical orthogonal functions will be used. Advantages of this new method are: (1) it is more robust to outliers as compared to ordinary least squares estimates and other estimation methods based on central tendency and dispersion measures; (2) it preserves the dependence among variables and among sites and (3) it can be used to combine different types of distributions. This is important in our case because we are dealing with climatic variables having different distributions over different meteorological
Directory of Open Access Journals (Sweden)
Tao Gao
2014-01-01
Extreme precipitation is likely to be one of the most severe meteorological disasters in China; however, studies on the physical factors affecting precipitation extremes and corresponding prediction models are not accurately available. From a new point of view, the sensible heat flux (SHF) and latent heat flux (LHF), which have significant impacts on summer extreme rainfall in Yangtze River basin (YRB), have been quantified and then selections of the impact factors are conducted. Firstly, a regional extreme precipitation index was applied to determine Regions of Significant Correlation (RSC) by analyzing spatial distribution of correlation coefficients between this index and SHF, LHF, and sea surface temperature (SST) on global ocean scale; then the time series of SHF, LHF, and SST in RSCs during 1967–2010 were selected. Furthermore, other factors that significantly affect variations in precipitation extremes over YRB were also selected. The methods of multiple stepwise regression and leave-one-out cross-validation (LOOCV) were utilized to analyze and test influencing factors and statistical prediction model. The correlation coefficient between observed regional extreme index and model simulation result is 0.85, significant at the 99% level. This suggested that the forecast skill was acceptable although many aspects of the prediction model should be improved.
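The leave-one-out cross-validation step mentioned in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' model: it assumes a single predictor fitted by ordinary least squares instead of multiple stepwise regression, and the function names and toy data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_predictions(xs, ys):
    """Predict each observation from a model fitted to all the others."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy data (hypothetical): one predictor series and an index series.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
index = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
skill = pearson_r(index, loocv_predictions(predictor, index))
```

The correlation between observations and held-out predictions is one way to express forecast skill; because each prediction comes from a model that never saw the target observation, it is less optimistic than the in-sample fit.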
Bonetti, Jennifer; Quarino, Lawrence
2014-05-01
This study has shown that the combination of simple techniques with the use of multivariate statistics offers the potential for the comparative analysis of soil samples. Five samples were obtained from each of twelve state parks across New Jersey in both the summer and fall seasons. Each sample was examined using particle-size distribution, pH analysis in both water and 1 M CaCl2 , and a loss on ignition technique. Data from each of the techniques were combined, and principal component analysis (PCA) and canonical discriminant analysis (CDA) were used for multivariate data transformation. Samples from different locations could be visually differentiated from one another using these multivariate plots. Hold-one-out cross-validation analysis showed error rates as low as 3.33%. Ten blind study samples were analyzed resulting in no misclassifications using Mahalanobis distance calculations and visual examinations of multivariate plots. Seasonal variation was minimal between corresponding samples, suggesting potential success in forensic applications. © 2014 American Academy of Forensic Sciences.
Directory of Open Access Journals (Sweden)
Xiangyu Mu
2014-09-01
Natural factors and anthropogenic activities both contribute dissolved chemical loads to lakes and streams. Mineral solubility, geomorphology of the drainage basin, source strengths and climate all contribute to concentrations and their variability. Urbanization and agricultural waste-water particularly lead to aquatic environmental degradation. Major contaminant sources and controls on water quality can be assessed by analyzing the variability in proportions of major and minor solutes in water coupled to multivariate statistical methods. The demand for freshwater needed for increasing crop production, population and industrialization occurs almost everywhere in China, and these conflicting needs have led to widespread water contamination. Because of heavy nutrient loadings from all of these sources, Lake Taihu (eastern China) notably suffers periodic hyper-eutrophication and drinking water deterioration, which has led to shortages of freshwater for the City of Wuxi and other nearby cities. This lake, the third largest freshwater body in China, has historically been considered a cultural treasure of China, and has supported long-term fisheries. There is increasing pressure to remediate the present contamination, which compromises both aquaculture and the prior economic base centered on tourism. However, remediation cannot be effectively done without first characterizing the broad nature of the non-point source pollution. To this end, we investigated the hydrochemical setting of Lake Taihu to determine how different land use types influence the variability of surface water chemistry in different water sources to the lake. We found that waters broadly show wide variability ranging from calcium-magnesium-bicarbonate hydrochemical facies type to mixed sodium-sulfate-chloride type. Principal components analysis produced three principal components that explained 78% of the variance in the water quality and reflect three major types of water
Directory of Open Access Journals (Sweden)
Shikha Awasthi
2017-06-01
Analysis of emission from laser-induced plasma has a unique capability for quantifying the major and minor elements present in any type of samples under optimal analysis conditions. Chemometric techniques are very effective and reliable tools for quantification of multiple components in complex matrices. The feasibility of laser-induced breakdown spectroscopy (LIBS) in combination with multivariate analysis was investigated for the analysis of environmental reference materials (RMs). In the present work, different Certified/Standard Reference Materials of soil and plant origin were analyzed using LIBS, and the presence of Al, Ca, Mg, Fe, K, Mn and Si was identified in the LIBS spectra of these materials. Multivariate statistical methods (Partial Least Square Regression and Partial Least Square Discriminant Analysis) were employed for quantitative analysis of the constituent elements using the LIBS spectral data. Calibration models were used to predict the concentrations of the different elements of test samples and subsequently, the concentrations were compared with certified concentrations to check the authenticity of models. The non-destructive analytical method, namely Instrumental Neutron Activation Analysis (INAA) using high flux reactor neutrons and high resolution gamma-ray spectrometry, was also used for intercomparison of results of two RMs by LIBS.
Fuchs, Julia; Cermak, Jan; Andersen, Hendrik
2017-04-01
This study aims at untangling the impacts of external dynamics and local conditions on cloud properties in the Southeast Atlantic (SEA) by combining satellite and reanalysis data using multivariate statistics. The understanding of clouds and their determinants at different scales is important for constraining the Earth's radiative budget, and thus prominent in climate-system research. In this study, SEA stratocumulus cloud properties are observed not only as the result of local environmental conditions but also as affected by external dynamics and spatial origins of air masses entering the study area. In order to assess to what extent cloud properties are impacted by aerosol concentration, air mass history, and meteorology, a multivariate approach is conducted using satellite observations of aerosol and cloud properties (MODIS, SEVIRI), information on aerosol species composition (MACC) and meteorological context (ERA-Interim reanalysis). To account for the often-neglected but important role of air mass origin, information on air mass history based on HYSPLIT modeling is included in the statistical model. This multivariate approach is intended to lead to a better understanding of the physical processes behind observed stratocumulus cloud properties in the SEA.
Directory of Open Access Journals (Sweden)
Ewelina Dziurkowska
2015-01-01
Multivariate statistical analysis is widely used in medical studies as a profitable tool facilitating diagnosis of some diseases, for instance, cancer, allergy, pneumonia, or Alzheimer’s and psychiatric diseases. Taking this into consideration, the aim of this study was to use two multivariate techniques, hierarchical cluster analysis (HCA) and principal component analysis (PCA), to disclose the relationship between the drugs used in the therapy of major depressive disorder and the salivary cortisol level and the period of hospitalization. The cortisol contents in saliva of depressed women were quantified by HPLC with UV detection day-to-day during the whole period of hospitalization. A data set with 16 variables (e.g., the patients’ age, multiplicity and period of hospitalization, initial and final cortisol level, highest and lowest hormone level, mean contents, and medians) characterizing 97 subjects was used for HCA and PCA calculations. Multivariate statistical analysis reveals that various groups of antidepressants affect the salivary cortisol level to varying degrees. The SSRIs, SNRIs, and the polypragmasy reduce most effectively the hormone secretion. Thus, both unsupervised pattern recognition methods, HCA and PCA, can be used as complementary tools for interpretation of the results obtained by laboratory diagnostic methods.
Dziurkowska, Ewelina; Wesolowski, Marek
2015-01-01
Multivariate statistical analysis is widely used in medical studies as a profitable tool facilitating diagnosis of some diseases, for instance, cancer, allergy, pneumonia, or Alzheimer's and psychiatric diseases. Taking this into consideration, the aim of this study was to use two multivariate techniques, hierarchical cluster analysis (HCA) and principal component analysis (PCA), to disclose the relationship between the drugs used in the therapy of major depressive disorder and the salivary cortisol level and the period of hospitalization. The cortisol contents in saliva of depressed women were quantified by HPLC with UV detection day-to-day during the whole period of hospitalization. A data set with 16 variables (e.g., the patients' age, multiplicity and period of hospitalization, initial and final cortisol level, highest and lowest hormone level, mean contents, and medians) characterizing 97 subjects was used for HCA and PCA calculations. Multivariate statistical analysis reveals that various groups of antidepressants affect the salivary cortisol level to varying degrees. The SSRIs, SNRIs, and the polypragmasy reduce most effectively the hormone secretion. Thus, both unsupervised pattern recognition methods, HCA and PCA, can be used as complementary tools for interpretation of the results obtained by laboratory diagnostic methods.
Chen, Fei; Taylor, William D; Anderson, William B; Huck, Peter M
2013-08-01
This study investigates the suitability of multivariate techniques, including principal component analysis and discriminant function analysis, for analysing polycyclic aromatic hydrocarbon and heavy metal-contaminated aquatic sediment data. We show that multivariate "fingerprint" analysis of relative abundances of contaminants can characterize a contamination source and distinguish contaminated sediments of interest from background contamination. Thereafter, analysis of the unstandardized concentrations among samples contaminated from the same source can identify migration pathways within a study area that is hydraulically complex and has a long contamination history, without reliance on complex hydrodynamic data and modelling techniques. Together, these methods provide an effective tool for drinking water source monitoring and protection.
Fourier transform infrared microspectroscopy and multivariate methods for radiobiological dosimetry.
Meade, A D; Clarke, C; Byrne, H J; Lyng, F M
2010-02-01
The scientific literature contains an ever-growing number of reports of applications of vibrational spectroscopy as a multivariate non-invasive tool for analysis of biological effects at the molecular level. Recently, Fourier transform infrared microspectroscopy (FTIRM) has been demonstrated to be sensitive to molecular events occurring in cells and tissue after exposure to ionizing radiation. In this work the application of FTIRM in the examination of dose-dependent molecular effects occurring in skin cells after exposure to ionizing radiation with the use of partial least-squares regression (PLSR) and generalized regression neural networks (GRNN) was studied. The methodology is shown to be sensitive to molecular events occurring with radiation dose and time after exposure. The variation in molecular species with dose and time after irradiation is shown to be non-linear by virtue of the higher modeling efficiency yielded from the non-linear algorithms. Dose prediction efficiencies of approximately +/-10 mGy were achieved at 96 h after irradiation, highlighting the potential applications of the methodology in radiobiological dosimetry.
Spatial analysis statistics, visualization, and computational methods
Oyana, Tonny J
2015-01-01
An introductory text for the next generation of geospatial analysts and data scientists, Spatial Analysis: Statistics, Visualization, and Computational Methods focuses on the fundamentals of spatial analysis using traditional, contemporary, and computational methods. Outlining both non-spatial and spatial statistical concepts, the authors present practical applications of geospatial data tools, techniques, and strategies in geographic studies. They offer a problem-based learning (PBL) approach to spatial analysis, containing hands-on problem sets that can be worked out in MS Excel or ArcGIS, as well as detailed illustrations and numerous case studies. The book enables readers to: identify types and characterize non-spatial and spatial data; demonstrate their competence to explore, visualize, summarize, analyze, optimize, and clearly present statistical data and results; construct testable hypotheses that require inferential statistical analysis; process spatial data, extract explanatory variables, conduct statisti...
Workshop on Analytical Methods in Statistics
Jurečková, Jana; Maciak, Matúš; Pešta, Michal
2017-01-01
This volume collects authoritative contributions on analytical methods and mathematical statistics. The methods presented include resampling techniques; the minimization of divergence; estimation theory and regression, eventually under shape or other constraints or long memory; and iterative approximations when the optimal solution is difficult to achieve. It also investigates probability distributions with respect to their stability, heavy-tailness, Fisher information and other aspects, both asymptotically and non-asymptotically. The book not only presents the latest mathematical and statistical methods and their extensions, but also offers solutions to real-world problems including option pricing. The selected, peer-reviewed contributions were originally presented at the workshop on Analytical Methods in Statistics, AMISTAT 2015, held in Prague, Czech Republic, November 10-13, 2015.
Multivariate extreme value analysis of storm surges in SCS on peak over threshold method
Directory of Open Access Journals (Sweden)
Y. Luo
2015-11-01
We use a novel statistical approach, MGPD, to analyze the joint probability distribution of storm surge events at two sites and present a warning method for storm surges at two adjacent positions in Beibu Gulf, using sufficiently long field data on surge levels at the two sites. The methodology also develops the procedure for applying MGPD, which includes a joint threshold and Monte Carlo simulation, to handle multivariate extreme value analysis. By comparing the simulation result with the analytic solution, it is shown that the relative error of the Monte Carlo simulation is less than 8.6%. By running the MGPD model on long data records at Beihai and Dongfang, the simulated potential surge results can be employed in storm surge warnings for Beihai and joint extreme water level predictions for the two sites.
A multivariate nonlinear mixed effects method for analyzing energy partitioning in growing pigs
DEFF Research Database (Denmark)
Strathe, Anders Bjerring; Danfær, Allan Christian; Chwalibog, André
2010-01-01
Simultaneous equations have become increasingly popular for describing the effects of nutrition on the utilization of ME for protein (PD) and lipid deposition (LD) in animals. The study developed a multivariate nonlinear mixed effects (MNLME) framework and compared it with an alternative method...... for estimating parameters in simultaneous equations that described energy metabolism in growing pigs, and then proposed new PD and LD equations. The general statistical framework was implemented in the NLMIXED procedure in SAS. Alternative PD and LD equations were also developed, which assumed...... that the instantaneous response curve of an animal to varying energy supply followed the law of diminishing returns behavior. The Michaelis-Menten function was adopted to represent a biological relationship in which the affinity constant (k) represented the sensitivity of PD to ME above maintenance. The approach...
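The diminishing-returns response described above can be sketched with the standard Michaelis-Menten form, response = maximum * x / (k + x). This is an illustration only: the abstract does not give the authors' exact parameterization, and the names `pd_max` and `me_above_maintenance` and the numeric values are hypothetical.

```python
def michaelis_menten(me_above_maintenance, pd_max, k):
    """Saturating response: half of pd_max is reached when the input equals k."""
    if me_above_maintenance < 0:
        raise ValueError("energy above maintenance must be non-negative")
    return pd_max * me_above_maintenance / (k + me_above_maintenance)


# Hypothetical values, chosen only to show the shape of the curve.
first_step = michaelis_menten(10.0, 200.0, 10.0) - michaelis_menten(0.0, 200.0, 10.0)
second_step = michaelis_menten(20.0, 200.0, 10.0) - michaelis_menten(10.0, 200.0, 10.0)
```

Each additional unit of energy above maintenance adds less to the response than the previous one, which is the law-of-diminishing-returns behavior the abstract refers to; a smaller k means the response saturates at lower energy supply.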
An Alternating Iterative Method and Its Application in Statistical Inference
Institute of Scientific and Technical Information of China (English)
Ning Zhong SHI; Guo Rong HU; Qing CUI
2008-01-01
This paper studies non-convex programming problems. It is known that, in statistical inference, many constrained estimation problems may be expressed as convex programming problems. However, in many practical problems, the objective functions are not convex. In this paper, we give a definition of a semi-convex objective function and discuss the corresponding non-convex programming problems. A two-step iterative algorithm called the alternating iterative method is proposed for finding solutions for such problems. The method is illustrated by three examples in constrained estimation problems given in Sasabuchi et al. (Biometrika, 72, 465–472 (1983)), Shi N. Z. (J. Multivariate Anal., 50, 282–293 (1994)) and El Barmi H. and Dykstra R. (Ann. Statist., 26, 1878–1893 (1998)).
Multivariate statistical analysis of stream-sediment geochemistry in the Grazer Paläozoikum, Austria
Weber, L.; Davis, J.C.
1990-01-01
The Austrian reconnaissance study of stream-sediment composition (more than 30,000 clay-fraction samples collected over an area of 40,000 km²) is summarized in an atlas of regional maps that show the distributions of 35 elements. These maps, rich in information, reveal complicated patterns of element abundance that are difficult to compare on more than a small number of maps at one time. In such a study, multivariate procedures such as simultaneous R-Q mode components analysis may be helpful. They can compress a large number of variables into a much smaller number of independent linear combinations. These composite variables may be mapped and relationships sought between them and geological properties. As an example, R-Q mode components analysis is applied here to the Grazer Paläozoikum, a tectonic unit northeast of the city of Graz, which is composed of diverse lithologies and contains many mineral deposits.
Xu, Qinzeng; Gao, Fei; Xu, Qiang; Yang, Hongsheng
2014-11-01
Fatty acids (FAs) provide energy and also can be used to trace trophic relationships among organisms. Sea cucumber Apostichopus japonicus goes into a state of aestivation during warm summer months. We examined fatty acid profiles in aestivated and non-aestivated A. japonicus using multivariate analyses (PERMANOVA, MDS, ANOSIM, and SIMPER). The results indicate that the fatty acid profiles of aestivated and non-aestivated sea cucumbers differed significantly. The FAs that were produced by bacteria and brown kelp contributed the most to the differences in the fatty acid composition of aestivated and non-aestivated sea cucumbers. Aestivated sea cucumbers may synthesize FAs from heterotrophic bacteria during early aestivation, and long chain FAs such as eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) that are produced from intestinal degradation are digested during deep aestivation. Specific changes in the fatty acid composition of A. japonicus during aestivation need more detailed study in the future.
Deeper Insights into the Circumgalactic Medium using Multivariate Analysis Methods
Lewis, James; Churchill, Christopher W.; Nielsen, Nikole M.; Kacprzak, Glenn
2017-01-01
Drawing from a database of galaxies whose surrounding gas has absorption from MgII, called the MgII-Absorbing Galaxy Catalog (MAGIICAT, Nielsen et al. 2013), we studied the circumgalactic medium (CGM) for a sample of 47 galaxies. Using multivariate analysis, in particular the k-means clustering algorithm, we determined that simultaneously examining column density (N), rest-frame B-K color, virial mass, and azimuthal angle (the projected angle between the galaxy major axis and the quasar line of sight) yields two distinct populations: (1) bluer, lower mass galaxies with higher column density along the minor axis, and (2) redder, higher mass galaxies with lower column density along the major axis. We support this grouping by running (i) two-sample, two-dimensional Kolmogorov-Smirnov (KS) tests on each of the six bivariate planes and (ii) two-sample KS tests on each of the four variables to show that the galaxies significantly cluster into two independent populations. To account for the fact that 16 of our 47 galaxies have upper limits on N, we performed Monte-Carlo tests whereby we replaced upper limits with random deviates drawn from a Schechter distribution fit, f(N). These tests strengthen the results of the KS tests. We examined the behavior of the MgII λ2796 absorption line equivalent width and velocity width for each galaxy population. We find that equivalent width and velocity width do not show similar characteristic distinctions between the two galaxy populations. We discuss the k-means clustering algorithm for optimizing the analysis of populations within datasets as opposed to using arbitrary bivariate subsample cuts. We also discuss the power of the k-means clustering algorithm in extracting deeper physical insight into the CGM in relationship to host galaxies.
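The k-means step named in the abstract above can be illustrated with a minimal sketch of Lloyd's algorithm. It is not the authors' pipeline: the initial centroids are supplied by the caller, the toy points are hypothetical, and real inputs with different units (column density, color, mass, angle) would normally be standardized first, which is an assumption here.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical two-variable toy sample with two well separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_labels, toy_centroids = kmeans(toy_points, [(0.0, 0.0), (10.0, 10.0)])
```

Because the result depends on the starting centroids, a common practice is to rerun the algorithm from several initializations and keep the solution with the smallest within-cluster sum of squares.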
Fault detection of a spur gear using vibration signal with multivariable statistical parameters
Directory of Open Access Journals (Sweden)
Songpon Klinchaeam
2014-10-01
This paper presents a condition monitoring technique for spur gear fault detection using time-domain vibration signal analysis. Vibration signals were acquired from gearboxes in which various faults on a spur gear tooth were simulated. In this study, vibration signals were used to monitor normal and various fault conditions of a spur gear: normal, scuffing defect, crack defect, and broken tooth. Statistical parameters of the vibration signal were used to compare and evaluate the fault conditions. This technique can be applied to set alarm limits on the signal condition based on statistical parameters such as variance, kurtosis, rms, and crest factor, which can serve as decision boundaries. The results show that vibration signal analysis with a single statistical parameter cannot clearly predict spur gear faults, whereas using at least two statistical parameters clearly separates every fault case. The decision boundary for each statistical parameter was set with 99.7% certainty (±3σ) from 300 reference datasets; it detected the test conditions with 99.7% (±3σ) accuracy and an error of less than 0.3% using 50 test datasets.
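The statistical parameters listed in the abstract above (variance, kurtosis, rms, crest factor) can be computed as in the following sketch. Reading the 99.7% figure as a three-standard-deviation band around the reference mean is an assumption, as are the function names and the choice of population (not sample) moments.

```python
import math


def time_domain_features(signal):
    """Variance, kurtosis, RMS and crest factor of one vibration record."""
    n = len(signal)
    mean = sum(signal) / n
    # Population variance (second central moment).
    variance = sum((x - mean) ** 2 for x in signal) / n
    # Non-excess kurtosis: fourth central moment over variance squared.
    fourth_moment = sum((x - mean) ** 4 for x in signal) / n
    kurtosis = fourth_moment / variance ** 2
    rms = math.sqrt(sum(x * x for x in signal) / n)
    # Crest factor: peak absolute amplitude relative to RMS.
    crest_factor = max(abs(x) for x in signal) / rms
    return {
        "variance": variance,
        "kurtosis": kurtosis,
        "rms": rms,
        "crest_factor": crest_factor,
    }


def three_sigma_limits(reference_values):
    """Alarm band of mean +/- 3 standard deviations from healthy records."""
    n = len(reference_values)
    mu = sum(reference_values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in reference_values) / n)
    return mu - 3 * sigma, mu + 3 * sigma


def is_alarm(value, limits):
    """True when a feature value falls outside the alarm band."""
    low, high = limits
    return value < low or value > high
```

In use, one feature (say kurtosis) would be computed for each healthy reference record, the limits derived once from those values, and each new record flagged when any monitored feature leaves its band; combining two or more features follows the abstract's finding that a single parameter does not separate the fault cases.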
Khound, Nayan J.; Bhattacharyya, Krishna G.
2016-08-01
The aim of this study was to assess the quality of surface water sources in the Jia Bharali river basin and adjoining areas of the Himalayan foothills with respect to heavy elements (As, Cd, Cr, Cu, Fe, Mn, Ni, Pb and Zn) by hydrochemical and multivariate statistical techniques, such as cluster analysis (CA) and principal component analysis (PCA). This study presents the first systematic analysis of toxic elements in water samples collected from 35 different surface water sources in both the dry and wet seasons over 2 hydrological years (2009-2011). Varimax factors extracted by principal component analysis indicate anthropogenic (domestic and agricultural run-off) and geogenic influences on the trace elements. Hierarchical cluster analysis grouped the 35 surface water sources into three statistically significant clusters based on the similarity of water quality characteristics. This study illustrates the usefulness of multivariate statistical techniques for analysis and interpretation of complex data sets, and in water quality assessment, identification of pollution sources/factors and understanding temporal/spatial variations in water quality for effective surface water quality management.
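Grouping sampling sites by hierarchical cluster analysis, as described here, can be illustrated with a small agglomerative routine. The abstract does not state the linkage rule or distance measure, so average linkage on Euclidean distances of z-scored concentrations is an assumption of this sketch, and the 35 × 9 data matrix is synthetic.

```python
import numpy as np

def average_linkage_clusters(X, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for j, members in enumerate(clusters):
        labels[members] = j
    return labels

# Synthetic stand-in: 35 sites x 9 elements, three contamination levels (hypothetical).
rng = np.random.default_rng(1)
level = np.repeat([0.0, 5.0, 10.0], [15, 12, 8])
conc = level[:, None] + rng.normal(scale=0.3, size=(35, 9))
z = (conc - conc.mean(axis=0)) / conc.std(axis=0)   # z-score each element
site_cluster = average_linkage_clusters(z, n_clusters=3)
```

The brute-force pair search is adequate for a few dozen sites; a library routine would be the usual choice for larger data sets.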
Selvarasu, Suresh; Kim, Do Yun; Karimi, Iftekhar A; Lee, Dong-Yup
2010-10-01
We present an integrated framework for characterizing fed-batch cultures of mouse hybridoma cells producing monoclonal antibody (mAb). This framework systematically combines data preprocessing, elemental balancing and statistical analysis techniques. Initially, specific rates of cell growth, glucose/amino acid consumption and mAb/metabolite production were calculated via curve fitting using logistic equations, with subsequent elemental balancing of the preprocessed data indicating the presence of experimental measurement errors. Multivariate statistical analysis was then employed to understand physiological characteristics of the cellular system. The results from principal component analysis (PCA) revealed three major clusters of amino acids with similar trends in their consumption profiles: (i) arginine, threonine and serine, (ii) glycine, tyrosine, phenylalanine, methionine, histidine and asparagine, and (iii) lysine, valine and isoleucine. Further analysis using partial least squares (PLS) regression identified key amino acids which were positively or negatively correlated with the cell growth, mAb production and the generation of lactate and ammonia. Based on these results, the optimal concentrations of key amino acids in the feed medium can be inferred, potentially leading to an increase in cell viability and productivity, as well as a decrease in toxic waste production. The study demonstrated how the current methodological framework using multivariate statistical analysis techniques can serve as a potential tool for deriving rational medium design strategies. Copyright © 2010 Elsevier B.V. All rights reserved.
Definition of a territorial tourist attractiveness index: a multivariate statistical approach
Directory of Open Access Journals (Sweden)
Roberto Gismondi
2007-10-01
Theoretical and effective tourist attractiveness should be evaluated at a very detailed territorial level. For this reason, we propose a selection of statistical variables measured at the municipality level, useful for calculating a tourist index on the basis of three statistical techniques that are compared. An empirical application has been carried out on the 64 municipalities belonging to the Foggia province, with reference to the year 2002. Finally, we stress the practical usefulness of the final tourist indexes, both for correctly classifying municipalities from a tourist point of view and for making it easier to identify the so-called Sistemi Turistici Locali (STL).
Keita, Souleymane; Zhonghua, Tang
2017-10-01
Sustainable management of groundwater resources is a major issue for developing countries, especially Mali. The multiple uses of groundwater have led countries to promote sound management policies for sustainable use of groundwater resources. For this reason, each country needs data that enable it to monitor and predict changes in the resource. In addition, because changes in groundwater quality are often marked by recurring droughts, the potential impacts of the regional and geological setting on groundwater resources require careful study. Unfortunately, recent decades have seen a considerable reduction in national capacities for hydrogeological monitoring and for producing quality data for decision making. The purpose of this work is to translate groundwater data into useful information that can improve water resources management capacity in Mali. In this paper, we used groundwater analytical data from accredited laboratories in Mali to carry out a national-scale assessment of groundwater types and their distribution. We adapted multivariate statistical methods to classify 2035 groundwater samples into seven main groundwater types and built a national-scale map from the results. We used a two-level k-means clustering technique to examine the hydro-geochemical records as percentages of the total concentrations of major ions, namely sodium (Na), magnesium (Mg), calcium (Ca), chloride (Cl), bicarbonate (HCO3), and sulphate (SO4). The first step of clustering formed 20 groups, and these groups were then re-clustered to produce the final seven groundwater types. The results were verified and confirmed using Principal Component Analysis (PCA) and RockWare (Aq.QA) software. We found that HCO3 was the most dominant anion throughout the country and that Cl and SO4 were only important in some local zones. The dominant cations were Na and Mg. Also, major ion ratios changed with geographical location and geological, and climatic
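The two-level clustering described in this abstract (many fine groups first, then a re-clustering into a few water types) might look roughly like the sketch below. Several things here are assumptions rather than details from the study: the second level is run on the first-level centroids, the toy example uses 6 fine groups and 3 types instead of 20 and 7, the `kmeans` helper uses a deterministic farthest-first start, and the ion profiles are invented.

```python
import numpy as np

def nearest(X, centroids):
    """Index of the nearest centroid (Euclidean) for every row of X."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm started from farthest-first seeds (an assumption)."""
    centroids = [X[0]]
    while len(centroids) < k:
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = nearest(X, centroids)
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return nearest(X, centroids), centroids

# Hypothetical mg/L profiles for three water types; columns: Na, Mg, Ca, Cl, HCO3, SO4.
profiles = np.array([[60, 10, 10, 5, 100, 5],
                     [10, 40, 20, 5, 120, 5],
                     [30, 10, 10, 80, 20, 40]], dtype=float)
rng = np.random.default_rng(7)
true_type = np.repeat([0, 1, 2], 40)
total = rng.uniform(0.5, 2.0, size=(120, 1))            # overall mineralisation varies
conc = profiles[true_type] * total * rng.normal(1.0, 0.03, size=(120, 6))

# Express each record as percentages of its total major-ion concentration.
pct = 100.0 * conc / conc.sum(axis=1, keepdims=True)

fine_labels, fine_centroids = kmeans(pct, k=6)          # level 1: fine groups
type_of_group, _ = kmeans(fine_centroids, k=3)          # level 2: merge groups
water_type = type_of_group[fine_labels]                 # final type per sample
```

Working with percentages removes the effect of overall mineralisation, so samples are grouped by the relative dominance of ions rather than by how concentrated the water is.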
Rakotondrabe, Felaniaina; Ndam Ngoupayou, Jules Remy; Mfonka, Zakari; Rasolomanana, Eddy Harilala; Nyangono Abolo, Alexis Jacob; Ako Ako, Andrew
2018-01-01
The influence of gold mining activities on the water quality in the Mari catchment in Bétaré-Oya (East Cameroon) was assessed in this study. Sampling was performed within the period of one hydrological year (2015 to 2016), with 22 sampling sites consisting of groundwater (06) and surface water (16). In addition to measuring the physicochemical parameters, such as pH, electrical conductivity, alkalinity, turbidity, suspended solids and CN(-), eleven major elements (Na(+), K(+), Ca(2+), Mg(2+), NH4(+), Cl(-), NO3(-), HCO3(-), SO4(2-), PO4(3-) and F(-)) and eight heavy metals (Pb, Zn, Cd, Fe, Cu, As, Mn and Cr) were also analyzed using conventional hydrochemical methods, Multivariate Statistical Analysis and the Heavy metal Pollution Index (HPI). The results showed that the water from Mari catchment and Lom River was acidic to basic (5.40water quality, except for nitrates in some wells, which was found at a concentration >50mg NO3(-)/L. This water was found as two main types: calcium magnesium bicarbonate (CaMg-HCO3), which was the most represented, and sodium bicarbonate potassium (NaK-HCO3). As for trace elements in surface water, the contents of Pb, Cd, Mn, Cr and Fe were higher than recommended by the WHO guidelines, and therefore, the surface water was unsuitable for human consumption. Three phenomena were responsible for controlling the quality of the water in the study area: hydrolysis of silicate minerals of plutono-metamorphic rocks, which constitute the geological basement of this area; vegetation and soil leaching; and mining activities. The high concentrations of TSS and trace elements found in this basin were mainly due to gold mining activities (exploration and exploitation) as well as digging of rivers beds, excavation and gold amalgamation. Copyright © 2017 Elsevier B.V. All rights reserved.
Statistical methods for spatio-temporal systems
Finkenstadt, Barbel
2006-01-01
Statistical Methods for Spatio-Temporal Systems presents current statistical research issues on spatio-temporal data modeling and will promote advances in research and a greater understanding between the mechanistic and the statistical modeling communities. Contributed by leading researchers in the field, each self-contained chapter starts with an introduction of the topic and progresses to recent research results. Presenting specific examples of epidemic data of bovine tuberculosis, gastroenteric disease, and the U.K. foot-and-mouth outbreak, the first chapter uses stochastic models, such as point process models, to provide the probabilistic backbone that facilitates statistical inference from data. The next chapter discusses the critical issue of modeling random growth objects in diverse biological systems, such as bacteria colonies, tumors, and plant populations. The subsequent chapter examines data transformation tools using examples from ecology and air quality data, followed by a chapter on space-time co...
Rogachov, A; Cheng, J C; DeSouza, D D
2015-11-01
Overlapping functional magnetic resonance imaging (fMRI) activity elicited by physical pain and social rejection has posited a common neural representation between the two experiences. However, Woo and colleagues (Nat Commun 5: 5380, 2014) recently used multivariate statistics to challenge the "shared representation" theory of pain. This study has implications in the way results from fMRI studies are interpreted and has the potential of broadening our understanding of different pain states and future development of personalized medicine. Copyright © 2015 the American Physiological Society.
Institute of Scientific and Technical Information of China (English)
孔丽娅; 齐方洲; 柴可夫; 马纲; 牛永宁
2015-01-01
[Objective] To study the patterns of traditional Chinese medicine (TCM) syndromes in early diabetic microvascular lesions using structural equation modeling and cluster analysis. [Methods] Four-diagnosis information was collected by clinical investigation from 531 patients with early diabetic microvascular lesions. Structural equation modeling was applied to screen the common disease-location and disease-nature syndrome factors, and cluster and frequency analyses were combined for an exploratory study of how the syndrome factors combine and how they are distributed. [Results] In the structural equation model of disease-location syndrome factor combinations, the liver and spleen were the most closely related pair of organs, followed by spleen-kidney, liver-kidney, liver-stomach and kidney-stomach. In the structural equation model of disease-nature syndrome factor combinations, yin deficiency and excess heat had the highest correlation and were closely related to the disease. In the clustering of syndrome factors, spleen dampness/spleen-kidney yang deficiency with damp-heat, liver-stomach heat with qi deficiency and qi stagnation, and yin deficiency were each gathered into a class. In the frequency analysis, yin deficiency was present in 95.1% of patients with early diabetic microvascular lesions, accompanied by kidney yang deficiency in 80.2%, together with a variety of concurrent syndromes. [Conclusion] It is feasible to explore the syndrome patterns of early diabetic microvascular disease using multivariate statistical analysis.
Statistical Methods for Stochastic Differential Equations
Kessler, Mathieu; Sorensen, Michael
2012-01-01
The seventh volume in the SemStat series, Statistical Methods for Stochastic Differential Equations presents current research trends and recent developments in statistical methods for stochastic differential equations. Written to be accessible to both new students and seasoned researchers, each self-contained chapter starts with introductions to the topic at hand and builds gradually towards discussing recent research. The book covers Wiener-driven equations as well as stochastic differential equations with jumps, including continuous-time ARMA processes and COGARCH processes. It presents a sp
Comparison of multivariate methods for studying the G×E interaction
Directory of Open Access Journals (Sweden)
Deoclécio Domingos Garbuglio
2015-12-01
The objective of this work was to evaluate three statistical multivariate methods for analyzing adaptability and environmental stratification simultaneously, using data from maize cultivars indicated for planting in the State of Paraná, Brazil. Under the FGGE and GGE methods, the genotypic effect adjusts the G×E interactions across environments, resulting in a high percentage of explanation associated with a smaller number of axes. Environmental stratification via the FGGE and GGE methods showed similar responses, while the AMMI method did not ensure grouping of environments. The adaptability analysis revealed low divergence patterns of the responses obtained through the three methods. Genotypes P30F35, P30F53, P30R50, P30K64 and AS 1570 showed high yields associated with general adaptability. The FGGE method allowed differences in yield responses in specific regions and the impact in locations belonging to the same environmental group (through rE) to be associated with the level of the simple portion of the G×E interaction.
Zhou, Ran; Peng, Shi-Tao; Qin, Xue-Bo; Shi, Hong-Hua; Ding, De-Wen
2013-03-01
A detailed field survey of hydrological, chemical and biological resources was conducted in Bohai Bay in spring and summer 2007. The distributions of phytoplankton and their relations to environmental factors were investigated with multivariate analysis techniques. In total, 17 and 23 taxa were identified in spring and summer, respectively. The abundance of phytoplankton in spring was 115 × 10^4 cells m^-3, significantly higher than in summer (3.1 × 10^4 cells m^-3). Characteristics of phytoplankton assemblages in the two seasons were identified using principal component analysis (PCA), while redundancy analysis (RDA) was used to examine the environmental variables that may explain the patterns of variation of the phytoplankton community. Based on the PCA results, in spring the phytoplankton was mainly distributed in the central and northern water zones, where the nitrate nitrogen concentration was higher. In summer, however, phytoplankton was distributed in all zones of Bohai Bay, while the dominant species was mainly distributed in the estuary. RDA indicated that the key environmental factors influencing phytoplankton assemblages in spring were nitrate nitrogen (NO3-N), nitrite nitrogen (NO2-N) and soluble reactive phosphorus (SRP), while ammonium nitrogen (NH4-N) and water temperature (WT) played key roles in summer.
Directory of Open Access Journals (Sweden)
Gledsneli Maria Lima Lins
2010-12-01
Water has a decisive influence on a population's quality of life, specifically in areas such as urban supply, drainage, and effluent treatment, because of its strong impact on public health. The rational use of water is the greatest challenge faced by water demand management, mainly with regard to urban household water consumption. This makes it important to develop research that assists water managers and public policy-makers in planning and formulating water demand measures that allow rational urban water use to be achieved. This work applied the multivariate techniques of Factor Analysis and Multiple Linear Regression Analysis to two districts of the city of Campina Grande (State of Paraíba, Brazil) in order to determine the degree to which socioeconomic and climatic variables contribute to changes in monthly urban household consumption. The districts were chosen on a socioeconomic criterion (income level) so as to evaluate the behavior of their water consumers. A 9-year monthly data series (from 2000 to 2008) was used, comprising family income, water tariff, and number of household connections (economies) as socioeconomic variables, and average temperature and precipitation as climatic variables. For both selected districts, the results point to the variables “water tariff” and “family income” as indicators of household consumption.
Bakraji, E. H.; Rihawy, M. S.; Castel, C.; Abboud, R.
2015-03-01
The Particle Induced X-ray Emission (PIXE) technique has been used to study 48 ancient Syrian pottery fragments taken from excavations at the Tell Al-Rawda site. Eighteen elements (Mg, Al, Si, P, S, K, Ca, Ti, Mn, Fe, Ni, Zn, As, Br, Rb, Sr, Y, and Pb) were determined. The element concentrations were processed using two multivariate statistical methods to classify the pottery; one main group and two smaller groups were defined. In addition, four samples from different places on the site were subjected to optically stimulated luminescence (OSL) dating. The average age obtained using a single aliquot regeneration (SAR) protocol was found to be 4350 ± 240 years.
Simple method of designing centralized PI controllers for multivariable systems based on SSGM.
Dhanya Ram, V; Chidambaram, M
2015-05-01
A method is given to design multivariable PI/PID controllers for stable and unstable multivariable systems. The method needs only the steady state gain matrix (SSGM). It is based on static decoupler design followed by SISO PI/PID controller design, and combines the resulting decoupler and the diagonal PI(D) controllers into the centralized controllers. The result of the present method is shown to be equivalent to the empirical method proposed by Davison (Davison EJ. Multivariable tuning regulators: the feed-forward and robust control of general servo-mechanism problem. IEEE Trans Autom Control 1976;21:35-41). Three simulation examples are given. The performance of the controllers is compared with that of the reported centralized controller based on the multivariable transfer function matrix.
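The idea of combining a static decoupler with diagonal PI controllers can be sketched numerically. The abstract does not spell out the formulas, so this example assumes the common choice of taking the decoupler as the inverse of the steady state gain matrix; the gain matrix, the SISO tunings `kc` and `tau_i`, and the function name are all hypothetical.

```python
import numpy as np

def centralized_pi_gains(K, kc, tau_i):
    """Combine a static decoupler with diagonal PI tunings.

    K      : steady state gain matrix of the process (square, invertible)
    kc     : proportional gains of the individual SISO PI loops
    tau_i  : integral times of the individual SISO PI loops
    Returns the decoupler D and the full proportional and integral gain matrices.
    """
    K = np.asarray(K, dtype=float)
    kc = np.asarray(kc, dtype=float)
    tau_i = np.asarray(tau_i, dtype=float)
    D = np.linalg.inv(K)               # static decoupler, so that K @ D = I
    Kp = D @ np.diag(kc)               # centralized proportional gain matrix
    Ki = D @ np.diag(kc / tau_i)       # centralized integral gain matrix
    return D, Kp, Ki

# Hypothetical 2x2 steady state gain matrix and SISO tunings.
K = [[2.0, 1.0],
     [1.0, 3.0]]
D, Kp, Ki = centralized_pi_gains(K, kc=[0.5, 0.5], tau_i=[2.0, 4.0])
```

Because the decoupler makes the steady state gain seen by each loop equal to one, each diagonal PI controller can be tuned as if it acted on an independent unit-gain process, which is what makes the approach usable with only the SSGM.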
Applying statistical methods to text steganography
Nechta, Ivan
2011-01-01
This paper presents a survey of text steganography methods used for hiding secret information inside some covertext. Widely known hiding techniques (such as translation based steganography, text generating and syntactic embedding) and detection are considered. It is shown that statistical analysis has an important role in text steganalysis.
Statistical search methods for lotsizing problems
M. Salomon (Marc); R. Kuik (Roelof); L.N. van Wassenhove (Luk)
1993-01-01
This paper reports on our experiments with statistical search methods for solving lotsizing problems in production planning. In lotsizing problems the main objective is to generate a minimum cost production and inventory schedule, such that (i) customer demand is satisfied, and (ii) capa
Energy Technology Data Exchange (ETDEWEB)
2012-12-31
This report evaluates the chemistry of seep water occurring in three desert drainages near Shiprock, New Mexico: Many Devils Wash, Salt Creek Wash, and Eagle Nest Arroyo. Through the use of geochemical plotting tools and multivariate statistical analysis techniques, analytical results of samples collected from the three drainages are compared with the groundwater chemistry at a former uranium mill in the Shiprock area (the Shiprock site), managed by the U.S. Department of Energy Office of Legacy Management. The objective of this study was to determine, based on the water chemistry of the samples, if statistically significant patterns or groupings are apparent between the sample populations and, if so, whether there are any reasonable explanations for those groupings.
Dhat, Shalaka; Pund, Swati; Kokare, Chandrakant; Sharma, Pankaj; Shrivastava, Birendra
2017-01-01
Rapidly evolving technical and regulatory landscapes of the pharmaceutical product development necessitates risk management with application of multivariate analysis using Process Analytical Technology (PAT) and Quality by Design (QbD). Poorly soluble, high dose drug, Satranidazole was optimally nanoprecipitated (SAT-NP) employing principles of Formulation by Design (FbD). The potential risk factors influencing the critical quality attributes (CQA) of SAT-NP were identified using Ishikawa diagram. Plackett-Burman screening design was adopted to screen the eight critical formulation and process parameters influencing the mean particle size, zeta potential and dissolution efficiency at 30min in pH7.4 dissolution medium. Pareto charts (individual and cumulative) revealed three most critical factors influencing CQA of SAT-NP viz. aqueous stabilizer (Polyvinyl alcohol), release modifier (Eudragit® S 100) and volume of aqueous phase. The levels of these three critical formulation attributes were optimized by FbD within established design space to minimize mean particle size, poly dispersity index, and maximize encapsulation efficiency of SAT-NP. Lenth's and Bayesian analysis along with mathematical modeling of results allowed identification and quantification of critical formulation attributes significantly active on the selected CQAs. The optimized SAT-NP exhibited mean particle size; 216nm, polydispersity index; 0.250, zeta potential; -3.75mV and encapsulation efficiency; 78.3%. The product was lyophilized using mannitol to form readily redispersible powder. X-ray diffraction analysis confirmed the conversion of crystalline SAT to amorphous form. In vitro release of SAT-NP in gradually pH changing media showed 95%) in pH7.4 in next 3h, indicative of burst release after a lag time. This investigation demonstrated effective application of risk management and QbD tools in developing site-specific release SAT-NP by nanoprecipitation.
Ebrahimi, Milad; Gerber, Erin L; Rockaway, Thomas D
2017-05-15
For most water treatment plants, a significant number of performance data variables are attained on a time series basis. Due to the interconnectedness of the variables, it is often difficult to assess over-arching trends and quantify operational performance. The objective of this study was to establish simple and reliable predictive models to correlate target variables with specific measured parameters. This study presents a multivariate analysis of the physicochemical parameters of municipal wastewater. Fifteen quality and quantity parameters were analyzed using data recorded from 2010 to 2016. To determine the overall quality condition of raw and treated wastewater, a Wastewater Quality Index (WWQI) was developed. The index summarizes a large amount of measured quality parameters into a single water quality term by considering pre-established quality limitation standards. To identify treatment process performance, the interdependencies between the variables were determined by using Principal Component Analysis (PCA). The five extracted components from the 15 variables accounted for 75.25% of total dataset information and adequately represented the organic, nutrient, oxygen demanding, and ion activity loadings of influent and effluent streams. The study also utilized the model to predict quality parameters such as Biological Oxygen Demand (BOD), Total Phosphorus (TP), and WWQI. High accuracies ranging from 71% to 97% were achieved for fitting the models with the training dataset and relative prediction percentage errors less than 9% were achieved for the testing dataset. The presented techniques and procedures in this paper provide an assessment framework for the wastewater treatment monitoring programs. Copyright © 2017 Elsevier Ltd. All rights reserved.
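The statement that a handful of principal components capture most of the information in 15 variables rests on the explained-variance ratio, which can be sketched as follows. The data are synthetic (three latent drivers behind 15 variables), the helper names are hypothetical, and PCA on the correlation matrix is an assumption, since the abstract does not say how the variables were scaled.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance carried by each principal component.

    PCA is done on the correlation matrix, i.e. on z-scored variables.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]      # largest eigenvalue first
    return eigvals / eigvals.sum()

def components_needed(ratio, target):
    """Smallest number of leading components whose cumulative share reaches target."""
    return int(np.searchsorted(np.cumsum(ratio), target) + 1)

# Synthetic stand-in: 500 daily records of 15 parameters driven by 3 latent factors.
rng = np.random.default_rng(3)
factors = rng.normal(size=(500, 3))
X = np.repeat(factors, 5, axis=1) + rng.normal(scale=0.1, size=(500, 15))

ratio = explained_variance_ratio(X)
n_keep = components_needed(ratio, target=0.75)
```

With real plant data the cumulative curve is usually less clear-cut than in this toy case, and the cut-off (75.25% for five components in the study) is a judgement about how much information to retain.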
A comparison of multivariate genome-wide association methods
DEFF Research Database (Denmark)
Galesloot, Tessel E; Van Steen, Kristel; Kiemeney, Lambertus A L M;
2014-01-01
methods that are implemented in the software packages PLINK, SNPTEST, MultiPhen, BIMBAM, PCHAT and TATES, and also compared them to standard univariate GWAS, analysis of the first principal component of the traits, and meta-analysis of univariate results. We simulated data (N = 1000) for three...
Mfumu Kihumba, Antoine; Vanclooster, Marnik
2013-04-01
Drinking water in Kinshasa, the capital of the Democratic Republic of Congo, is provided by extracting groundwater from the local aquifer, particularly in peripheral areas. The exploited groundwater body is mainly unconfined and located within a continuous detrital aquifer, primarily composed of sedimentary formations. However, the aquifer is subjected to an increasing threat of anthropogenic pollution pressure. Understanding the detailed origin of this pollution pressure is important for sustainable drinking water management in Kinshasa. The present study aims to explain the observed nitrate pollution problem, nitrate being considered as a good tracer for other pollution threats. The analysis is made in terms of physical attributes that are readily available using a statistical modelling approach. For the nitrate data, use was made of a historical groundwater quality assessment study, for which the data were re-analysed. The physical attributes are related to the topography, land use, geology and hydrogeology of the region. Prior to the statistical modelling, intrinsic and specific vulnerability for nitrate pollution was assessed. This vulnerability assessment showed that the alluvium area in the northern part of the region is the most vulnerable area. This area consists of urban land use with poor sanitation. Re-analysis of the nitrate pollution data demonstrated that the spatial variability of nitrate concentrations in the groundwater body is high, and coherent with the fragmented land use of the region and the intrinsic and specific vulnerability maps. For the statistical modeling use was made of multiple regression and regression tree analysis. The results demonstrated the significant impact of land use variables on the Kinshasa groundwater nitrate pollution and the need for a detailed delineation of groundwater capture zones around the monitoring stations. Key words: Groundwater , Isotopic, Kinshasa, Modelling, Pollution, Physico-chemical.
Bevacqua, Emanuele; Maraun, Douglas; Hobæk Haff, Ingrid; Widmann, Martin; Vrac, Mathieu
2017-06-01
Compound events (CEs) are multivariate extreme events in which the individual contributing variables may not be extreme themselves, but their joint - dependent - occurrence causes an extreme impact. Conventional univariate statistical analysis cannot give accurate information regarding the multivariate nature of these events. We develop a conceptual model, implemented via pair-copula constructions, which allows for the quantification of the risk associated with compound events in present-day and future climate, as well as the uncertainty estimates around such risk. The model includes predictors, which could represent for instance meteorological processes that provide insight into both the involved physical mechanisms and the temporal variability of compound events. Moreover, this model enables multivariate statistical downscaling of compound events. Downscaling is required to extend the compound events' risk assessment to the past or future climate, where climate models either do not simulate realistic values of the local variables driving the events or do not simulate them at all. Based on the developed model, we study compound floods, i.e. joint storm surge and high river runoff, in Ravenna (Italy). To explicitly quantify the risk, we define the impact of compound floods as a function of sea and river levels. We use meteorological predictors to extend the analysis to the past, and get a more robust risk analysis. We quantify the uncertainties of the risk analysis, observing that they are very large due to the shortness of the available data, though this may also be the case in other studies where they have not been estimated. Ignoring the dependence between sea and river levels would result in an underestimation of risk; in particular, the expected return period of the highest compound flood observed increases from about 20 to 32 years when switching from the dependent to the independent case.
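The effect of dependence on the return period of a joint exceedance can be illustrated with a small Monte Carlo experiment. This sketch swaps in a bivariate Gaussian dependence structure for simplicity; it is not the pair-copula construction or the impact function used in the study, and the correlation of 0.6, the 10% marginal exceedance probability, and the one-event-per-year convention are all assumptions.

```python
import numpy as np

def joint_exceedance_prob(rho, p_marginal, n=200_000, seed=0):
    """Estimate P(both variables exceed their own (1 - p_marginal) quantile)
    for a bivariate normal pair with correlation rho."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    thresholds = np.quantile(z, 1.0 - p_marginal, axis=0)   # one threshold per variable
    both = (z[:, 0] > thresholds[0]) & (z[:, 1] > thresholds[1])
    return float(both.mean())

p = 0.1                                   # each driver exceeds its threshold 1 year in 10
p_dependent = joint_exceedance_prob(rho=0.6, p_marginal=p)
p_independent = p * p                     # product rule holds only under independence

# Return period in years, assuming one (annual-maximum) event per year.
T_dependent = 1.0 / p_dependent
T_independent = 1.0 / p_independent
```

Positive dependence makes the joint exceedance more likely than the product of the marginal probabilities, so the return period under dependence is shorter. This is the same direction as the abstract's result, where assuming independence lengthens the return period of the largest observed compound flood from about 20 to 32 years.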
Ye, M.; Pacheco Castro, R. B.; Pacheco Avila, J.; Cabrera Sansores, A.
2014-12-01
The karstic aquifer of Yucatan is a vulnerable and complex system. The first fifteen meters of this aquifer have been polluted, so protecting this resource is important because it is the only source of potable water for the entire State. Through the assessment of groundwater quality we can gain knowledge about the main processes governing water chemistry as well as spatial patterns, which are important for establishing protection zones. In this work multivariate statistical techniques are used to assess the groundwater quality of the supply wells (30 to 40 meters deep) in the hydrogeologic region of the Ring of Cenotes, located in Yucatan, Mexico. Cluster analysis and principal component analysis are applied to groundwater chemistry data of the study area. Results of principal component analysis show that the main sources of variation in the data are sea water intrusion, the interaction of the water with the carbonate rocks of the system, and some pollution processes. The cluster analysis shows that the data can be divided into four clusters. The spatial distribution of the clusters seems to be random, but is consistent with sea water intrusion and pollution with nitrates. The overall results show that multivariate statistical analysis can be successfully applied in the groundwater quality assessment of this karstic aquifer.
Institute of Scientific and Technical Information of China (English)
Nur Hazirah Adnan; Mohamad Pauzi Zakaria; Hafizan Juahir; Masni Mohd Ali
2012-01-01
The Langat River in Malaysia has been experiencing anthropogenic input from urban, rural and industrial activities for many years. Sewage contamination, possibly originating from the more than three million inhabitants of the Langat River Basin, was examined. Sediment samples from 22 stations (SL01-SL22) along the Langat River were collected, extracted and analysed by GC-MS. Six different sterols were identified and quantified. The highest sterol concentration was found at station SL02 (618.29 ng/g dry weight), which is situated in the Balak River, whereas the other sediment samples ranged between 11.60 and 446.52 ng/g dry weight. Sterol ratios were used to identify sources, occurrence and partitioning of faecal matter in sediments, and the majority of the ratios clearly demonstrated that sewage contamination was occurring at most stations in the Langat River. A multivariate statistical analysis was used in conjunction with a combination of biomarkers to better understand the data that clearly separated the compounds. Most sediments of the Langat River were found to contain low to mid-range sewage contamination, with some containing 'significant' levels of contamination. This is the first report on sewage pollution in the Langat River based on a combination of biomarker and multivariate statistical approaches that will establish a new standard for sewage detection using faecal sterols.
Directory of Open Access Journals (Sweden)
Farooq Ahmad
2011-12-01
Multivariate statistical techniques such as factor analysis (FA), cluster analysis (CA) and discriminant analysis (DA) were applied for the evaluation of spatial variations and the interpretation of a large complex water quality data set of three cities (Lahore, Gujranwala and Sialkot) in Punjab, Pakistan. Sixteen parameters of water samples collected from nine different sampling stations of each city were determined. Factor analysis indicates five factors, which explained 74% of the total variance in the water quality data set. The five factors are salinization, alkalinity, temperature, domestic waste and chloride, which explained 31.1%, 14.3%, 10.6%, 10.0% and 8.0% of the total variance, respectively. Hierarchical cluster analysis grouped the nine sampling stations of each city into three clusters, i.e., relatively less polluted (LP), moderately polluted (MP) and highly polluted (HP) sites, based on the similarity of water quality characteristics. Discriminant analysis (DA) identified ten significant parameters (calcium (Ca), ammonia, sulphate, sodium (Na), electrical conductivity (EC), chloride, temperature (Temp), total hardness (TH), turbidity), which discriminate the groundwater quality of the three cities, with close to 100.0% correct assignment for spatial variations. This study illustrates the benefit of multivariate statistical techniques for interpreting complex data sets in the analysis of spatial variations in water quality, and for planning future studies.
Reidy, Lorlyn; Bu, Kaixuan; Godfrey, Murrell; Cizdziel, James V
2013-12-10
Students in an instrumental analysis course with a forensic emphasis were presented with a mock scenario in which soil was collected from a murder suspect's car mat, from the crime scene, from adjacent areas, and from more distant locations. Students were then asked to conduct a comparative analysis using the soil's elemental distribution fingerprints. The soil was collected from Lafayette County, Mississippi, USA and categorized as sandy loam. Eight student groups determined twenty-two elements (Li, Be, Mg, Al, K, Ca, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Rb, Sr, Cs, Ba, Pb, U) in seven samples of soil and one sample of sediment by microwave-assisted acid digestion and inductively coupled plasma-mass spectrometry (ICP-MS). Data were combined and evaluated using multivariate statistical analyses. All eight student groups correctly classified their unknown among the different locations. Students learn, however, that whereas their results suggest that the elemental fingerprinting approach can be used to distinguish soils from different land-use areas and geographic locations, applying the methodology in forensic investigations is more complicated and has potential pitfalls. Overall, the inquiry-based pedagogy enthused the students and provided learning opportunities in analytical chemistry, including sample preparation, ICP-MS, figures-of-merit, and multivariate statistics. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Directory of Open Access Journals (Sweden)
Chen-Lin Soo
2017-01-01
Studies on Sarawak coastal water quality are scarce, and applications of the multivariate statistical approach to investigate the spatial variation of water quality and to identify pollution sources in Sarawak coastal water are scarcer still. Hence, the present study aimed to evaluate the spatial variation of water quality along the coastline of the southwestern region of Sarawak using multivariate statistical techniques. Seventeen physicochemical parameters were measured at 11 stations along approximately 225 km of coastline. The coastal water quality showed spatial heterogeneity, with the cluster analysis grouping the 11 stations into four different clusters. Deterioration in coastal water quality has been observed in different regions of Sarawak, corresponding to land use patterns in the region. Nevertheless, nitrate-nitrogen exceeded the guideline value at all sampling stations along the coastline. The principal component analysis (PCA) determined a reduced set of five principal components that explained 89.0% of the data set variance. The first PC indicated that nutrients were the dominant polluting factors, which is attributed to domestic, agricultural, and aquaculture activities, followed by suspended solids in the second PC, which are related to logging activities.
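How a statement such as "five principal components explained 89.0% of the variance" is obtained can be sketched as follows. This is a minimal illustration on synthetic data, assuming PCA on standardized variables (the correlation matrix); the array shapes, the random data and the function name are hypothetical and are not taken from the study.

```python
import numpy as np

def pca_explained_variance(data):
    """Eigenvalues and explained-variance ratios of the correlation matrix."""
    data = np.asarray(data, dtype=float)
    # Standardize each column so parameters on different scales are comparable.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted in descending order
    return eigenvalues, eigenvalues / eigenvalues.sum()

# Synthetic stand-in: 11 stations, 6 parameters driven by 2 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(11, 2))
loadings = rng.normal(size=(2, 6))
samples = latent @ loadings + 0.1 * rng.normal(size=(11, 6))

eigenvalues, ratios = pca_explained_variance(samples)
cumulative = np.cumsum(ratios)  # cumulative[k - 1] = share explained by first k PCs
```

The number of retained components is then the smallest k for which `cumulative[k - 1]` reaches the chosen share of variance.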
Multi-block methods in multivariate process control
DEFF Research Database (Denmark)
Kohonen, J.; Reinikainen, S.P.; Aaljoki, K.
2008-01-01
In chemometric studies all predictor variables are usually collected in one data matrix X. This matrix is then analyzed by PLS regression or other methods. When data from several different sub-processes are collected in one matrix, there is a possibility that the effects of some sub-processes may...... vanish. If there is, for instance, mechanic data from one process and spectral data from another, the influence of the mechanic sub-process may not be detected. An application of multi-block (MB) methods, where the X-data are divided into several data blocks is presented in this study. By using MB...... methods the effect of a sub-process can be seen and an example with two blocks, near infra-red, NIR, and process data, is shown. The results show improvements in modelling task, when a MB-based approach is used. This way of working with data gives more information on the process than if all data...
Willard, Melissa A Bodnar; McGuffin, Victoria L; Smith, Ruth Waddell
2012-01-01
Salvia divinorum is a hallucinogenic herb that is internationally regulated. In this study, salvinorin A, the active compound in S. divinorum, was extracted from S. divinorum plant leaves using a 5-min extraction with dichloromethane. Four additional Salvia species (Salvia officinalis, Salvia guaranitica, Salvia splendens, and Salvia nemorosa) were extracted using this procedure, and all extracts were analyzed by gas chromatography-mass spectrometry. Differentiation of S. divinorum from other Salvia species was successful based on visual assessment of the resulting chromatograms. To provide a more objective comparison, the total ion chromatograms (TICs) were subjected to principal components analysis (PCA). Prior to PCA, the TICs were subjected to a series of data pretreatment procedures to minimize non-chemical sources of variance in the data set. Successful discrimination of S. divinorum from the other four Salvia species was possible based on visual assessment of the PCA scores plot. To provide a numerical assessment of the discrimination, a series of statistical procedures such as Euclidean distance measurement, hierarchical cluster analysis, Student's t tests, Wilcoxon rank-sum tests, and Pearson product moment correlation were also applied to the PCA scores. The statistical procedures were then compared to determine the advantages and disadvantages for forensic applications.
Gaonkar, Bilwaj; Davatzikos, Christos
2013-01-01
Multivariate pattern analysis (MVPA) methods such as support vector machines (SVMs) have been increasingly applied to fMRI and sMRI analyses, enabling the detection of distinctive imaging patterns. However, identifying brain regions that significantly contribute to the classification/group separation requires computationally expensive permutation testing. In this paper we show that the results of SVM-permutation testing can be analytically approximated. This approximation leads to more than a...
Evolutionary Computation Methods and their applications in Statistics
Directory of Open Access Journals (Sweden)
Francesco Battaglia
2013-05-01
A brief discussion of the genesis of evolutionary computation methods, their relationship to artificial intelligence, and the contribution of genetics and Darwin’s theory of natural evolution is provided. Then, the main evolutionary computation methods are illustrated: evolution strategies, genetic algorithms, estimation of distribution algorithms, differential evolution, and a brief description of some evolutionary behavior methods such as ant colony and particle swarm optimization. We also discuss the role of the genetic algorithm for multivariate probability distribution random generation, rather than as a function optimizer. Finally, some relevant applications of genetic algorithm to statistical problems are reviewed: selection of variables in regression, time series model building, outlier identification, cluster analysis, design of experiments.
Multivariate error assessment of response time histories method for dynamic systems
Institute of Scientific and Technical Information of China (English)
Zhen-fei ZHAN; Jie HU; Yan FU; Ren-Jye YANG; Ying-hong PENG; Jin QI
2012-01-01
In this paper, an integrated validation method and process are developed for multivariate dynamic systems. The principal component analysis approach is used to address multivariate correlation and dimensionality reduction, dynamic time warping and the correlation coefficient are used for error assessment, and the opinions of subject matter experts (SMEs) and the principal component analysis coefficients are incorporated to provide the overall rating of the dynamic system. The proposed method and process are successfully demonstrated through a vehicle dynamic system problem.
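Dynamic time warping, one of the error measures named above, can be sketched in a few lines. This is the textbook dynamic-programming form with an absolute-difference cost, not necessarily the variant used by the authors; the example time histories are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j - 1]
                                    cost[i][j - 1],      # repeat a[i - 1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]

reference = [0.0, 1.0, 2.0, 1.0, 0.0]       # hypothetical test response
delayed = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]    # same shape, shifted in time
scaled = [0.0, 2.0, 4.0, 2.0, 0.0]          # same timing, wrong magnitude

shift_error = dtw_distance(reference, delayed)
magnitude_error = dtw_distance(reference, scaled)
```

A pure time shift is absorbed by the warping and yields zero distance, whereas a magnitude error is not, which is why the measure is useful alongside a correlation coefficient.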
Fernández-González, Daniel; Martín-Duarte, Ramón; Ruiz-Bustinza, Íñigo; Mochón, Javier; González-Gasca, Carmen; Verdeja, Luis Felipe
2016-08-01
Blast furnace operators expect to get sinter with homogenous and regular properties (chemical and mechanical), necessary to ensure regular blast furnace operation. Blends for sintering also include several iron by-products and other wastes that are obtained in different processes inside the steelworks. Due to their source, the availability of such materials is not always consistent, but their total production should be consumed in the sintering process, to both save money and recycle wastes. The main scope of this paper is to obtain the least expensive iron ore blend for the sintering process, which will provide suitable chemical and mechanical features for the homogeneous and regular operation of the blast furnace. The systematic use of statistical tools was employed to analyze historical data, including linear and partial correlations applied to the data and fuzzy clustering based on the Sugeno Fuzzy Inference System to establish relationships among the available variables.
Bootstrap-based confidence estimation in PCA and multivariate statistical process control
DEFF Research Database (Denmark)
Babamoradi, Hamid
Traditional/asymptotic confidence estimation has limited applicability since it needs statistical theories to estimate the confidences, which are not available for all indicators/parameters. Furthermore, in case the theories are available for a specific indicator/parameter, the theories are based...... on assumptions that do not always hold in practice. The aim of this thesis was to illustrate the concept of bootstrap-based confidence estimation in PCA and MSPC. It particularly shows how to build bootstrap-based confidence limits in these areas to be used as an alternative to the traditional/asymptotic limits....... The goal was to improve process monitoring by improving the quality of MSPC charts and contribution plots. A bootstrapping algorithm to build confidence limits was illustrated in a case study format (Paper I). The main steps in the algorithm were discussed where a set of sensible choices (plus...
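The idea of bootstrap-based confidence limits for a PCA quantity can be sketched as below. This is a minimal percentile bootstrap, assuming observations are resampled row-wise with replacement; the statistic (variance share of the first component), the synthetic data and all names are illustrative choices, not the algorithm of the thesis.

```python
import numpy as np

def first_pc_share(data):
    """Fraction of total variance carried by the first principal component."""
    eigenvalues = np.linalg.eigvalsh(np.cov(data, rowvar=False))  # ascending
    return eigenvalues[-1] / eigenvalues.sum()

def bootstrap_limits(data, statistic, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence limits for statistic(data)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)  # resample observations with replacement
        replicates[b] = statistic(data[rows])
    lower, upper = np.percentile(replicates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

# Synthetic data: three correlated variables measured on 40 samples.
rng = np.random.default_rng(1)
x = rng.normal(size=(40, 1))
data = np.hstack([x, 0.8 * x, -0.5 * x]) + 0.3 * rng.normal(size=(40, 3))

estimate = first_pc_share(data)
lower, upper = bootstrap_limits(data, first_pc_share)
```

No distributional theory for the statistic is needed; the limits come entirely from the spread of the resampled replicates.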
Energy Technology Data Exchange (ETDEWEB)
Park, Jinyong [Univ. of Arizona, Tucson, AZ (United States); Balasingham, P [Univ. of Arizona, Tucson, AZ (United States); McKenna, Sean Andrew [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Kulatilake, Pinnaduwa H.S.W. [Univ. of Arizona, Tucson, AZ (United States)
2004-09-01
Sandia National Laboratories, under contract to Nuclear Waste Management Organization of Japan (NUMO), is performing research on regional classification of given sites in Japan with respect to potential volcanic disruption using multivariate statistics and geo-statistical interpolation techniques. This report provides results obtained for hierarchical probabilistic regionalization of volcanism for the Sengan region in Japan by applying multivariate statistical techniques and geostatistical interpolation techniques on the geologic data provided by NUMO. A workshop report produced in September 2003 by Sandia National Laboratories (Arnold et al., 2003) on volcanism lists a set of most important geologic variables as well as some secondary information related to volcanism. Geologic data extracted for the Sengan region in Japan from the data provided by NUMO revealed that data are not available at the same locations for all the important geologic variables. In other words, the geologic variable vectors were found to be incomplete spatially. However, it is necessary to have complete geologic variable vectors to perform multivariate statistical analyses. As a first step towards constructing complete geologic variable vectors, the Universal Transverse Mercator (UTM) zone 54 projected coordinate system and a 1 km square regular grid system were selected. The data available for each geologic variable on a geographic coordinate system were transferred to the aforementioned grid system. Also the recorded data on volcanic activity for Sengan region were produced on the same grid system. Each geologic variable map was compared with the recorded volcanic activity map to determine the geologic variables that are most important for volcanism. In the regionalized classification procedure, this step is known as the variable selection step. The following variables were determined as most important for volcanism: geothermal gradient, groundwater temperature, heat discharge, groundwater
Directory of Open Access Journals (Sweden)
Seca Gandaseca
2014-01-01
This study reports the spatio-temporal changes in river and canal water quality of peat swamp forest and oil palm plantation sites of Sarawak, Malaysia. To investigate temporal changes, 192 water samples were collected at four stations of Batang Igan, an oil palm plantation site of Sarawak, during July-November in 2009 and April-July in 2010. Nine water quality parameters including Electrical Conductivity (EC), pH, Turbidity (TER), Dissolved Oxygen (DO), Temperature (TEMP), Chemical Oxygen Demand (COD), five-day Biochemical Oxygen Demand (BOD5), ammonia-Nitrogen (NH3-N) and Total Suspended Solids (TSS) were analysed. To investigate spatial changes, 432 water samples were collected from six different sites including Batang Igan during June-August 2010. Six water quality parameters including pH, DO, COD, BOD5, NH3-N and TSS were analysed to see the spatial variations. The most significant parameters contributing to spatio-temporal variations were assessed by statistical techniques such as Hierarchical Agglomerative Cluster Analysis (HACA), Factor Analysis/Principal Components Analysis (FA/PCA) and Discriminant Function Analysis (DFA). HACA identified three different classes of sites: Relatively Unimpaired, Impaired and Less Impaired Regions, on the basis of similarity among different physicochemical characteristics and pollutant levels between the sampling sites. DFA produced the best results for identification of main variables for temporal analysis and separated parameters (EC, TER, COD), and identified three parameters for spatial analysis (pH, NH3-N and BOD5). The results signify that the parameters identified by statistical analyses were responsible for water quality change and suggest agricultural and oil palm plantation activities as a possible source of pollutants. The results suggest dire need for proper watershed management measures to restore the water quality of this tributary for a
Energy Technology Data Exchange (ETDEWEB)
Wallace, D L; Perlman, M D
1980-06-01
This report describes the research activities of the Department of Statistics, University of Chicago, during the period June 15, 1975 to July 30, 1979. Nine research projects are briefly described on the following subjects: statistical computing and approximation techniques in statistics; numerical computation of first passage distributions; probabilities of large deviations; combining independent tests of significance; small-sample efficiencies of tests and estimates; improved procedures for simultaneous estimation and testing of many correlations; statistical computing and improved regression methods; comparison of several populations; and unbiasedness in multivariate statistics. A description of the statistical consultation activities of the Department that are of interest to DOE, in particular, the scientific interactions between the Department and the scientists at Argonne National Laboratories, is given. A list of publications issued during the term of the contract is included.
Institute of Scientific and Technical Information of China (English)
SUN Wei; CHANG Hong; REN Zhan-jun; YANG Zhang-ping; GENG Rong-qing; LU Sheng-xia; DU Lei; Tsunoda Kenji
2003-01-01
The sheep populations were studied with Q-type Hierarchical Clustering Method, and characters of the populations were used to construct the principal component, then, the principal component values were analyzed with R-type Hierarchical Clustering Method, which might display the genetic differentiation among populations and conform to the result of the known sheep phylogenetic system in China. Characters of the populations were studied with Q-type Hierarchical Clustering Method. The elevation and average annual rainfall were found to be important characters. The ecology factor is also an important character for the breed classification.
Multivariate drought frequency estimation using copula method in Southwest China
Hao, Cui; Zhang, Jiahua; Yao, Fengmei
2015-12-01
Drought over Southwest China occurs frequently and has an obvious seasonal characteristic. Proper management of regional droughts requires knowledge of the expected frequency or probability of specific climate information. This study utilized k-means classification and copulas to demonstrate the regional drought occurrence probability and return period based on trivariate drought properties, i.e., drought duration, severity, and peak. A drought event in this study was defined when 3-month Standardized Precipitation Evapotranspiration Index (SPEI) was less than -0.99 according to the regional climate characteristic. Then, the next step was to classify the region into six clusters by k-means method based on annual and seasonal precipitation and temperature and to establish marginal probabilistic distributions for each drought property in each sub-region. Several copula types were selected to test the best fit distribution, and Student t copula was recognized as the best one to integrate drought duration, severity, and peak. The results indicated that a proper classification was important for a regional drought frequency analysis, and copulas were useful tools in exploring the associations of the correlated drought variables and analyzing drought frequency. Student t copula was a robust and proper function for drought joint probability and return period analysis, which is important for analyzing and predicting the regional drought risks.
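The three drought properties that feed the trivariate copula can be extracted from an index series with a simple run-based rule. The sketch below uses the threshold of -0.99 stated in the abstract; the definitions of severity (accumulated deficit) and peak (lowest index value) follow common run-theory usage and are assumptions here, as is the example series.

```python
def drought_events(spei, threshold=-0.99):
    """Split an index series into drought events and summarize each one."""
    runs, current = [], []
    for value in spei:
        if value < threshold:
            current.append(value)      # still inside a drought spell
        elif current:
            runs.append(current)       # spell ended at this time step
            current = []
    if current:                        # series ended during a spell
        runs.append(current)
    return [
        {
            "duration": len(run),      # number of consecutive months
            "severity": -sum(run),     # accumulated deficit, reported as positive
            "peak": min(run),          # most negative index value
        }
        for run in runs
    ]

# Hypothetical 3-month SPEI values for seven consecutive months.
series = [0.2, -1.2, -1.5, -0.5, -1.0, 0.3, -2.0]
events = drought_events(series)
```

Each event then contributes one (duration, severity, peak) triple, to which marginal distributions and a copula can be fitted.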
Méndez, Jesús; González, Mónica; Lobo, M Gloria; Carnero, Aurelio
2004-03-10
The commercial value of a cochineal (Dactylopius coccus Costa) sample is associated with its color quality. Because the cochineal is a legal food colorant, its color quality is generally understood as its pigment content. Simply put, the higher this content, the more valuable the sample is to the market. In an effort to devise a way to measure the color quality of a cochineal, the present study evaluates different parameters of color measurement such as chromatic attributes (L*, and a*), percentage of carminic acid, tint determination, and chromatographic profile of pigments. Tint determination did not achieve this objective because this parameter does not correlate with carminic acid content. On the other hand, carminic acid showed a highly significant correlation (r = - 0.922, p = 0.000) with L* values determined from powdered cochineal samples. The combination of the information from the spectrophotometric determination of carminic acid with that of the pigment profile acquired by liquid chromatography (LC) and the composition of the red and yellow pigment groups, also acquired by LC, enables greater accuracy in judging the quality of the final sample. As a result of this study, it was possible to achieve the separation of cochineal samples according to geographical origin using two statistical techniques: cluster analysis and principal component analysis.
Institute of Scientific and Technical Information of China (English)
杨茜; 梁颖华; 陈银京; 解忠诚
2012-01-01
Preventive assets are the additional assets that residents accumulate, beyond normal living expenses, in order to secure future consumption. Preventive assets fall into two categories. The first is financial assets, such as stocks, treasury bills and savings insurance. The other is non-financial assets, such as health, education and real estate. In this paper, we use multivariate statistical analysis to study residents' preventive non-financial assets.
Brandmeier, M.; Wörner, G.
2016-10-01
Multivariate statistical and geospatial analyses based on a compilation of 890 geochemical and 1200 geochronological data for 194 mapped ignimbrites from the Central Andes document the compositional and temporal patterns of large-volume ignimbrites (so-called "ignimbrite flare-ups") during Neogene times. Rapid advances in computational science during the past decade led to a growing pool of algorithms for multivariate statistics for large datasets with many predictor variables. This study applies cluster analysis (CA) and linear discriminant analysis (LDA) on log-ratio transformed data with the aim of (1) testing a tool for ignimbrite correlation and (2) distinguishing compositional groups that reflect different processes and sources of ignimbrite magmatism during the geodynamic evolution of the Central Andes. CA on major and trace elements allows grouping of ignimbrites according to their geochemical characteristics into rhyolitic and dacitic "end-members" and to differentiate characteristic trace element signatures with respect to Eu anomaly, depletions in middle and heavy rare earth elements (REE) and variable enrichments in light REE. To highlight these distinct compositional signatures, we applied LDA to selected ignimbrites for which comprehensive datasets were available. In comparison to traditional geochemical parameters we found that the advantage of multivariate statistics is their capability of dealing with large datasets and many variables (elements) and to take advantage of this n-dimensional space to detect subtle compositional differences contained in the data. The most important predictors for discriminating ignimbrites are La, Yb, Eu, Al2O3, K2O, P2O5, MgO, FeOt, and TiO2. However, other REE such as Gd, Pr, Tm, Sm, Dy and Er also contribute to the discrimination functions. Significant compositional differences were found between (1) the older (> 13 Ma) large-volume plateau-forming ignimbrites in northernmost Chile and southern Peru and (2) the
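Log-ratio transformation is applied before cluster or discriminant analysis because compositional data (for example oxide weight percentages) carry only relative information. The abstract does not say which log-ratio transform was used; the sketch below assumes the centered log-ratio (clr) for illustration, and the sample values are hypothetical.

```python
import math

def clr(composition):
    """Centered log-ratio transform of a strictly positive composition."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)   # log of the geometric mean
    return [value - mean_log for value in logs]

# Hypothetical four-part composition in weight percent.
sample = [72.5, 13.8, 4.9, 1.2]
transformed = clr(sample)

# The same sample expressed in different units gives identical clr values.
rescaled = clr([10.0 * x for x in sample])
```

The transformed values sum to zero and do not depend on the units or closure constant of the composition, which removes the spurious correlations that raw percentages induce.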
Energy Technology Data Exchange (ETDEWEB)
Chen, Hao [School of Tourism and Environment, Shaanxi Normal University, Xi'an 710062 (China); Lu, Xinwei, E-mail: luxinwei@snnu.edu.cn [School of Tourism and Environment, Shaanxi Normal University, Xi'an 710062 (China); Li, Loretta Y., E-mail: lli@civil.ubc.ca [Department of Civil Engineering, University of British Columbia, Vancouver V6T 1Z4 (Canada); Gao, Tianning; Chang, Yuyu [School of Tourism and Environment, Shaanxi Normal University, Xi'an 710062 (China)
2014-06-01
The concentrations of As, Ba, Co, Cr, Cu, Mn, Ni, Pb, V and Zn in campus dust from kindergartens, elementary schools, middle schools and universities of Xi'an, China were determined by X-ray fluorescence spectrometry. Correlation coefficient analysis, principal component analysis (PCA) and cluster analysis (CA) were used to analyze the data and to identify possible sources of these metals in the dust. The spatial distributions of metals in urban dust of Xi'an were analyzed based on the metal concentrations in campus dusts using the geostatistics method. The results indicate that dust samples from campuses have elevated metal concentrations, especially for Pb, Zn, Co, Cu, Cr and Ba, with the mean values of 7.1, 5.6, 3.7, 2.9, 2.5 and 1.9 times the background values for Shaanxi soil, respectively. The enrichment factor results indicate that Mn, Ni, V, As and Ba in the campus dust were deficiently to minimally enriched, mainly affected by nature and partly by anthropogenic sources, while Co, Cr, Cu, Pb and Zn in the campus dust and especially Pb and Zn were mostly affected by human activities. As and Cu, Mn and Ni, Ba and V, and Pb and Zn had similar distribution patterns. The southwest high-tech industrial area and south commercial and residential areas have relatively high levels of most metals. Three main sources were identified based on correlation coefficient analysis, PCA, CA, as well as spatial distribution characteristics. As, Ni, Cu, Mn, Pb, Zn and Cr have mixed sources — nature, traffic, as well as fossil fuel combustion and weathering of materials. Ba and V are mainly derived from nature, but partly also from industrial emissions, as well as construction sources, while Co principally originates from construction. - Highlights: • Metal content in dust from schools was determined by XRF. • Spatial distribution of metals in urban dust was focused on campus samples. • Multivariate statistic and spatial distribution were used to identify metal
Liu, Ya-Juan; André, Silvère; Saint Cristau, Lydia; Lagresle, Sylvain; Hannas, Zahia; Calvosa, Éric; Devos, Olivier; Duponchel, Ludovic
2017-02-01
Multivariate statistical process control (MSPC) is increasingly popular as a response to the challenge posed by large multivariate datasets from analytical instruments such as Raman spectroscopy for the monitoring of complex cell cultures in the biopharmaceutical industry. However, Raman spectroscopy for in-line monitoring often produces unsynchronized data sets, resulting in time-varying batches. Moreover, unsynchronized data sets are common for cell culture monitoring because spectroscopic measurements are generally recorded in alternation, with more than one optical probe connected in parallel to the same spectrometer. Synchronized batches are a prerequisite for the application of multivariate analysis such as multi-way principal component analysis (MPCA) for MSPC monitoring. Correlation optimized warping (COW) is a popular method for data alignment with satisfactory performance; however, it has never before been applied to synchronize the acquisition time of spectroscopic datasets in an MSPC application. In this paper we propose, for the first time, to use COW to synchronize batches with varying durations analyzed with Raman spectroscopy. In a second step, we developed MPCA models at different time intervals based on the normal operation condition (NOC) batches synchronized by COW. New batches are finally projected considering the corresponding MPCA model. We monitored the evolution of the batches using two multivariate control charts based on Hotelling's T² and Q. As illustrated by the results, the MSPC model was able to identify abnormal operation conditions, including contaminated batches, which is of prime importance in cell culture monitoring. We showed that Raman-based MSPC monitoring can be used to diagnose batches deviating from the normal condition, with higher efficacy than traditional diagnosis, which would save time and money in the biopharmaceutical industry. Copyright © 2016 Elsevier B.V. All rights reserved.
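The two monitoring statistics named above can be computed from a PCA model as sketched below. This is a plain (not multi-way) PCA on synthetic data, intended only to show what Hotelling's T² and Q measure; the mixing matrix, noise level, fault vector and function names are hypothetical, and no control limits are derived.

```python
import numpy as np

def fit_pca_monitor(noc, n_components):
    """Fit a PCA model on normal-operating-condition (NOC) data."""
    mean = noc.mean(axis=0)
    centered = noc - mean
    # Rows of vt are principal directions; singular values give score variances.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components].T
    score_var = (s[:n_components] ** 2) / (noc.shape[0] - 1)
    return mean, loadings, score_var

def t2_and_q(x, mean, loadings, score_var):
    """Hotelling's T^2 (distance inside the model) and Q (squared residual)."""
    centered = x - mean
    scores = centered @ loadings
    t2 = float(np.sum(scores ** 2 / score_var))
    residual = centered - scores @ loadings.T
    q = float(residual @ residual)
    return t2, q

# Synthetic NOC data: 5 variables driven by 2 underlying factors plus noise.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.8, 0.6, 0.4, 0.2],
                   [0.2, -0.4, 0.6, -0.8, 1.0]])
noc = rng.normal(size=(50, 2)) @ mixing + 0.05 * rng.normal(size=(50, 5))
mean, loadings, score_var = fit_pca_monitor(noc, n_components=2)

t2_normal, q_normal = t2_and_q(noc[0], mean, loadings, score_var)
faulty = noc[0] + np.array([0.0, 0.0, 3.0, 0.0, 0.0])  # breaks the correlation structure
t2_fault, q_fault = t2_and_q(faulty, mean, loadings, score_var)
```

T² flags samples that are extreme within the modelled correlation structure, while Q flags samples that no longer follow that structure; in practice each is compared against a control limit estimated from the NOC batches.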
Gaonkar, Bilwaj; Davatzikos, Christos
2013-09-01
Multivariate pattern analysis (MVPA) methods such as support vector machines (SVMs) have been increasingly applied to fMRI and sMRI analyses, enabling the detection of distinctive imaging patterns. However, identifying brain regions that significantly contribute to the classification/group separation requires computationally expensive permutation testing. In this paper we show that the results of SVM-permutation testing can be analytically approximated. This approximation leads to more than a thousandfold speedup of the permutation testing procedure, thereby rendering it feasible to perform such tests on standard computers. The speedup achieved makes SVM based group difference analysis competitive with standard univariate group difference analysis methods.
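The cost that the analytic approximation avoids comes from repeating the analysis once per label permutation. The sketch below shows a generic label-permutation test; a simple difference of group means stands in for the SVM weight at one voxel, so it illustrates the permutation procedure only, not the authors' approximation. The data values and function names are hypothetical.

```python
import random

def mean_difference(values, labels):
    """Difference between the mean of group 1 and the mean of group 0."""
    group1 = [v for v, label in zip(values, labels) if label == 1]
    group0 = [v for v, label in zip(values, labels) if label == 0]
    return sum(group1) / len(group1) - sum(group0) / len(group0)

def permutation_p_value(values, labels, n_permutations=2000, seed=0):
    """Two-sided p-value from random relabelling of the subjects."""
    rng = random.Random(seed)
    observed = abs(mean_difference(values, labels))
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point summation order.
        if abs(mean_difference(values, shuffled)) >= observed - 1e-12:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_permutations + 1)

# Hypothetical feature values for five patients (1) and five controls (0).
values = [5.1, 4.9, 5.3, 5.0, 5.2, 1.0, 1.2, 0.9, 1.1, 1.3]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
p_value = permutation_p_value(values, labels)
```

With an SVM in place of the mean difference, every iteration of the loop requires retraining the classifier, which is what makes the full procedure expensive for whole-brain images.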
Directory of Open Access Journals (Sweden)
Guillermo A. Riveros
2014-01-01
Combined effects of several complex phenomena cause the deterioration of elements of steel hydraulic structures on the nation’s lock systems: loss of protective systems, corrosion, cracking and fatigue, impacts, and overloads. This paper presents examples of deterioration of steel hydraulic structures. A method for predicting future deterioration based on current conditions is also presented. This paper also includes a procedure for developing deterioration curves when condition state data is available.
Source Apportionment of Heavy Metals in Soils Using Multivariate Statistics and Geostatistics
Institute of Scientific and Technical Information of China (English)
QU Ming-Kai; LI Wei-Dong; ZHANG Chuan-Rong; WANG Shan-Qin; YANG Yong; HE Li-Yuan
2013-01-01
The main objectives of this study were to introduce an integrated method for effectively identifying soil heavy metal pollution sources and apportioning their contributions, and to apply it to a case study. The method combines the principal component analysis/absolute principal component scores (PCA/APCS) receptor model and geostatistics. The case study was conducted in an area of 31 km² in the urban-rural transition zone of Wuhan, a metropolis of central China. 124 topsoil samples were collected for measuring the concentrations of eight heavy metal elements (Mn, Cu, Zn, Pb, Cd, Cr, Ni and Co). PCA results revealed that three major factors were responsible for soil heavy metal pollution, which were initially identified as "steel production", "agronomic input" and "coal consumption". The APCS technique, combined with multiple linear regression analysis, was then applied for source apportionment. Steel production appeared to be the main source for Ni, Co, Cd, Zn and Mn, agronomic input for Cu, and coal consumption for Pb and Cr. Geostatistical interpolation using ordinary kriging was finally used to map the spatial distributions of the contributions of pollution sources and further confirm the result interpretations. The introduced method appears to be an effective tool in soil pollution source apportionment and identification, and might provide valuable reference information for pollution control and environmental management.
The statistical process control methods - SPC
Directory of Open Access Journals (Sweden)
Floreková Ľubica
1998-03-01
Methods of statistical evaluation of quality, SPC (item 20 of the quality control documentation system of the ISO 9000 series of standards), applied to various processes, products and services, are among the basic qualitative methods that enable us to analyse and compare data pertaining to various quantitative parameters. They also make it possible, on that basis, to propose suitable interventions aimed at improving these processes, products and services. The theoretical basis and applicability of the following principles are presented in the contribution: cause-and-effect diagnostics; Pareto analysis and the Lorenz curve; frequency distributions and frequency curves of random variables; and Shewhart control charts.
Wallace, Jack; Champagne, Pascale; Hall, Geof
2016-06-01
The wastewater stabilization ponds (WSPs) at a wastewater treatment facility in eastern Ontario, Canada, have experienced excessive algae growth and high pH levels in the summer months. A full range of parameters were sampled from the system and the chemical dynamics in the three WSPs were assessed through multivariate statistical analysis. The study presents a novel approach for exploratory analysis of a comprehensive water chemistry dataset, incorporating principal components analysis (PCA) and principal components (PC) and partial least squares (PLS) regressions. The analyses showed strong correlations between chl-a and sunlight, temperature, organic matter, and nutrients, and weak and negative correlations between chl-a and pH and chl-a and DO. PCA reduced the data from 19 to 8 variables, with a good fit to the original data matrix (similarity measure of 0.73). Multivariate regressions to model system pH in terms of these key parameters were performed on the reduced variable set and the PCs generated, for which strong fits (R² > 0.79 with all data) were observed. The methodologies presented in this study are applicable to a wide range of natural and engineered systems where a large number of water chemistry parameters are monitored resulting in the generation of large data sets. Copyright © 2016 Elsevier Ltd. All rights reserved.
Nosrati, Kazem
2013-04-01
Soil degradation associated with soil erosion and land use is a critical problem in Iran, and there is little scientific information on assessing soil quality indicators. In this study, factor analysis (FA) and discriminant analysis (DA) were used to identify the most sensitive indicators of soil quality for evaluating land use and soil erosion within the Hiv catchment in Iran, and the results were then compared with a soil quality assessment using expert opinion based on the soil surface factors (SSF) form of the Bureau of Land Management (BLM) method. To this end, 19 soil physical, chemical, and biochemical properties were measured at 56 sampling sites covering three land use/soil erosion categories (rangeland/surface erosion, orchard/surface erosion, and rangeland/stream bank erosion). FA identified four factors that explained 82 % of the variation in soil properties. Three factors showed significant differences among the three land use/soil erosion categories. The results indicated that, based on backward-mode DA, dehydrogenase, silt, and manganese allowed more than 80 % of the samples to be correctly assigned to their land use and erosional status. Canonical scores of the discriminant functions were significantly correlated with the six soil surface indices derived from the BLM method. Stepwise linear regression revealed that the soil surface indices soil movement, surface litter, pedestalling, and sum of SSF were also positively related to dehydrogenase and silt. This suggests that dehydrogenase and silt are the most sensitive to land use and soil erosion.
1981-09-01
Statistics, Carnegie-Mellon University. **At present with the Air Force Institute of Technology, Wright-Patterson Air Force Base, Ohio. ... recurrence relation ... (2.7). We use the following notations as defined in James (1964). The complex multivariate gamma functions $\tilde{\Gamma}_p(a)$ and $\tilde{\Gamma}_p(a, \kappa)$ are given by

$$\tilde{\Gamma}_p(a) = \pi^{p(p-1)/2} \prod_{i=1}^{p} \Gamma(a - i + 1), \tag{2.8}$$

$$\tilde{\Gamma}_p(a, \kappa) = \pi^{p(p-1)/2} \prod_{i=1}^{p} \Gamma(a - i + 1 + k_i), \tag{2.9}$$

where
Mallamace, Domenico; Corsaro, Carmelo; Salvo, Andrea; Cicero, Nicola; Macaluso, Andrea; Giangrosso, Giuseppe; Ferrantelli, Vincenzo; Dugo, Giacomo
2014-05-01
We have studied, by means of High Resolution Magic Angle Spinning Nuclear Magnetic Resonance, the metabolic profile of the famous Sicilian cherry tomato of Pachino. Thanks to its organoleptic and health properties, this particular foodstuff was the first tomato accredited by the European PGI (Protected Geographical Indication) certification of quality. Owing to the relatively high price of the final product, commercial frauds have arisen in the Italian and international markets. Hence, there is growing interest in developing analytical techniques able to predict the origin of a tomato sample, indicating whether or not it originates from the area of Pachino, Sicily (Italy). In this paper we have determined the molar concentrations of the metabolites constituting the PGI cherry tomato of Pachino. Furthermore, by means of a multivariate statistical analysis we have identified which metabolites are relevant for sample differentiation.
Silveira, Landulfo; Borges, Rita de Cássia Fernandes; Navarro, Ricardo Scarparo; Giana, Hector Enrique; Zângaro, Renato Amaro; Pacheco, Marcos Tadeu Tavares; Fernandes, Adriana Barrinha
2017-05-01
Raman spectroscopy has been employed in the quantitative analysis of biochemical components in human serum. This study aimed to develop a spectral model to estimate the concentration of glucose and lipid fractions in human serum, thus evaluating the feasibility of Raman spectroscopy technique for diagnostic purposes. A total of 44 samples of blood serum were collected from volunteers submitted to routine blood biochemical assay analysis. The biochemical concentrations of glucose, triglycerides, cholesterol, and high-density and low-density lipoproteins (HDL and LDL) were obtained by colorimetric method. Serum samples (200 μL) were submitted to Raman spectroscopy (830 nm, 250 mW, 50-s accumulation). The spectra of sera present peaks related to the main constituents, particularly proteins and lipids. A quantitative model based on partial least squares (PLS) regression has been developed to estimate the concentration of these compounds, taking the biochemical concentrations assayed by the colorimetric method as sample's actual concentrations. The PLS model based on leave-one-out cross-validation approach estimated the concentration of triglycerides and cholesterol with r = 0.98 and 0.96, and root mean square error of 35.4 and 15.9 mg/dL, respectively. For the other biochemicals, the r was ranging from 0.75 to 0.86. These results evidenced the possibility of performing biochemical assay in blood serum samples by Raman spectroscopy and PLS regression and may be employed as a means of diagnosis in routine clinical analysis.
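The validation scheme described here (leave-one-out cross-validation summarised by r and the root mean square error) can be sketched as follows. This is a simplified illustration only: a univariate least-squares line stands in for the PLS model, and the intensities and reference concentrations are made-up numbers, not data from the study.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (a stand-in for the PLS model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def loocv_predictions(xs, ys):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return preds

def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical data: one spectral intensity per sample and the
# colorimetric reference concentration in mg/dL.
intensity = [0.10, 0.22, 0.29, 0.41, 0.52, 0.58]
reference = [50.0, 110.0, 150.0, 200.0, 260.0, 290.0]

predicted = loocv_predictions(intensity, reference)
r_cv = pearson_r(predicted, reference)
rmse_cv = rmse(predicted, reference)
print(f"r = {r_cv:.3f}, RMSECV = {rmse_cv:.1f} mg/dL")
```

Because every prediction comes from a model that never saw the held-out sample, r and RMSE computed this way estimate predictive rather than fitting performance, which matters with only a few dozen samples.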
Directory of Open Access Journals (Sweden)
Z. Pasandidehfard
2014-03-01
Nonpoint source (NPS) pollution is a major surface water contaminant commonly caused by agricultural runoff. The purpose of this study was to assess seasonal variation in water quality parameters in the Gorganrood watershed (Golestan Province, Iran). It also sought to clarify the effects of agricultural practices and NPS pollution on them. Water quality parameters including potassium, sodium, pH, water flow rate, total dissolved solids (TDS), electrical conductivity (EC), hardness, sulfate, bicarbonate, chlorine, magnesium, and calcium ions during 1966-2010 were evaluated using multivariate statistical techniques. Multivariate analysis of variance (MANOVA) was implemented to determine the significance of differences between mean seasonal values. Discriminant analysis (DA) was also carried out to identify correlations between seasons and the water quality parameters. Parameters of the water quality index were measured through principal component analysis (PCA) and factor analysis (FA). Based on the results of statistical tests, climate (freezing, weathering and rainfall) and human activities such as agriculture had crucial effects on water quality. The most important parameters in differentiating between seasons, in descending order, were potassium, pH, carbonic acid, calcium, and magnesium. According to the factor loading analysis, chlorine, calcium, and potassium were the most important parameters in spring and summer, indicating the application of fertilizers (especially potassium chloride fertilizer) and the existence of NPS pollution during these seasons. In the next stage, the months during which crops had excessive water requirements were detected using CROPWAT software. Almost all water requirements of the area’s major crops, i.e. cotton, rice, soya, wheat, and oat, occur from late spring until mid/late summer. According to our findings, agricultural practices had a great impact on water pollution. Results of analysis with CROPWAT software also confirmed this
Institute of Scientific and Technical Information of China (English)
郑力燕; 于宏兵; 王启山
2016-01-01
Multivariate statistical techniques, such as cluster analysis (CA), discriminant analysis (DA), principal component analysis (PCA) and factor analysis (FA), were applied to evaluate and interpret the surface water quality data sets of the Second Songhua River (SSHR) basin in China, obtained during two years (2012−2013) of monitoring of 10 physicochemical parameters at 15 different sites. The results showed that most of the physicochemical parameters varied significantly among the sampling sites. Three significant groups of sampling sites, highly polluted (HP), moderately polluted (MP) and less polluted (LP), were obtained through hierarchical agglomerative CA on the basis of similarity of water quality characteristics. DA identified pH, F, DO, NH3-N, COD and VPhs as the most important parameters contributing to spatial variations of surface water quality. However, DA did not give a considerable data reduction (40% reduction). PCA/FA resulted in three, three and four latent factors explaining 70%, 62% and 71% of the total variance in the water quality data sets of the HP, MP and LP regions, respectively. FA revealed that the SSHR water chemistry was strongly affected by anthropogenic activities (point sources: industrial effluents and wastewater treatment plants; non-point sources: domestic sewage, livestock operations and agricultural activities) and natural processes (seasonal effect, and natural inputs). PCA/FA in the whole basin showed the best results for data reduction because it used only two parameters (about 80% reduction) as the most important parameters to explain 72% of the data variation. Thus, this work illustrated the utility of multivariate statistical techniques for analysis and interpretation of datasets and, in water quality assessment, identification of pollution sources/factors and understanding spatial variations in water quality for effective stream water quality management.
Zanon, Cristina; Stocchero, Matteo; Albiero, Elena; Castegnaro, Silvia; Chieregato, Katia; Madeo, Domenico; Rodeghiero, Francesco; Astori, Giuseppe
2014-07-01
Cytokine-induced killer (CIK) cells, obtained after mononucleated cell stimulation with interferon-γ, interleukin-2, and anti-CD3 antibody, are constituted by CD3(+) CD56(+) (CIK) cells and a minority of natural killer (NK; CD3(-) CD56(+) ) cells and T-lymphocytes (CD3(+) CD56(-) ) with antitumor effect against hematological malignancies, thus representing a promising immunotherapy strategy. To ensure in vivo antitumor activity it is mandatory to maximize the percentage of CD3(+) 56(+) effector cells, which is highly variable depending on the starting sample and the harvesting day. Based on cytofluorimetric data, we have retrospectively applied multivariate statistical data analysis (MVDA) to 30 expansions building mathematical models able to predict the expansion fate and the optimal CIK harvesting day. Cell phenotype was monitored during culture; multivariate batch statistical process control was applied to monitor cell expansion and orthogonal projections to latent structures to predict CIK percentage. Ten expansions had CD3(+) CD56(+) cells ≥ 40% (good batches) and 20 had CD3(+) CD56(+) cells ≤ 40%. In 36.7%, CD3(+) CD56(+) cells reached the highest concentration at day 17 and the others at day 21. We built a highly predictive regression model for estimating CD3(+) CD56(+) cells during culture. Three variables resulted highly informative: NK % at day 0, cytotoxic T-lymphocytes % (CTLs, CD3(+) CD8(+) ) at day 4, and CIK % at day 7. "Good batches" are characterized by a high percentage of CTLs and CD3(+) CD56(+) cells at day 4 and day 7, respectively. By applying MVDA it is possible to optimize CIK expansion, deciding the optimal cell harvesting day. A predictive role for CTL and CIK was evidenced. © 2013 Clinical Cytometry Society.
Damage assessment for wind turbine blades based on a multivariate statistical approach
García, David; Tcherniak, Dmitri; Trendafilova, Irina
2015-07-01
This paper presents a vibration-based structural health monitoring methodology for damage assessment of wind turbine blades made of composite laminates. Normally, wind turbine blades are manufactured from two half shells made of composite laminates, which are glued together. This connection must be carefully controlled because of its high probability of disbonding, which might result in collapse of the whole structure. The delamination between the two parts must be monitored not only for detection but also for localisation and severity determination. This investigation presents a real-time monitoring methodology based on singular spectrum analysis (SSA) for damage and delamination detection. SSA is able to decompose the vibratory response into a certain number of components based on their covariance distribution. These components, known as Principal Components (PCs), contain information about the oscillatory patterns of the vibratory response. The PCs are used to create a new space onto which the data can be projected for better visualization and interpretation. The suggested method is applied here to a wind turbine blade whose free-vibration responses were recorded and processed by the methodology. Damage was introduced on the blade for different scenarios, viz. different sizes and locations. The results demonstrate clear damage detection and localization for all damage scenarios and for the different sizes.
Deng, Linhua
2015-07-01
Three nonlinear analysis techniques, including cross-recurrence plot, line of synchronization, and cross-wavelet transform, are proposed to estimate the coherent phase vibrations of nonlinear and non-stationary time series. The case study utilizes the monthly averages of sunspot areas during the time interval from May 1874 to August 2014. The following prominent results are found: (1) the phase-leading hemisphere of long-term sunspot areas has changed twice in the past 140 years, indicating that the hemispheric imbalances and apparent phase differences on both hemispheres are a prevalent behavior and are not anomalous; (2) the alternating regularity of hemispheric asynchronism exhibits a cyclical pattern of 4.5+3.5 cycles, and the magnetic flux excess in a certain hemisphere during the ascending branch of a cycle can be taken as an indication of the phase-leading hemisphere in this cycle. We firmly believe that powerful nonlinear approaches are more advanced than classical linear methods when they are combined to determine the dynamic complexity of nonlinear physical systems.
Noshadi, Masoud; Ghafourian, Amir
2016-07-01
This research investigated the quality of groundwater from 298 wells over 10 years in Fars province, southern Iran, to survey the spatial variation of groundwater quality and the major sources of hydro-chemical components for drinking and agricultural uses. To classify the sampling stations in each year, hierarchical cluster analysis, using Euclidean distances and the "Ward" method, was used. According to the results of cluster analysis, there were three quality groups in the groundwater of the research area: a first group of 170 wells with type of Ca-HCO3, a second group of 98 wells with type of Ca-HCO3, and a third group of 30 wells with type of Na-Cl. Hydro-chemical parameters increased from the first to the third group, and on the basis of Schoeller and USSL diagrams, the water of wells of the third group was considered unsuitable for irrigation and drinking. Principal component (PC) analysis and factor analysis reduced the complex and voluminous data matrix into three main components, accounting for more than 80 % of the total variance. The first PC contained the TDS, EC, TH, Na(+), Cl(-), Mg(2+), SO4(2-), Ca(2+), and SAR parameters. Therefore, the first dominant factor was salinity. In PC2, HCO3 and pH were the dominant parameters, which may indicate weathering of silicate minerals. PC3 contained high loadings for NO2(-) and NO3(-). This factor indicates anthropogenic contaminants that may be caused by improper disposal of domestic wastes or by the use of chemical fertilizers in agriculture and their leaching.
Multivariate Statistical Models for Predicting Sediment Yields from Southern California Watersheds
Gartner, Joseph E.; Cannon, Susan H.; Helsel, Dennis R.; Bandurraga, Mark
2009-01-01
Debris-retention basins in Southern California are frequently used to protect communities and infrastructure from the hazards of flooding and debris flow. Empirical models that predict sediment yields are used to determine the size of the basins. Such models have been developed using analyses of records of the amount of material removed from debris retention basins, associated rainfall amounts, measures of watershed characteristics, and wildfire extent and history. In this study we used multiple linear regression methods to develop two updated empirical models to predict sediment yields for watersheds located in Southern California. The models are based on both new and existing measures of volume of sediment removed from debris retention basins, measures of watershed morphology, and characterization of burn severity distributions for watersheds located in Ventura, Los Angeles, and San Bernardino Counties. The first model presented reflects conditions in watersheds located throughout the Transverse Ranges of Southern California and is based on volumes of sediment measured following single storm events with known rainfall conditions. The second model presented is specific to conditions in Ventura County watersheds and was developed using volumes of sediment measured following multiple storm events. To relate sediment volumes to triggering storm rainfall, a rainfall threshold was developed to identify storms likely to have caused sediment deposition. A measured volume of sediment deposited by numerous storms was parsed among the threshold-exceeding storms based on relative storm rainfall totals. The predictive strength of the two models developed here, and of previously-published models, was evaluated using a test dataset consisting of 65 volumes of sediment yields measured in Southern California. The evaluation indicated that the model developed using information from single storm events in the Transverse Ranges best predicted sediment yields for watersheds in San
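The step in which a multi-storm sediment volume is parsed among threshold-exceeding storms in proportion to their rainfall totals can be sketched as below. The function name, the storm labels, the threshold value, and the units are hypothetical; the abstract does not give the actual threshold or data.

```python
def apportion_volume(total_volume, storm_rainfall, threshold):
    """Split a measured sediment volume among storms whose rainfall exceeds
    the threshold, in proportion to each storm's rainfall total."""
    exceeding = {name: rain for name, rain in storm_rainfall.items()
                 if rain > threshold}
    total_rain = sum(exceeding.values())
    if total_rain == 0:
        return {}  # no storm exceeded the threshold, so nothing to apportion
    return {name: total_volume * rain / total_rain
            for name, rain in exceeding.items()}

# Hypothetical example: 1000 m^3 cleaned out of a basin after three storms,
# with an assumed rainfall threshold of 25 mm.
storms = {"storm_A": 10.0, "storm_B": 30.0, "storm_C": 50.0}
shares = apportion_volume(1000.0, storms, threshold=25.0)
print(shares)
```

Storms below the threshold receive no share, so the apportioned volumes always sum to the measured total whenever at least one storm qualifies.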
Statistical methods of discrimination and classification advances in theory and applications
Choi, Sung C
1986-01-01
Statistical Methods of Discrimination and Classification: Advances in Theory and Applications is a collection of papers that tackles the multivariate problems of discriminating and classifying subjects into exclusive populations. The book presents 13 papers that cover advances in the statistical procedures of discrimination and classification. The studies in the text primarily focus on various methods of discriminating and classifying variables, such as multiple discriminant analysis in the presence of mixed continuous and categorical data; choice of the smoothing parameter and efficiency o
Statistical methods for assessment of blend homogeneity
DEFF Research Database (Denmark)
Madsen, Camilla
2002-01-01
In this thesis the use of various statistical methods to address some of the problems related to assessment of the homogeneity of powder blends in tablet production is discussed. It is not straightforward to assess the homogeneity of a powder blend. The reason is partly that in bulk materials......, it is shown how to set up parametric acceptance criteria for the batch that give high confidence that future samples will pass the USP three-class criteria with a probability larger than a specified value. Properties and robustness of proposed changes to the USP test for content uniformity are investigated...
Muhammad, Said; Tahir Shah, M; Khan, Sardar
2010-10-01
The present study was conducted in the Kohistan region, where mafic and ultramafic rocks (Kohistan island arc and Indus suture zone) and metasedimentary rocks (Indian plate) are exposed. Water samples were collected from springs, streams and the Indus river and analyzed for physical parameters, anions, cations and arsenic (As(3+), As(5+) and total arsenic). The water quality in the Kohistan region was evaluated by comparing the physicochemical parameters with the permissible limits set by the Pakistan Environmental Protection Agency and the World Health Organization. Most of the studied parameters were found to be within their respective permissible limits. However, in some samples the iron and arsenic concentrations exceeded their permissible limits. For health risk assessment of arsenic, the average daily dose, hazard quotient (HQ) and cancer risk were calculated using statistical formulas. The values of HQ were found to be >1 in the samples collected from Jabba, Dubair, while HQ values were <1 in the remaining samples. This level of contamination should have low chronic risk and medium cancer risk when compared with US EPA guidelines. Furthermore, the inter-dependence of physicochemical parameters and pollution load was also calculated using multivariate statistical techniques such as one-way ANOVA, correlation analysis, regression analysis, cluster analysis and principal component analysis.
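The three health-risk quantities mentioned in this abstract are commonly computed as in the sketch below. The abstract does not state its formulas or parameter values, so everything here is an assumption: a simplified drinking-water dose equation, an intake of 2 L/day, a body weight of 70 kg, and long-cited US EPA oral toxicity values for inorganic arsenic that should be checked against current guidance before any real use.

```python
def average_daily_dose(conc_mg_per_l, intake_l_per_day, body_weight_kg):
    """ADD in mg/kg/day from drinking water (simplified, assumed form)."""
    return conc_mg_per_l * intake_l_per_day / body_weight_kg

def hazard_quotient(add, reference_dose):
    """HQ > 1 suggests a potential non-carcinogenic (chronic) risk."""
    return add / reference_dose

def cancer_risk(add, slope_factor):
    """Excess lifetime cancer risk as dose times cancer slope factor."""
    return add * slope_factor

# Assumed parameter values (not taken from the abstract).
RFD_ARSENIC = 3.0e-4   # mg/kg/day, commonly cited oral reference dose
CSF_ARSENIC = 1.5      # (mg/kg/day)^-1, commonly cited oral slope factor

add = average_daily_dose(conc_mg_per_l=0.021, intake_l_per_day=2.0,
                         body_weight_kg=70.0)
hq = hazard_quotient(add, RFD_ARSENIC)
cr = cancer_risk(add, CSF_ARSENIC)
print(f"ADD = {add:.2e} mg/kg/day, HQ = {hq:.2f}, CR = {cr:.2e}")
```

With these assumed inputs the hypothetical sample has HQ above 1, which is the kind of exceedance the abstract reports for a few sampling locations.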
Energy Technology Data Exchange (ETDEWEB)
Molinaroli, E.; Pistolato, M.; Rampazzo, G. [Dipartimento di Scienze Ambientali, Universita di Venezia, Dorsoduro 2137, 30123 Venezia (Italy); Guerzoni, S. [CNR, Istituto di Geologia Marina, via Gobetti 101, 40129 Bologna (Italy)
1999-06-01
The chemical characteristics of the mineral fractions of aerosol and precipitation collected in Sardinia (NW Mediterranean) are highlighted by means of two multivariate statistical approaches. Two different combinations of classification and statistical methods for geochemical data are presented. It is shown that the application of cluster analysis subsequent to Q-Factor analysis better distinguishes among Saharan dust, background pollution (Europe-Mediterranean) and local aerosol from various source regions (Sardinia). Conversely, the application of simple cluster analysis was able to distinguish only between aerosols and precipitation particles, without assigning the sources (local or distant) to the aerosol. This method also highlighted the fact that crust-enriched precipitation is similar to desert-derived aerosol. Major elements (Al, Na) and trace metal (Pb) turn out to be the most discriminating elements of the analysed data set. Independent use of mineralogical, granulometric and meteorological data confirmed the results derived from the statistical methods employed. (Copyright (c) 1999 Elsevier Science B.V., Amsterdam. All rights reserved.)
Statistical methods in credit risk management
Directory of Open Access Journals (Sweden)
Ljiljanka Kvesić
2012-12-01
Successful banks base their operations on the principles of liquidity, profitability and safety. Therefore, the correct assessment of the ability of a loan applicant to carry out certain obligations is of crucial importance for the functioning of a bank. In the past few decades several credit scoring models have been developed to provide support to credit analysts in the assessment of a loan applicant. This paper presents three statistical methods that are used for this purpose in the area of credit risk management: logistic regression, discriminant analysis and survival analysis. Their implementation in the banking sector was motivated to a great extent by the development and application of information and communication technologies. This paper aims to point out the most important theoretical aspects of these methods, and also to emphasise the need for the development and application of credit scoring models in Croatian banking practice.
Statistical Methods in Phylogenetic and Evolutionary Inferences
Directory of Open Access Journals (Sweden)
Luigi Bertolotti
2013-05-01
Molecular tools are the most accurate methods for the identification and characterization of organisms. Biologists are often involved in studies whose main goal is to identify relationships among individuals. In this framework, it is very important to know and apply the most robust approaches to infer these relationships correctly, allowing the right conclusions about phylogeny. In this review, we introduce the reader to the most widely used statistical methods in phylogenetic analyses, the Maximum Likelihood and Bayesian approaches, considering for simplicity only analyses of DNA sequences. Several studies are presented as examples to demonstrate how correct phylogenetic inference can lead scientists to highlight very peculiar features of pathogen biology and evolution.
A refined method for multivariate meta-analysis and meta-regression.
Jackson, Daniel; Riley, Richard D
2014-02-20
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects' standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. Copyright © 2013 John Wiley & Sons, Ltd.
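The univariate idea referred to here, scaling the standard error of the pooled effect, can be sketched as follows. The abstract does not give the exact form of the scaling factor, so this assumes a Hartung-Knapp style factor; the function name, the treatment of the between-study variance `tau2` as a supplied input, and the example numbers are all illustrative assumptions, and the multivariate extension proposed by the authors is not shown.

```python
import math

def refined_random_effects(effects, variances, tau2):
    """Random-effects pooled estimate with a conventional and a scaled
    (Hartung-Knapp style, assumed form) standard error."""
    k = len(effects)
    weights = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(weights)
    mu = sum(w * y for w, y in zip(weights, effects)) / sum_w
    se_conventional = math.sqrt(1.0 / sum_w)
    # Scaling factor: weighted residual variance about the pooled estimate.
    scale_sq = sum(w * (y - mu) ** 2 for w, y in zip(weights, effects)) / (k - 1)
    se_refined = math.sqrt(scale_sq) * se_conventional
    return mu, se_conventional, se_refined

# Hypothetical example: three studies with equal within-study variances
# and a between-study variance estimate of zero.
mu, se_conv, se_ref = refined_random_effects(
    effects=[0.1, 0.3, 0.5], variances=[0.04, 0.04, 0.04], tau2=0.0)
print(f"pooled = {mu:.3f}, SE = {se_conv:.4f}, scaled SE = {se_ref:.4f}")
```

In this style of refinement the scaled standard error is typically paired with a t distribution on k - 1 degrees of freedom rather than a normal quantile, which widens intervals when there are few studies.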
Martens, Edwin P; de Boer, Anthonius; Pestman, Wiebe R; Belitser, Svetlana V; Stricker, Bruno H Ch; Klungel, Olaf H
PURPOSE: To compare adjusted effects of drug treatment for hypertension on the risk of stroke from propensity score (PS) methods with a multivariable Cox proportional hazards (Cox PH) regression in an observational study with censored data. METHODS: From two prospective population-based cohort
Singh, Veena D.; Daharwal, Sanjay J.
2017-01-01
Three multivariate calibration spectrophotometric methods were developed for the simultaneous estimation of Paracetamol (PARA), Enalapril maleate (ENM) and Hydrochlorothiazide (HCTZ) in tablet dosage form, namely multi-linear regression calibration (MLRC), trilinear regression calibration (TLRC) and classical least squares (CLS). The selectivity of the proposed methods was studied by analyzing laboratory-prepared ternary mixtures, and the methods were successfully applied to the combined dosage form. The proposed methods were validated as per ICH guidelines, and good accuracy, precision and specificity were confirmed within the concentration ranges of 5-35 μg mL⁻¹, 5-40 μg mL⁻¹ and 5-40 μg mL⁻¹ for PARA, HCTZ and ENM, respectively. The results were statistically compared with a reported HPLC method. Thus, the proposed methods can be used effectively for routine quality control analysis of these drugs in commercial tablet dosage form.
Energy Technology Data Exchange (ETDEWEB)
Kolluri, Srinivas Sahan; Esfahani, Iman Janghorban; Garikiparthy, Prithvi Sai Nadh; Yoo, Chang Kyoo [Kyung Hee University, Yongin (Korea, Republic of)
2015-08-15
Our aim was to analyze, monitor, and predict the outcomes of processes in a full-scale seawater reverse osmosis (SWRO) desalination plant using multivariate statistical techniques. Multivariate analysis of variance (MANOVA) was used to investigate the performance and efficiencies of two SWRO processes, namely, pore controllable fiber filter-reverse osmosis (PCF-SWRO) and sand filtration-ultrafiltration-reverse osmosis (SF-UF-SWRO). Principal component analysis (PCA) was applied to monitor the two SWRO processes. PCA monitoring revealed that the SF-UF-SWRO process could be analyzed reliably with a low number of outliers and disturbances. Partial least squares (PLS) analysis was then conducted to predict which of the seven input parameters (feed flow rate, PCF/SF-UF filtrate flow rate, feed water temperature, feed turbidity, pH, reverse osmosis (RO) flow rate, and pressure) had a significant effect on the outcome variables of permeate flow rate and concentration. Root mean squared errors (RMSEs) of the PLS models for permeate flow rates were 31.5 and 28.6 for the PCF-SWRO process and SF-UF-SWRO process, respectively, while RMSEs of permeate concentrations were 350.44 and 289.4, respectively. These results indicate that the SF-UF-SWRO process can be modeled more accurately than the PCF-SWRO process, because the RMSE values of permeate flow rate and concentration obtained using a PLS regression model of the SF-UF-SWRO process were lower than those obtained for the PCF-SWRO process.
Application of pedagogy reflective in statistical methods course and practicum statistical methods
Julie, Hongki
2017-08-01
The courses Elementary Statistics, Statistical Methods and Statistical Methods Practicum aim to equip Mathematics Education students with descriptive and inferential statistics. An understanding of descriptive and inferential statistics is important for students in the Mathematics Education Department, especially those whose final project involves quantitative research. In quantitative research, students are required to present and describe quantitative data appropriately, to draw conclusions from the data, and to establish relationships between the independent and dependent variables defined in their research. In practice, students working on quantitative final projects were often found making mistakes when drawing conclusions and when choosing the hypothesis-testing procedure. As a result, they reached incorrect conclusions, which is a serious error in quantitative research. The implementation of reflective pedagogy in the teaching and learning process of the Statistical Methods and Statistical Methods Practicum courses yielded the following: 1. Twenty-two students passed the course and one student did not. 2. The most frequently achieved grade was A, obtained by 18 students. 3. According to all students, they were able to develop their critical stance and to build care for one another through the learning process in this course. 4. All students agreed that, through the learning process they underwent in the course, they could build care for one another.
Joshi, Suresh M.
1987-01-01
The problem of designing fine-pointing controllers is considered for large, flexible space structures using modern multivariable synthesis methods. The first method is an iterative procedure which utilizes frequency-domain singular-value techniques, and is found to yield satisfactory performance and robustness. For the second method, which is based on coprime factorizations, a particular bicoprime is obtained, and the steps in the design process are described. This method is still under development.
Guimarães Nobre, Gabriela; Arnbjerg-Nielsen, Karsten; Rosbjerg, Dan; Madsen, Henrik
2016-04-01
Traditionally, flood risk assessment studies have been carried out from a univariate frequency analysis perspective. However, statistical dependence between hydrological variables, such as extreme rainfall and extreme sea surge, plausibly exists, since both variables are to some extent driven by common meteorological conditions. To overcome this limitation, multivariate statistical techniques have the potential to combine different sources of flooding in the investigation. The aim of this study was to apply a range of statistical methodologies for analyzing combined extreme hydrological variables that can lead to coastal and urban flooding. The study area is the Elwood Catchment, a highly urbanized catchment located in the city of Port Phillip, Melbourne, Australia. The first part of the investigation dealt with the marginal extreme value distributions. Two approaches to extracting extreme value series were applied (Annual Maximum and Partial Duration Series), and different probability distribution functions were fitted to the observed sample. Results obtained using the Generalized Pareto distribution demonstrate the ability of the Pareto family to model the extreme events. Advancing into multivariate extreme value analysis, an investigation of the asymptotic properties of extremal dependence was carried out first. As a weak positive asymptotic dependence between the bivariate extreme pairs was found, the Conditional method proposed by Heffernan and Tawn (2004) was chosen. This approach is suitable for modeling bivariate extreme values that are relatively unlikely to occur together. The results show that the probability of an extreme sea surge occurring during a one-hour intensity extreme precipitation event (or vice versa) can be twice as great as it would be under the assumption of independent events. Therefore, presuming independence between these two variables would result in severe underestimation of the flooding risk in the study area.
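The closing comparison, joint occurrence versus the independence assumption, can be illustrated with a simple empirical ratio. The sketch below is not the Heffernan and Tawn conditional model; it only counts threshold exceedances in paired observations. The function name, thresholds and data are hypothetical.

```python
def exceedance_ratio(x, y, x_threshold, y_threshold):
    """Ratio of the empirical joint exceedance probability to the
    product of the marginal exceedance probabilities.

    A ratio above 1 means joint extremes occur more often than
    independence would predict.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    p_x = sum(v > x_threshold for v in x) / n
    p_y = sum(v > y_threshold for v in y) / n
    p_joint = sum(a > x_threshold and b > y_threshold for a, b in zip(x, y)) / n
    if p_x == 0 or p_y == 0:
        raise ValueError("no exceedances of at least one threshold")
    return p_joint / (p_x * p_y)


# Hypothetical paired observations (arbitrary units), not the Elwood data.
rainfall = [1, 2, 9, 3, 8, 2, 1, 9, 2, 3]
sea_surge = [2, 1, 8, 2, 3, 9, 1, 9, 2, 1]

ratio = exceedance_ratio(rainfall, sea_surge, x_threshold=7, y_threshold=7)
```

With so few exceedances such a ratio is very noisy, which is one reason model-based approaches to extremal dependence are used in practice.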
On two methods of statistical image analysis
Missimer, J; Knorr, U; Maguire, RP; Herzog, H; Seitz, RJ; Tellman, L; Leenders, KL
1999-01-01
The computerized brain atlas (CBA) and statistical parametric mapping (SPM) are two procedures for voxel-based statistical evaluation of PET activation studies. Each includes spatial standardization of image volumes, computation of a statistic, and evaluation of its significance. In addition, smooth
Cardot, J-M; Roudier, B; Schütz, H
2017-07-01
The f2 test is generally used for comparing dissolution profiles. In cases of high variability, the f2 test is not applicable, and the Multivariate Statistical Distance (MSD) test is frequently proposed as an alternative by the FDA and EMA. The guidelines provide only general recommendations. MSD tests can be performed either on raw data, with or without time as a variable, or on parameters of models. In addition, data can be limited, as in the case of the f2 test, to dissolutions of up to 85%, or can include all available data. In the context of the present paper, the recommended calculation included all raw dissolution data up to the first point greater than 85% as variables, without the various times as parameters. The proposed MSD overcomes several drawbacks found in other methods.
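For context, the f2 similarity factor that the MSD test is meant to replace is conventionally computed as f2 = 50 · log10(100 / sqrt(1 + mean squared difference)), where the differences are taken between mean percent-dissolved values of the reference and test products at each time point. The sketch below implements that standard formula only; the profiles are made-up numbers, and the MSD calculation proposed in the paper is not reproduced here.

```python
import math


def f2_similarity(reference, test):
    """f2 similarity factor for two dissolution profiles.

    Both arguments are mean percent-dissolved values at the same
    time points. Identical profiles give f2 = 100.
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))


# Hypothetical mean % dissolved at four time points.
reference_profile = [20.0, 40.0, 60.0, 80.0]
test_profile = [30.0, 50.0, 70.0, 90.0]  # 10 points higher everywhere

f2_value = f2_similarity(reference_profile, test_profile)
```

A uniform 10-point difference gives a value just under 50, which is the threshold commonly used to declare two profiles similar.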
The Monte Carlo method the method of statistical trials
Shreider, YuA
1966-01-01
The Monte Carlo Method: The Method of Statistical Trials is a systematic account of the fundamental concepts and techniques of the Monte Carlo method, together with its range of applications. These applications include the computation of definite integrals, neutron physics, and the investigation of servicing processes. This volume comprises seven chapters and begins with an overview of the basic features of the Monte Carlo method and typical examples of its application to simple problems in computational mathematics. The next chapter examines the computation of multi-dimensio
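One of the applications named in this description, the computation of definite integrals, can be sketched in a few lines. This is a generic illustration of the sample-mean Monte Carlo estimator and is not taken from the book; the function name, seed and integrand are assumptions chosen for the example.

```python
import random


def mc_integral(f, a, b, n_samples, seed=0):
    """Estimate the integral of f over [a, b] by averaging f at
    uniformly distributed random points (sample-mean Monte Carlo)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = sum(f(rng.uniform(a, b)) for _ in range(n_samples))
    return (b - a) * total / n_samples


# Integral of x^2 over [0, 1]; the exact value is 1/3.
estimate = mc_integral(lambda x: x * x, 0.0, 1.0, n_samples=100_000)
```

The standard error of this estimator shrinks in proportion to 1/sqrt(n_samples) regardless of dimension, which is why the method is attractive for multi-dimensional integrals.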
Statistical methods for astronomical data analysis
Chattopadhyay, Asis Kumar
2014-01-01
This book introduces “Astrostatistics” as a subject in its own right with rewarding examples, including work by the authors with galaxy and Gamma Ray Burst data to engage the reader. This includes a comprehensive blending of Astrophysics and Statistics. The first chapter’s coverage of preliminary concepts and terminologies for astronomical phenomena will appeal to both Statistics and Astrophysics readers as helpful context. Statistics concepts covered in the book provide a methodological framework. A unique feature is the inclusion of different possible sources of astronomical data, as well as software packages for converting the raw data into appropriate forms for data analysis. Readers can then use the appropriate statistical packages for their particular data analysis needs. The ideas of statistical inference discussed in the book help readers determine how to apply statistical tests. The authors cover different applications of statistical techniques already developed or specifically introduced for ...
Vergeynst, Leendert; Van Langenhove, Herman; Demeestere, Kristof
2015-02-17
Modern high-resolution mass spectrometry (HRMS) enables full-spectrum trace level analysis of emerging environmental organic contaminants. This raises the opportunity for post-acquisition suspect screening when no reference standards are a priori available. When setting up a conventional screening identification train based on successively different identification criteria including mass error and isotope fit, the false negative rate typically accumulates upon advancing through the decision tree. The challenge is thus to elaborate a well-balanced screening, in which the different criteria are equally stringent, leading to a controllable number of false negatives. Presented is a novel suspect screening approach using liquid-chromatography Orbitrap HRMS. Based on a multivariate statistical model, the screening takes into account the accurate mass error of the monoisotopic ion and up to three isotopes, isotope ratios, and a peak/noise filter. As such, for the first time, controlling the overall false negative rate of the screening algorithm to a desired level (5% in this study) is achieved. Simultaneously, a well-balanced identification decision is guaranteed, taking the different identification criteria as a whole in a holistic statistical approach. Taking into account 1, 2, and 3 isotopes decreases the false positive rate from 22% to 2.8% to <0.3%, but the cost of increasing the median limits of identification from 200 to 2000 to 2062 ng L⁻¹, respectively, should also be considered. As proof of concept, 7 biologically treated wastewaters were screened for 77 suspect pharmaceuticals, resulting in the indicative identification of 25 suspects. Subsequently obtained reference standards allowed confirmation of 19 of these 25 pharmaceutical contaminants.
Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti
2016-07-01
A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco. Contact: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
A NEW METHOD FOR THE CONSTRUCTION OF MULTIVARIATE MINIMAL INTERPOLATION POLYNOMIAL
Institute of Scientific and Technical Information of China (English)
Zhang Chuanlin
2001-01-01
The extended Hermite interpolation problem on a segment point set over n-dimensional Euclidean space is considered. Based on the algorithm for computing the Gröbner basis of an ideal given by a dual basis, a new method is given for constructing the minimal multivariate polynomial that satisfies the interpolation conditions.
Meta-Analytic Structural Equation Modeling (MASEM): Comparison of the Multivariate Methods
Zhang, Ying
2011-01-01
Meta-analytic Structural Equation Modeling (MASEM) has drawn interest from many researchers recently. In doing MASEM, researchers usually first synthesize correlation matrices across studies using meta-analysis techniques and then analyze the pooled correlation matrix using structural equation modeling techniques. Several multivariate methods of…
Institute of Scientific and Technical Information of China (English)
LI Lian-fang; LI Guo-xue; LIAO Xiao-yong
2004-01-01
This paper presents the characteristics of nitrogen and phosphorus pollution in Beijing surface water during the survey. A significant difference was found in the concentration distributions of the various nitrogen and phosphorus parameters. Most water bodies in the five water systems were polluted by total nitrogen, with contents as high as 120 mg/L, exceeding the fifth-class limit of the national surface water quality standard GB3838-2002, except for several segments of Chaobaihe and Yongdinghe. Ammonia and phosphorus showed a similar distribution, with higher contents in the Daqinghe, Beiyunhe and Jiyunhe water systems but relatively low concentrations in the Chaobaihe and Yongdinghe water systems. Meanwhile, nitrate was found at comparatively low contents (mostly less than 10 mg/L) and met the corresponding water quality requirements. Overall, the water of the Daqinghe, Jiyunhe and Beiyunhe river systems, as well as the lower reaches of Yongdinghe and Chaobaihe, was seriously contaminated with high contents of total nitrogen and phosphorus. Multivariate statistical approaches showed that total nitrogen, ammonia and total phosphorus were highly correlated with chemical oxygen demand, biochemical oxygen demand, dissolved oxygen and electrical conductivity, which points to a common pollution source from anthropogenic activities.
Niaki, Seyed Taghi Akhavan; Javad Ershadi, Mohammad
2012-12-01
In this research, the main parameters of the multivariate cumulative sum (CUSUM) control chart (the reference value k, the control limit H, the sample size n and the sampling interval h) are determined by minimising the Lorenzen-Vance cost function [Lorenzen, T.J., and Vance, L.C. (1986), 'The Economic Design of Control Charts: A Unified Approach', Technometrics, 28, 3-10], in which the external costs of employing the chart are added. In addition, the model is statistically constrained to achieve desired in-control and out-of-control average run lengths. The Taguchi loss approach is used to model the problem and a genetic algorithm, for which its main parameters are tuned using the response surface methodology (RSM), is proposed to solve it. At the end, sensitivity analyses on the main parameters of the cost function are presented and their practical conclusions are drawn. The results show that RSM significantly improves the performance of the proposed algorithm and the external costs of applying the chart, which are due to real-world constraints, do not increase the average total loss very much.
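The roles of the reference value k and the control limit H can be seen in a deliberately simplified setting. The sketch below is a one-sided univariate tabular CUSUM, not the multivariate chart studied in the paper, and it does not include the cost function, the sample size n, the sampling interval h, or the genetic algorithm. Function name and data are hypothetical.

```python
def cusum_upper(samples, target, k, control_limit):
    """One-sided (upper) tabular CUSUM.

    Accumulates deviations above target + k and reports the index of
    the first sample at which the statistic exceeds control_limit.
    Returns (path, signal_index); signal_index is None if no signal.
    """
    statistic = 0.0
    path = []
    signal_index = None
    for i, x in enumerate(samples):
        # Deviations smaller than k are absorbed; the sum never goes below zero.
        statistic = max(0.0, statistic + (x - target) - k)
        path.append(statistic)
        if signal_index is None and statistic > control_limit:
            signal_index = i
    return path, signal_index


# Hypothetical standardized measurements: in control, then an upward shift.
samples = [0.1, -0.2, 0.0, 1.5, 1.6, 1.4, 1.7]
path, signal_index = cusum_upper(samples, target=0.0, k=0.5, control_limit=2.0)
```

A larger k makes the chart less sensitive to small shifts, while a larger control limit lengthens both the in-control and out-of-control average run lengths, which is the trade-off an economic-statistical design has to balance.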
Energy Technology Data Exchange (ETDEWEB)
Freitas, Renato [Instituto Federal de Educacao, Ciencia e Tecnologia do Rio de Janeiro (CPAR/IFRJ), RJ (Brazil). Curso de Licenciatura em Matematica; Calza, Cristiane Ferreira; Lopes, Ricardo Tadeu [Coordenacao dos Programas de Pos-Graduacao de Engenharia (COPPE/UFRJ), RJ (Brazil); Rabello, Angela; Lima, Tania [Museu Nacional (MN/UFRJ), Rio de Janeiro, RJ (Brazil)
2011-07-01
In this work, the elemental composition of 102 fragments of Marajoara pubic covers belonging to the National Museum collection was characterized using EDXRF and multivariate statistical analysis. The objective was to identify possible groups of samples with similar characteristics. This information will be useful in the development of a systematic classification of these artifacts. Provenance studies of ancient ceramics are based on the assumption that pottery produced from a specific clay will present a similar chemical composition, which will distinguish it from pottery produced from a different clay. In this way, the pottery is assigned to particular production groups, which are then correlated with their respective origins. EDXRF measurements were carried out with a portable system, developed in the Nuclear Instrumentation Laboratory, consisting of an Oxford TF3005 X-ray tube with a tungsten (W) anode, operating at 25 kV and 100 μA, and a Si-PIN XR-100CR detector from Amptek. In each of the 102 fragments, six points were analyzed (three on the front and three on the reverse) with an acquisition time of 600 s and a beam collimation of 2 mm. The spectra were processed and analyzed using the QXAS-AXIL software from the IAEA. PCA applied to the XRF results revealed a clear cluster separation of the samples.
Mujica Ascencio, Saul; Choe, ChunSik; Meinke, Martina C; Müller, Rainer H; Maksimov, George V; Wigger-Alberti, Walter; Lademann, Juergen; Darvin, Maxim E
2016-07-01
Propylene glycol is one of the known substances added to cosmetic formulations as a penetration enhancer. Recently, nanocrystals have also been employed to increase the skin penetration of active components. Caffeine is a component with many applications, and its penetration into the epidermis is controversially discussed in the literature. In the present study, the penetration ability of two components, caffeine nanocrystals and propylene glycol, applied topically on porcine ear skin in the form of a gel, was investigated ex vivo using two confocal Raman microscopes operated at different excitation wavelengths (785 nm and 633 nm). Several depth profiles were acquired in the fingerprint region, and different spectral ranges, i.e., 526-600 cm⁻¹ and 810-880 cm⁻¹, were chosen for independent analysis of caffeine and propylene glycol penetration into the skin, respectively. Multivariate statistical methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) combined with Student's t-test were employed to calculate the maximum penetration depths of each substance (caffeine and propylene glycol). The results show that propylene glycol penetrates significantly deeper than caffeine (20.7-22.0 μm versus 12.3-13.0 μm) without any penetration enhancement effect on caffeine. The results confirm that different substances, even if applied onto the skin as a mixture, can penetrate differently. The penetration depths of caffeine and propylene glycol obtained using two different confocal Raman microscopes are comparable, showing that both types of microscopes are well suited for such investigations and that multivariate statistical PCA-LDA methods combined with Student's t-test are very useful for analyzing the penetration of different substances into the skin.
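The PCA step that feeds the discriminant analysis can be sketched without external libraries. The example below extracts only the first principal component by power iteration on the sample covariance matrix; it is an illustration of PCA in general, not the authors' processing pipeline, and the LDA and t-test stages are omitted. The function name and the tiny dataset are assumptions.

```python
import math


def first_principal_component(rows, n_iter=200):
    """First principal component of a data matrix (list of rows).

    Returns (direction, variance, scores): the unit-length loading
    vector, the variance it explains, and each row's projection.
    Power iteration assumes the start vector is not orthogonal to
    the leading eigenvector of the covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two rows")
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (divisor n - 1).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data have zero variance along the start vector")
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, variance, scores


# Hypothetical two-channel intensities; the second channel is twice the first.
spectra = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, variance, scores = first_principal_component(spectra)
```

In a PCA-LDA workflow the scores on the leading components, rather than the raw spectra, are passed to the discriminant step, which keeps the number of variables small relative to the number of measurements.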
Biondo, M.; Bartholomä, A.
2014-12-01
High-resolution hydroacoustic methods have been successfully employed for the detailed classification of sedimentary habitats. The fine-scale mapping of very heterogeneous, patchy sedimentary facies, and the compound effect of multiple non-linear physical processes on the acoustic signal, cause the classification of backscatter images to be subject to a great level of uncertainty. Standard procedures for assessing the accuracy of acoustic classification maps are not yet established. This study applies different statistical techniques to automatically classified acoustic images with the aim of i) quantifying the ability of backscatter to resolve grain size distributions, ii) understanding complex patterns influenced by factors other than grain size variations, and iii) designing innovative, repeatable statistical procedures to spatially assess classification uncertainties. A high-frequency (450 kHz) sidescan sonar survey, carried out in 2012 in the shallow upper-mesotidal inlet of the Jade Bay (German North Sea), allowed 100 km² of surficial sediment to be mapped with a resolution and coverage never before achieved in the area. The backscatter mosaic was ground-truthed using a large dataset of sediment grab sample information (2009-2011). Multivariate procedures were employed for modelling the relationship between acoustic descriptors and granulometric variables in order to evaluate the correctness of acoustic class allocation and sediment group separation. Complex patterns in the acoustic signal appeared to be controlled by the combined effect of surface roughness, sorting and mean grain size variations. The area is dominated by silt and fine sand in very mixed compositions; in this fine-grained matrix, the percentage of gravel proved to be the prevailing factor affecting backscatter variability. In the absence of coarse material, sorting mostly affected the ability to detect gradual but significant changes in seabed types. Misclassification due to temporal discrepancies
Selecting minimum dataset soil variables using PLSR as a regressive multivariate method
Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.
2017-04-01
Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information deriving from multivariate datasets. Principal component analysis (PCA) has been mainly used, however supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least square regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean centered and variance scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP
Adams, Dean C
2014-09-01
Studies of evolutionary correlations commonly use phylogenetic regression (i.e., independent contrasts and phylogenetic generalized least squares) to assess trait covariation in a phylogenetic context. However, while this approach is appropriate for evaluating trends in one or a few traits, it is incapable of assessing patterns in highly multivariate data, as the large number of variables relative to sample size prohibits parametric test statistics from being computed. This poses serious limitations for comparative biologists, who must either simplify how they quantify phenotypic traits, or alter the biological hypotheses they wish to examine. In this article, I propose a new statistical procedure for performing ANOVA and regression models in a phylogenetic context that can accommodate high-dimensional datasets. The approach is derived from the statistical equivalency between parametric methods using covariance matrices and methods based on distance matrices. Using simulations under Brownian motion, I show that the method displays appropriate Type I error rates and statistical power, whereas standard parametric procedures have decreasing power as data dimensionality increases. As such, the new procedure provides a useful means of assessing trait covariation across a set of taxa related by a phylogeny, enabling macroevolutionary biologists to test hypotheses of adaptation, and phenotypic change in high-dimensional datasets.
Silva Spinacé, M A; Lucato, M U; Ferrão, M F; Davanzo, C U; De Paoli, M-A
2006-05-15
A methodology was developed to determine the intrinsic viscosity of poly(ethylene terephthalate) (PET) using diffuse reflectance infrared Fourier transform spectroscopy (DRIFTS) and multivariate calibration (MVC) methods. Multivariate partial least squares calibration was applied to the spectra using mean centering and cross validation. The results were correlated to the intrinsic viscosities determined by the standard chemical method (ASTM D 4603-01) and a very good correlation for values in the range from 0.346 to 0.780 dL g⁻¹ (relative viscosity values ca. 1.185-1.449) was observed. The spectrophotometer detector sensitivity and the humidity of the samples did not influence the results. The methodology developed is interesting because it does not produce hazardous wastes, avoids the use of time-consuming chemical methods and can rapidly predict the intrinsic viscosity of PET samples over a large range of values, which includes those of recycled materials.
Riley, Richard D; Elia, Eleni G; Malin, Gemma; Hemming, Karla; Price, Malcolm P
2015-07-30
A prognostic factor is any measure that is associated with the risk of future health outcomes in those with existing disease. Often, the prognostic ability of a factor is evaluated in multiple studies. However, meta-analysis is difficult because primary studies often use different methods of measurement and/or different cut-points to dichotomise continuous factors into 'high' and 'low' groups; selective reporting is also common. We illustrate how multivariate random effects meta-analysis models can accommodate multiple prognostic effect estimates from the same study, relating to multiple cut-points and/or methods of measurement. The models account for within-study and between-study correlations, which utilises more information and reduces the impact of unreported cut-points and/or measurement methods in some studies. The applicability of the approach is improved with individual participant data and by assuming a functional relationship between prognostic effect and cut-point to reduce the number of unknown parameters. The models provide important inferential results for each cut-point and method of measurement, including the summary prognostic effect, the between-study variance and a 95% prediction interval for the prognostic effect in new populations. Two applications are presented. The first reveals that, in a multivariate meta-analysis using published results, the Apgar score is prognostic of neonatal mortality but effect sizes are smaller at most cut-points than previously thought. In the second, a multivariate meta-analysis of two methods of measurement provides weak evidence that microvessel density is prognostic of mortality in lung cancer, even when individual participant data are available so that a continuous prognostic trend is examined (rather than cut-points). © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Heidema, A.G.; Thissen, U.; Boer, J.M.; Bouwman, F.G.; Feskens, E.J.M.; Mariman, E.C.
2009-01-01
In this study, we applied the multivariate statistical tool Partial Least Squares (PLS) to analyze the relative importance of 83 plasma proteins in relation to coronary heart disease (CHD) mortality and the intermediate end points body mass index, HDL-cholesterol and total cholesterol. From a Dutch
Huang, Jun; Goolcharran, Chimanlall; Ghosh, Krishnendu
2011-05-01
This paper presents the use of experimental design, optimization and multivariate techniques to investigate root-cause of tablet dissolution shift (slow-down) upon stability and develop control strategies for a drug product during formulation and process development. The effectiveness and usefulness of these methodologies were demonstrated through two application examples. In both applications, dissolution slow-down was observed during a 4-week accelerated stability test under 51°C/75%RH storage condition. In Application I, an experimental design was carried out to evaluate the interactions and effects of the design factors on critical quality attribute (CQA) of dissolution upon stability. The design space was studied by design of experiment (DOE) and multivariate analysis to ensure desired dissolution profile and minimal dissolution shift upon stability. Multivariate techniques, such as multi-way principal component analysis (MPCA) of the entire dissolution profiles upon stability, were performed to reveal batch relationships and to evaluate the impact of design factors on dissolution. In Application II, an experiment was conducted to study the impact of varying tablet breaking force on dissolution upon stability utilizing MPCA. It was demonstrated that the use of multivariate methods, defined as Quality by Design (QbD) principles and tools in ICH-Q8 guidance, provides an effective means to achieve a greater understanding of tablet dissolution upon stability.
Konukoglu, Ender; Coutu, Jean-Philippe; Salat, David H.; Fischl, Bruce
2016-01-01
Diffusion magnetic resonance imaging (dMRI) is a unique technology that allows the noninvasive quantification of microstructural tissue properties of the human brain in healthy subjects as well as the probing of disease-induced variations. Population studies of dMRI data have been essential in identifying pathological structural changes in various conditions, such as Alzheimer’s and Huntington’s diseases [1,2]. The most common form of dMRI involves fitting a tensor to the underlying imaging data (known as Diffusion Tensor Imaging, or DTI), then deriving parametric maps, each quantifying a different aspect of the underlying microstructure, e.g. fractional anisotropy and mean diffusivity. To date, the statistical methods utilized in most DTI population studies either analyzed only one such map or analyzed several of them, each in isolation. However, it is most likely that variations in the microstructure due to pathology or normal variability would affect several parameters simultaneously, with differing variations modulating the various parameters to differing degrees. Therefore, joint analysis of the available diffusion maps can be more powerful in characterizing histopathology and distinguishing between conditions than the widely used univariate analysis. In this article, we propose a multivariate approach for statistical analysis of diffusion parameters that uses partial least squares correlation (PLSC) analysis and permutation testing as building blocks in a voxel-wise fashion. Stemming from the common formulation, we present three different multivariate procedures for group analysis, regressing-out nuisance parameters and comparing effects of different conditions. We used the proposed procedures to study the effects of non-demented aging, Alzheimer’s disease and mild cognitive impairment on the white matter. Here, we present results demonstrating that the proposed PLSC-based approach can differentiate between effects of different conditions in the same
Bouderbala, Abdelkader; Remini, Boualem; Saaed Hamoudi, Abdelamir; Pulido-Bosch, Antonio
2016-06-01
The study focuses on the characterization of the groundwater salinity of the Nador coastal aquifer (Algeria). The groundwater quality has undergone serious deterioration due to overexploitation. Groundwater samplings were carried out in high and low waters in 2013, in order to study the evolution of groundwater hydrochemistry from the recharge to the coastal area. Different kinds of statistical analysis were made in order to identify the main hydrogeochemical processes occurring in the aquifer and to discriminate between different groups of groundwater. These statistical methods provide a better understanding of the aquifer hydrochemistry and reveal a hydrochemical classification of wells, showing that the area with higher salinity is located close to the coast, in the first two kilometers, where the salinity gradually increases as one approaches the seaside, which suggests groundwater salinization by seawater intrusion.
Gao, Yongnian; Gao, Junfeng; Yin, Hongbin; Liu, Chuansheng; Xia, Ting; Wang, Jing; Huang, Qi
2015-03-15
Remote sensing has been widely used for water quality monitoring, but most of these monitoring studies have only focused on a few water quality variables, such as chlorophyll-a, turbidity, and total suspended solids, which have typically been considered optically active variables. Remote sensing presents a challenge in estimating the phosphorus concentration in water. The total phosphorus (TP) in lakes has been estimated from remotely sensed observations, primarily using the simple individual band ratio or their natural logarithm and the statistical regression method based on the field TP data and the spectral reflectance. In this study, we investigated the possibility of establishing a spatial modeling scheme to estimate the TP concentration of a large lake from multi-spectral satellite imagery using band combinations and regional multivariate statistical modeling techniques, and we tested the applicability of the spatial modeling scheme. The results showed that HJ-1A CCD multi-spectral satellite imagery can be used to estimate the TP concentration in a lake. The correlation and regression analysis showed a highly significant positive relationship between the TP concentration and certain remotely sensed combination variables. The proposed modeling scheme had a higher accuracy for the TP concentration estimation in the large lake compared with the traditional individual band ratio method and the whole-lake scale regression-modeling scheme. The TP concentration values showed a clear spatial variability and were high in western Lake Chaohu and relatively low in eastern Lake Chaohu. The northernmost portion, the northeastern coastal zone and the southeastern portion of western Lake Chaohu had the highest TP concentrations, and the other regions had the lowest TP concentration values, except for the coastal zone of eastern Lake Chaohu. These results strongly suggested that the proposed modeling scheme, i.e., the band combinations and the regional multivariate
Kalegowda, Yogesh; Harmer, Sarah L
2012-03-20
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.
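Of the classifiers listed in this abstract, k-NN is the simplest to illustrate. The sketch below is a generic k-nearest-neighbour majority vote, not the study's pipeline; it assumes the inputs are already low-dimensional scores (for instance from a prior PCA step), and the function name is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, nearest first.
    ranked = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With training scores grouped by mineral class, `knn_predict(scores, classes, new_score, k=3)` returns the class label most common among the three closest spectra. The choice of k and of the distance metric are tuning decisions that the abstract does not specify.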
Meizel-Lambert, Cayli J; Schultz, John J; Sigman, Michael E
2015-11-01
Identification of osseous materials is generally established on gross anatomical features. However, highly fragmented or taphonomically altered materials may be problematic and may require chemical analysis. This research was designed to assess the use of scanning electron microscopy-energy-dispersive X-ray spectrometry (SEM/EDX), elemental analysis, and multivariate statistical analysis (principal component analysis) for discrimination of osseous and nonosseous materials of similar chemical composition. Sixty samples consisting of osseous (human and nonhuman bone and dental) and non-osseous samples were assessed. After outliers were removed a high overall correct classification of 97.97% was achieved, with 99.86% correct classification for osseous materials. In addition, a blind study was conducted using 20 samples to assess the applicability for using this method to classify unknown materials. All of the blind study samples were correctly classified resulting in 100% correct classification, further demonstrating the efficiency of SEM/EDX and statistical analysis for differentiation of osseous and nonosseous materials. © 2015 American Academy of Forensic Sciences.
Siepak, Marcin; Sojka, Mariusz
2017-08-01
The paper reports the results of measurements of trace elements concentrations in surface water samples collected at the lowland retention reservoirs of Stare Miasto and Kowalskie (Poland). The samples were collected once a month from October 2011 to November 2012. Al, As, Cd, Co, Cr, Cu, Li, Mn, Ni, Pb, Sb, V, and Zn were determined in water samples using the inductively coupled plasma with mass detection (ICP-QQQ). To assess the chemical composition of surface water, multivariate statistical methods of data analysis were used, viz. cluster analysis (CA), principal components analysis (PCA), and discriminant analysis (DA). They made it possible to observe similarities and differences in the chemical composition of water in the points of water samples collection, to uncover hidden factors accounting for the structure of the data, and to assess the impact of natural and anthropogenic sources on the content of trace elements in the water of retention reservoirs. The conducted statistical analyses made it possible to distinguish groups of trace elements allowing for the analysis of time and spatial variation of water in the studied reservoirs.
Dexter, Alex; Race, Alan M; Styles, Iain B; Bunch, Josephine
2016-11-15
Spatial clustering is a powerful tool in mass spectrometry imaging (MSI) and has been demonstrated to be capable of differentiating tumor types, visualizing intratumor heterogeneity, and segmenting anatomical structures. Several clustering methods have been applied to mass spectrometry imaging data, but a principled comparison and evaluation of different clustering techniques presents a significant challenge. We propose that testing whether the data has a multivariate normal distribution within clusters can be used to evaluate the performance when using algorithms that assume normality in the data, such as k-means clustering. In cases where clustering has been performed using the cosine distance, conversion of the data to polar coordinates prior to normality testing should be performed to ensure normality is tested in the correct coordinate system. In addition to these evaluations of internal consistency, we demonstrate that the multivariate normal distribution can then be used as a basis for statistical modeling of MSI data. This allows the generation of synthetic MSI data sets with known ground truth, providing a means of external clustering evaluation. To demonstrate this, reference data from seven anatomical regions of an MSI image of a coronal section of mouse brain were modeled. From this, a set of synthetic data based on this model was generated. Results of r² fitting of the chi-squared quantile-quantile plots on the seven anatomical regions confirmed that the data acquired from each spatial region was found to be closer to normally distributed in polar space than in Euclidean. Finally, principal component analysis was applied to a single data set that included synthetic and real data. No significant differences were found between the two data types, indicating the suitability of these methods for generating realistic synthetic data.
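The chi-squared quantile-quantile check mentioned in this abstract can be sketched for the bivariate case, where the chi-squared quantile with 2 degrees of freedom has the closed form -2 ln(1 - p). This is an illustration under that two-variable assumption, not the authors' implementation: real MSI spectra have many more dimensions, and the polar-coordinate conversion is not shown. All function names are hypothetical.

```python
import math
import random


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    d2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the explicit inverse of the 2x2 covariance matrix.
        d2.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2


def chi2_quantile_2df(p):
    """Quantile of the chi-squared distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - p)


def qq_r_squared(points):
    """r^2 of sorted squared distances against chi-squared(2) quantiles."""
    d2 = sorted(mahalanobis_sq_2d(points))
    n = len(d2)
    q = [chi2_quantile_2df((i + 0.5) / n) for i in range(n)]
    md, mq = sum(d2) / n, sum(q) / n
    cov = sum((a - md) * (b - mq) for a, b in zip(d2, q))
    var_d = sum((a - md) ** 2 for a in d2)
    var_q = sum((b - mq) ** 2 for b in q)
    return cov * cov / (var_d * var_q)
```

An r² close to 1 indicates that the squared distances follow the chi-squared reference line, which is what multivariate normality predicts; comparing r² between two coordinate systems, as the abstract describes, would mean running the same check on both representations of a cluster.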
Chen, Liang; Tokuda, N
2002-01-01
By exploiting the Fourier series expansion, we have developed a new constructive method of automatically generating a multivariable fuzzy inference system from any given sample set with the resulting multivariable function being constructed within any specified precision to the original sample set. The given sample sets are first decomposed into a cluster of simpler sample sets such that a single input fuzzy system is constructed readily for a sample set extracted directly from the cluster independent of the other variables. Once the relevant fuzzy rules and membership functions are constructed for each of the variables completely independent of the other variables, the resulting decomposed fuzzy rules and membership functions are integrated back into the fuzzy system appropriate for the original sample set requiring only a moderate cost of computation in the required decomposition and composition processes. After proving two basic theorems which we need to ensure the validity of the decomposition and composition processes of the system construction, we have demonstrated a constructive algorithm of a multivariable fuzzy system. Exploiting an implicit error bound analysis available at each of the construction steps, the present Fourier method is capable of implementing a more stable fuzzy system than the power series expansion method of ParNeuFuz and PolyNeuFuz, covering and implementing a wider range of more robust applications.
Local Strategy Combined with a Wavelength Selection Method for Multivariate Calibration
Directory of Open Access Journals (Sweden)
Haitao Chang
2016-06-01
One of the essential factors influencing the prediction accuracy of multivariate calibration models is the quality of the calibration data. A local regression strategy, together with a wavelength selection approach, is proposed to build the multivariate calibration models based on partial least squares regression. The local algorithm is applied to create a calibration set of spectra similar to the spectrum of an unknown sample; the synthetic degree of grey relation coefficient is used to evaluate the similarity. A wavelength selection method based on simple-to-use interactive self-modeling mixture analysis minimizes the influence of noisy variables, and the most informative variables of the most similar samples are selected to build the multivariate calibration model based on partial least squares regression. To validate the performance of the proposed method, ultraviolet-visible absorbance spectra of mixed solutions of food coloring analytes in a concentration range of 20–200 µg/mL were measured. Experimental results show that the proposed method not only enhances the prediction accuracy of the calibration model but also greatly reduces its complexity.
Development of a Research Methods and Statistics Concept Inventory
Veilleux, Jennifer C.; Chapman, Kate M.
2017-01-01
Research methods and statistics are core courses in the undergraduate psychology major. To assess learning outcomes, it would be useful to have a measure that assesses research methods and statistical literacy beyond course grades. In two studies, we developed and provided initial validation results for a research methods and statistical knowledge…
Leary, James F.; McLaughlin, Scott R.; Reece, Lisa M.; Rosenblatt, Judah I.; Hokanson, James A.
1999-06-01
Multivariate statistics can be used for visualization of cell subpopulations in multidimensional data space and for classification of cells within that data space. New data mining techniques we have developed, such as subtractive clustering, can be used to find the differences between test and control multiparameter flow cytometric data, e.g. in the problem of human stem cell isolation with tumor purging. They also can provide training data for subsequent multivariate statistical classification techniques such as discriminant function or logistic regression analyses. Using lookup tables, these multivariate statistical calculations can be performed in real-time, and can even include probabilities of misclassification. Thus, the only distinction between off-line classification of cells in data analysis and real-time statistical decision-making for cell sorting is the time limit in which a classification decision must be made. For real-time cell sorting we presently are able to perform these classifications in less than 625 microseconds, corresponding to the time that it takes the cell to travel from the laser intersection point to the sort decision point in a flow cytometer/cell sorter. Statistical decision making and the ability to include the costs of misclassification into that decision process will become important as flow cytometry/cell sorting moves from diagnostics to therapeutics.
Cho, Hyun-Deok; Kim, Unyong; Suh, Joon Hyuk; Eom, Han Young; Kim, Junghyun; Lee, Seul Gi; Choi, Yong Seok; Han, Sang Beom
2016-04-01
Analytical methods using high-performance liquid chromatography with diode array and tandem mass spectrometry detection were developed for the discrimination of the rhizomes of four Atractylodes medicinal plants: A. japonica, A. macrocephala, A. chinensis, and A. lancea. A quantitative study was performed, selecting five bioactive components, including atractylenolide I, II, III, eudesma-4(14),7(11)-dien-8-one and atractylodin, on twenty-six Atractylodes samples of various origins. Sample extraction was optimized to sonication with 80% methanol for 40 min at room temperature. High-performance liquid chromatography with diode array detection was established using a C18 column with a water/acetonitrile gradient system at a flow rate of 1.0 mL/min, and the detection wavelength was set at 236 nm. Liquid chromatography with tandem mass spectrometry was applied to certify the reliability of the quantitative results. The developed methods were validated by ensuring specificity, linearity, limit of quantification, accuracy, precision, recovery, robustness, and stability. Results showed that cangzhu contained higher amounts of atractylenolide I and atractylodin than baizhu, and especially atractylodin contents showed the greatest variation between baizhu and cangzhu. Multivariate statistical analysis, such as principal component analysis and hierarchical cluster analysis, were also employed for further classification of the Atractylodes plants. The established method was suitable for quality control of the Atractylodes plants.
METHODS TO RESTRUCTURE THE STATISTICAL COMMUNITIES
Directory of Open Access Journals (Sweden)
Emilia TITAN
2005-12-01
To understand the essence of phenomena, statistical data processing operations are necessary. They allow a shift from individual data to derived, synthetic indicators that highlight the essence of various phenomena. The high volume and diversity of processing operations call for plans of computerised data processing. Identifying distinct and homogeneous groups and classes requires well-considered groupings and classifications that comply with the requirements presented in the article.
Statistical models and methods for reliability and survival analysis
Couallier, Vincent; Huber-Carol, Catherine; Mesbah, Mounir; Limnios, Nikolaos; Gerville-Reache, Leo
2013-01-01
Statistical Models and Methods for Reliability and Survival Analysis brings together contributions by specialists in statistical theory as they discuss their applications providing up-to-date developments in methods used in survival analysis, statistical goodness of fit, stochastic processes for system reliability, amongst others. Many of these are related to the work of Professor M. Nikulin in statistics over the past 30 years. The authors gather together various contributions with a broad array of techniques and results, divided into three parts - Statistical Models and Methods, Statistical
Energy Technology Data Exchange (ETDEWEB)
Perlman, M D
1976-03-01
Efficient methods for approximating percentage points of the largest characteristic root of a Wishart matrix, and other statistical quantities of interest, were developed. Fitting of non-additive models to two-way and higher-way tables and the further development of the SNAP statistical computing system were reported. Numerical procedures for computing boundary-crossing probabilities for Brownian motion and other stochastic processes, such as Bessel diffusions, were implemented. Mathematical techniques from statistical mechanics were applied to obtain a unified treatment of probabilities of large deviations of the sample in the setting of general topological vector spaces. The application of the Martin boundary to questions about infinite particle systems was studied. A comparative study of classical "omnibus" and Bayes procedures for combining several independent noncentral chi-square test statistics was completed. Work proceeds on the related problem of combining noncentral F-tests. A numerical study of the small-sample powers of the Pearson chi-square and likelihood ratio tests for multinomial goodness-of-fit was made. The relationship between asymptotic (large sample) efficiency of test statistics, as measured by Bahadur's concept of exact slope, and actual small-sample efficiency was studied. A promising new technique for the simultaneous estimation of all correlation coefficients in a multivariate population was developed. The method adapts the James-Stein "shrinking" estimator (for location parameters) to the estimation of correlations.
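For orientation, the classical James-Stein shrinkage estimator for location parameters, which this report says was adapted to correlations, can be sketched as follows. The sketch shows only the standard positive-part estimator shrinking toward zero with a known common variance; the adaptation to correlation coefficients described in the report is not reproduced here, and the function name is hypothetical.

```python
def james_stein_positive_part(x, sigma2=1.0):
    """Positive-part James-Stein estimate of a mean vector, shrinking toward 0.

    Assumes x has p >= 3 independent normal components with known common
    variance sigma2. The shrinkage factor is max(0, 1 - (p - 2) * sigma2 / ||x||^2).
    """
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 components")
    norm_sq = sum(v * v for v in x)
    if norm_sq == 0.0:
        return list(x)
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / norm_sq)
    return [factor * v for v in x]
```

For the observation `[3, 4, 0, 0]` with unit variance, the squared norm is 25 and the factor is 1 - 2/25 = 0.92, so every component is pulled 8% toward zero; observations with a small norm are shrunk all the way to the origin.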
Schmolke, S. R.; Broeg, K.; Zander, S.; Bissinger, V.; Hansen, P. D.; Kress, N.; Herut, B.; Jantzen, E.; Krüner, G.; Sturm, A.; Körting, W.; von Westernhagen, H.
A comprehensive database containing biological and chemical information, collected in the framework of the bilateral interdisciplinary MARS project ("biological indicators of natural and man-made changes in marine and coastal waters") during the years 1995-1997 in the coastal environment of the North Sea, was subjected to a multivariate statistical evaluation. The MARS project was designed to combine a variety of approaches and to develop a set of methods for the employment of biological indicators in pollution monitoring and environmental quality assessment. In total, nine ship cruises to four coastal sampling sites were conducted; 765 fish and 384 mussel samples were analysed for biological and chemical parameters. Additional information on the chemical background at the sampling sites was derived from sediment samples collected at each of the four sampling sites. Based on the available chemical data in sediments and black mussel (Mytilus edulis), a pollution gradient between the selected sites was established. The chemical body burden of flounder (Platichthys flesus) from these sites, though, did not reflect this gradient equally clearly. In contrast, the biological information derived from measurements in fish samples displayed a significant regional as well as temporal pattern. A multivariate bioindicator data matrix was evaluated employing a factor analysis model to identify relations between selected biological indicators, and to improve the understanding of a regional and temporal component in the parameter response. In a second approach, applying the k-means algorithm to the data matrix, two significantly different clusters of samples, characterised by the current health status of the fish, were extracted. Using this classification, a temporal and, secondarily, a less pronounced spatial effect was evident. In particular, during July 1996, a clear sign of deteriorating environmental conditions was extracted from the biological data matrix.
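The k-means step described in this abstract can be illustrated with a small Lloyd-style implementation. This is a generic sketch, not the study's analysis: the deterministic farthest-point initialisation is an assumption chosen to keep the example reproducible, and the function name is hypothetical.

```python
import math


def kmeans(points, k=2, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    # Start from the first point, then repeatedly add the point farthest
    # from all centroids chosen so far (an illustrative choice of init).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Applied to rows of standardised bioindicator values with `k=2`, the returned labels split the samples into two clusters. k-means can converge to a poor local optimum, so results from a single initialisation should be treated with caution.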
Cabrieto, Jedelyn; Tuerlinckx, Francis; Kuppens, Peter; Grassmann, Mariel; Ceulemans, Eva
2017-06-01
Change point detection in multivariate time series is a complex task since next to the mean, the correlation structure of the monitored variables may also alter when change occurs. DeCon was recently developed to detect such changes in mean and/or correlation by combining a moving windows approach and robust PCA. However, in the literature, several other methods have been proposed that employ other non-parametric tools: E-divisive, Multirank, and KCP. Since these methods use different statistical approaches, two issues need to be tackled. First, applied researchers may find it hard to appraise the differences between the methods. Second, a direct comparison of the relative performance of all these methods for capturing change points signaling correlation changes is still lacking. Therefore, we present the basic principles behind DeCon, E-divisive, Multirank, and KCP and the corresponding algorithms, to make them more accessible to readers. We further compared their performance through extensive simulations using the settings of Bulteel et al. (Biological Psychology, 98 (1), 29-42, 2014) implying changes in mean and in correlation structure and those of Matteson and James (Journal of the American Statistical Association, 109 (505), 334-345, 2014) implying different numbers of (noise) variables. KCP emerged as the best method in almost all settings. However, in case of more than two noise variables, only DeCon performed adequately in detecting correlation changes.
Wang, Longfei; Lee, Sungyoung; Gim, Jungsoo; Qiao, Dandi; Cho, Michael; Elston, Robert C; Silverman, Edwin K; Won, Sungho
2016-09-01
Family-based designs have been repeatedly shown to be powerful in detecting the significant rare variants associated with human diseases. Furthermore, human diseases are often defined by the outcomes of multiple phenotypes, and thus we expect multivariate family-based analyses may be very efficient in detecting associations with rare variants. However, few statistical methods implementing this strategy have been developed for family-based designs. In this report, we describe one such implementation: the multivariate family-based rare variant association tool (mFARVAT). mFARVAT is a quasi-likelihood-based score test for rare variant association analysis with multiple phenotypes, and tests both homogeneous and heterogeneous effects of each variant on multiple phenotypes. Simulation results show that the proposed method is generally robust and efficient for various disease models, and we identify some promising candidate genes associated with chronic obstructive pulmonary disease. The software of mFARVAT is freely available at http://healthstat.snu.ac.kr/software/mfarvat/, implemented in C++ and supported on Linux and MS Windows.
Directory of Open Access Journals (Sweden)
Armin Saed-Moucheshi
2013-01-01
Multivariate statistical techniques were used to compare the relationship between yield and its related traits in wheat cultivars noninoculated and inoculated with the mycorrhizal fungus Glomus intraradices; each condition comprised three wheat cultivars and four water regimes. Results showed that, under inoculation, spike weight per plant and total chlorophyll content of the flag leaf were the most important variables contributing to wheat grain yield variation, while, under the noninoculated condition, in addition to these two traits, grain weight per spike and leaf area were also important variables accounting for wheat grain yield variation. Therefore, spike weight per plant and chlorophyll content of the flag leaf can be used as selection criteria in breeding programs for both inoculated and noninoculated wheat cultivars under different water regimes, and grain weight per spike and leaf area can also be considered for the noninoculated condition. Furthermore, inoculated wheat cultivars showed higher values for most measured traits, and the results indicated that inoculation could change the relationship among morphological traits of wheat cultivars under drought stress. Also, stepwise regression as a selection method, together with principal component and factor analysis, appears to be a stronger approach for screening important traits in breeding programs.
Wu, Wei; Sun, Le; Zhang, Zhe; Guo, Yingying; Liu, Shuying
2015-03-25
An ultra-high-performance liquid chromatography coupled with quadrupole-time-of-flight mass spectrometry (UHPLC-Q-TOF-MS) method was developed for the detection and structural analysis of ginsenosides in white ginseng and related processed products (red ginseng). Original neutral, malonyl, and chemically transformed ginsenosides were identified in white and red ginseng samples. The aglycone types of ginsenosides were determined by MS/MS as PPD (m/z 459), PPT (m/z 475), C-24, -25 hydrated-PPD or PPT (m/z 477 or m/z 493), and Δ20(21)-or Δ20(22)-dehydrated-PPD or PPT (m/z 441 or m/z 457). Following the structural determination, the UHPLC-Q-TOF-MS-based chemical profiling coupled with multivariate statistical analysis method was applied for global analysis of white and processed ginseng samples. The chemical markers distinguishing the processed product, red ginseng, from white ginseng could be assigned. Process-mediated chemical changes were recognized as the hydrolysis of ginsenosides with large molecular weight, chemical transformations of ginsenosides, changes in malonyl-ginsenosides, and generation of 20-(R)-ginsenoside enantiomers. The relative contents of compounds classified as PPD, PPT, malonyl, and transformed ginsenosides were calculated based on peak areas in ginseng before and after processing. This study makes it possible to monitor multiple components for the quality control and global evaluation of ginseng products during processing. Copyright © 2014 Elsevier B.V. All rights reserved.
Energy Technology Data Exchange (ETDEWEB)
Bakraji, E.H., E-mail: cscientificl@aec.org.sy [Archaeometry Laboratory, Chemistry Department, Atomic Energy Commission of Syria, P. O. Box 6091, Damascus (Syrian Arab Republic); Rihawy, M.S. [Archaeometry Laboratory, Chemistry Department, Atomic Energy Commission of Syria, P. O. Box 6091, Damascus (Syrian Arab Republic); Castel, C. [CNRS – Maison de l’Orient et de la Méditerranée, Laboratoire “Archéorient”, CNRS/Université Lumière-Lyon 2 (France); Abboud, R. [Archaeometry Laboratory, Chemistry Department, Atomic Energy Commission of Syria, P. O. Box 6091, Damascus (Syrian Arab Republic)
2015-03-15
Highlights: • PIXE and OSL methods were used to classify and date pottery from the Tell Al-Rawda site. • Three groups were classified using PIXE, which suggests different sources of the clay. • OSL was used for dating the site, and the date found was consistent with typology. Abstract: The Particle Induced X-ray Emission (PIXE) technique was used to study 48 ancient Syrian pottery fragments taken from excavations at the Tell Al-Rawda site. Eighteen elements (Mg, Al, Si, P, S, K, Ca, Ti, Mn, Fe, Ni, Zn, As, Br, Rb, Sr, Y, and Pb) were determined. The element concentrations were processed using two multivariate statistical methods to classify the pottery; one main group and two smaller groups were defined. In addition, four samples from different places on the site were subjected to optically stimulated luminescence (OSL) dating. The average age obtained using a single aliquot regeneration (SAR) protocol was 4350 ± 240 years.
Statistical methods of estimating mining costs
Long, K.R.
2011-01-01
Until it was defunded in 1995, the U.S. Bureau of Mines maintained a Cost Estimating System (CES) for prefeasibility-type economic evaluations of mineral deposits and estimating costs at producing and non-producing mines. This system had a significant role in mineral resource assessments to estimate costs of developing and operating known mineral deposits and predicted undiscovered deposits. For legal reasons, the U.S. Geological Survey cannot update and maintain CES. Instead, statistical tools are under development to estimate mining costs from basic properties of mineral deposits such as tonnage, grade, mineralogy, depth, strip ratio, distance from infrastructure, rock strength, and work index. The first step was to reestimate "Taylor's Rule" which relates operating rate to available ore tonnage. The second step was to estimate statistical models of capital and operating costs for open pit porphyry copper mines with flotation concentrators. For a sample of 27 proposed porphyry copper projects, capital costs can be estimated from three variables: mineral processing rate, strip ratio, and distance from nearest railroad before mine construction began. Of all the variables tested, operating costs were found to be significantly correlated only with strip ratio.
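A relation of the "Taylor's Rule" type, linking operating rate to ore tonnage, is commonly modelled as a power law and estimated by ordinary least squares on logarithms. The sketch below shows that generic fitting step only; it is not the paper's model, and the function name and any coefficients used with it are purely illustrative rather than the re-estimated values.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on log(y) = log(a) + b * log(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    # Slope and intercept of the regression line in log-log space.
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum(
        (u - mx) ** 2 for u in lx
    )
    a = math.exp(my - b * mx)
    return a, b
```

Given lists of deposit tonnages and observed operating rates, `fit_power_law(tonnage, rate)` returns the scale `a` and exponent `b`. Fitting on logarithms assumes multiplicative errors, and a cost model like the one in the abstract would add further predictors such as strip ratio.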
DEFF Research Database (Denmark)
Birch, Thomas; Martinón-Torres, Marcos
2015-01-01
An assemblage of post-medieval iron bars was found with the Princes Channel wreck, salvaged from the Thames Estuary in 2003. They were recorded and studied, with a focus on metallography and slag inclusion analysis. The investigation provided an opportunity to explore the use of multivariate stat...
Innovative statistical methods for public health data
Wilson, Jeffrey
2015-01-01
The book brings together experts working in public health and multi-disciplinary areas to present recent issues in statistical methodological development and their applications. This timely book will impact model development and data analyses of public health research across a wide spectrum of analysis. Data and software used in the studies are available for the reader to replicate the models and outcomes. The fifteen chapters range in focus from techniques for dealing with missing data with Bayesian estimation, health surveillance and population definition and implications in applied latent class analysis, to multiple comparison and meta-analysis in public health data. Researchers in biomedical and public health research will find this book to be a useful reference, and it can be used in graduate level classes.
Methods of contemporary mathematical statistical physics
2009-01-01
This volume presents a collection of courses introducing the reader to the recent progress with attention being paid to laying solid grounds and developing various basic tools. An introductory chapter on lattice spin models is useful as a background for other lectures of the collection. The topics include new results on phase transitions for gradient lattice models (with introduction to the techniques of the reflection positivity), stochastic geometry reformulation of classical and quantum Ising models, the localization/delocalization transition for directed polymers. A general rigorous framework for theory of metastability is presented and particular applications in the context of Glauber and Kawasaki dynamics of lattice models are discussed. A pedagogical account of several recently discussed topics in nonequilibrium statistical mechanics with an emphasis on general principles is followed by a discussion of kinetically constrained spin models that are reflecting important peculiar features of glassy dynamic...
Mathematical and statistical methods for multistatic imaging
Ammari, Habib; Jing, Wenjia; Kang, Hyeonbae; Lim, Mikyoung; Sølna, Knut; Wang, Han
2013-01-01
This book covers recent mathematical, numerical, and statistical approaches for multistatic imaging of targets with waves at single or multiple frequencies. The waves can be acoustic, elastic or electromagnetic. They are generated by point sources on a transmitter array and measured on a receiver array. An important problem in multistatic imaging is to quantify and understand the trade-offs between data size, computational complexity, signal-to-noise ratio, and resolution. Another fundamental problem is to have a shape representation well suited to solving target imaging problems from multistatic data. In this book the trade-off between resolution and stability when the data are noisy is addressed. Efficient imaging algorithms are provided and their resolution and stability with respect to noise in the measurements analyzed. It also shows that high-order polarization tensors provide an accurate representation of the target. Moreover, a dictionary-matching technique based on new invariants for the generalized ...
Institute of Scientific and Technical Information of China (English)
盛婷婷; 张俊丽
2011-01-01
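This paper discusses, in the function class {0}, the Noether theorem on the solvability of dual-type singular integral equations with two convolution kernels, together with the corresponding solvability conditions; when those conditions are satisfied, the general solution is given in explicit form.%This paper, based on the teaching of the multivariate statistical analysis course for statistics majors at our school, discusses teaching philosophy, teaching mode, and teaching evaluation, and puts forward several teaching methods, including analogy and induction, case teaching, and multiple-test teaching.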
Institute of Scientific and Technical Information of China (English)
刘银萍; 安丽微
2011-01-01
This paper, based on the teaching of the multivariate statistical analysis course for statistics majors at our school, discusses teaching philosophy, teaching mode, and teaching evaluation, and puts forward several teaching methods, including analogy and induction, case teaching, and multiple-test teaching.
Directory of Open Access Journals (Sweden)
Cláudio Roberto Rosário
2012-07-01
The purpose of this research is to improve the practice of customer satisfaction analysis. The article presents a model for analyzing the answers to a customer satisfaction evaluation in a systematic way with the aid of multivariate statistical techniques, specifically exploratory analysis with PCA (Principal Component Analysis) and HCA (Hierarchical Cluster Analysis). The study evaluated whether the model could be used by the company in question as a tool to help identify the value chain perceived by the customer when the customer satisfaction questionnaire is applied. The multivariate statistical analysis showed similar behavior among customers. It also allowed the company to review the questions in the questionnaires by analyzing the degree of correlation between questions, which was not a company practice before this research.
Energy Technology Data Exchange (ETDEWEB)
Heyen, H. [GKSS-Forschungszentrum Geesthacht GmbH (Germany). Inst. fuer Gewaesserphysik
1998-12-31
A multivariate statistical approach is presented that allows a systematic search for relationships between the interannual variability in climate records and ecological time series. Statistical models are built between climatological predictor fields and the variables of interest. Relationships are sought on different temporal scales and for different seasons and time lags. The possibilities and limitations of this approach are discussed in four case studies dealing with salinity in the German Bight, abundance of zooplankton at Helgoland Roads, macrofauna communities off Norderney, and the arrival of migratory birds on Helgoland.
Multivariate Stochastic Orderings of Spacings of Generalized Order Statistics
Institute of Scientific and Technical Information of China (English)
方兆本; 胡太忠; 吴耀华; 庄玮玮
2006-01-01
In this paper, we investigate conditions on the underlying distribution function and the parameters on which the generalized order statistics are based, to obtain stochastic comparisons of spacing vectors of generalized order statistics in the multivariate likelihood ratio and the usual multivariate stochastic orders. Some applications of the main results are also given.
Zhu, Guangxu; Guo, Qingjun; Xiao, Huayun; Chen, Tongbin; Yang, Jun
2017-06-01
Heavy metals are considered toxic to humans and ecosystems. In the present study, heavy metal concentration in soil was investigated using the single pollution index (PIi), the integrated Nemerow pollution index (PIN), and the geoaccumulation index (Igeo) to determine metal accumulation and its pollution status at the abandoned site of the Capital Iron and Steel Factory in Beijing and its surrounding area. Multivariate statistical (principal component analysis and correlation analysis), geostatistical analysis (ArcGIS tool), combined with stable Pb isotopic ratios, were applied to explore the characteristics of heavy metal pollution and the possible sources of pollutants. The results indicated that heavy metal elements show different degrees of accumulation in the study area, the observed trend of the enrichment factors, and the geoaccumulation index was Hg > Cd > Zn > Cr > Pb > Cu ≈ As > Ni. Hg, Cd, Zn, and Cr were the dominant elements that influenced soil quality in the study area. The Nemerow index method indicated that all of the heavy metals caused serious pollution except Ni. Multivariate statistical analysis indicated that Cd, Zn, Cu, and Pb show obvious correlation and have higher loads on the same principal component, suggesting that they had the same sources, which are related to industrial activities and vehicle emissions. The spatial distribution maps based on ordinary kriging showed that high concentrations of heavy metals were located in the local factory area and in the southeast-northwest part of the study region, corresponding with the predominant wind directions. Analyses of lead isotopes confirmed that Pb in the study soils is predominantly derived from three Pb sources: dust generated during steel production, coal combustion, and the natural background. Moreover, the ternary mixture model based on lead isotope analysis indicates that lead in the study soils originates mainly from anthropogenic sources, which contribute much more
A kernel-based multivariate feature selection method for microarray data classification.
Directory of Open Access Journals (Sweden)
Shiquan Sun
High dimensionality and small sample sizes, and their inherent risk of overfitting, pose great challenges for constructing efficient classifiers in microarray data classification. Therefore, a feature selection technique should be applied prior to data classification to enhance prediction performance. In general, filter methods can be considered as principal or auxiliary selection mechanisms because of their simplicity, scalability, and low computational complexity. However, a series of trivial examples show that filter methods result in less accurate performance because they ignore the dependencies among features. Although a few publications have devoted their attention to revealing the relationships among features by multivariate-based methods, these methods describe relationships among features only by linear methods, and simple linear combination relationships restrict the improvement in performance. In this paper, we used a kernel method to discover inherent nonlinear correlations among features as well as between features and the target. Moreover, the number of orthogonal components was determined by kernel Fisher's linear discriminant analysis (FLDA) in a self-adaptive manner rather than by manual parameter settings. To demonstrate the effectiveness of our method, we performed several experiments and compared the results of our method with those of other competitive multivariate-based feature selectors. In our comparison, we used two classifiers (support vector machine and k-nearest neighbor) on two groups of datasets, namely two-class and multi-class datasets. Experimental results demonstrate that our method performs better than the others, especially on three hard-to-classify datasets, namely Wang's Breast Cancer, Gordon's Lung Adenocarcinoma, and Pomeroy's Medulloblastoma.
Statistical methods for categorical data analysis
Powers, Daniel
2008-01-01
This book provides a comprehensive introduction to methods and models for categorical data analysis and their applications in social science research. Companion website also available, at https://webspace.utexas.edu/dpowers/www/
Simple statistical methods for software engineering data and patterns
Pandian, C Ravindranath
2015-01-01
Although there are countless books on statistics, few are dedicated to the application of statistical methods to software engineering. Simple Statistical Methods for Software Engineering: Data and Patterns fills that void. Instead of delving into overly complex statistics, the book details simpler solutions that are just as effective and connect with the intuition of problem solvers. Sharing valuable insights into software engineering problems and solutions, the book not only explains the required statistical methods, but also provides many examples, review questions, and case studies that prov
A PID de-tuned method for multivariable systems, applied for HVAC plant
Ghazali, A. B.
2015-09-01
A simple yet effective de-tuning of PID parameters for multivariable applications has been described. Although the method is felt to have wider application it is simulated in a 3-input/ 2-output building energy management system (BEMS) with known plant dynamics. The controller performances such as the sum output squared error and total energy consumption when the system is at steady state conditions are studied. This tuning methodology can also be extended to reduce the number of PID controllers as well as the control inputs for specified output references that are necessary for effective results, i.e. with good regulation performances being maintained.
Statistical methods and computing for big data
Wang, Chun; Chen, Ming-Hui; Schifano, Elizabeth; Wu, Jing
2016-01-01
Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and their performances are assessed in a simulation study with stream data. Software packages are summarized with focuses on the open source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay. PMID:27695593
Institute of Scientific and Technical Information of China (English)
Rong Ma; Jiansheng Shi; Jichao Liu; Chunlei Gui
2014-01-01
Understanding the controlling factors of groundwater quality can promote the sustainable development of groundwater resources. To this end, multivariate statistical analysis (MA) and hydrochemical analysis were applied in this work. The results indicate that a canonical discriminant function with 7 parameters was established using the discriminant analysis (DA) method, which affords 100% correct assignation to the 3 different clusters (good water (GW), poor water (PW), and very poor water (VPW)) obtained from cluster analysis (CA). According to factor analysis (FA), 8 factors were extracted from 25 hydrochemical elements and account for 80.897% of the total data variance, suggesting that groundwater with higher concentrations of sodium, calcium, magnesium, chloride, and sulfate in the southeastern study area is mainly affected by natural processes; the higher levels of arsenic and chromium in groundwater extracted from the northwestern part of the study area are derived from industrial activities; and domestic and agricultural sewage make an important contribution to copper, iron, iodine, and phosphate in the northern study area. Therefore, this work can help identify the main controlling factors of groundwater quality in the North China Plain so as to support better and more informed decisions about how to achieve sustainable development of groundwater resources.
Statistical methods for analysing complex genetic traits
El Galta, Rachid
2006-01-01
Complex traits are caused by multiple genetic and environmental factors, and are therefore difficult to study compared with simple Mendelian diseases. The modes of inheritance of Mendelian diseases are often known. Methods to dissect such diseases are well described in literature. For complex geneti
RF Calibration of On-Chip DfT Chain by DC Stimuli and Statistical Multivariate Regression Technique
Ramzan, Rashad; Dabrowski, Jerzy
2015-01-01
The problem of parameter variability in RF and analog circuits is escalating with CMOS scaling. Consequently every RF chip produced in nano-meter CMOS technologies needs to be tested. On-chip Design for Testability (DfT) features, which are meant to reduce test time and cost also suffer from parameter variability. Therefore, RF calibration of all on-chip test structures is mandatory. In this paper, Artificial Neural Networks (ANN) are employed as a multivariate regression technique to archite...
Analysis of Statistical Methods Currently used in Toxicology Journals
Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min
2014-01-01
Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described methodologies used to provide descriptive and in...
Classical Methods of Statistics With Applications in Fusion-Oriented Plasma Physics
Kardaun, Otto J W F
2005-01-01
Classical Methods of Statistics is a blend of theory and practical statistical methods written for graduate students and researchers interested in applications to plasma physics and its experimental aspects. It can also fruitfully be used by students majoring in probability theory and statistics. In the first part, the mathematical framework and some of the history of the subject are described. Many exercises help readers to understand the underlying concepts. In the second part, two case studies are presented exemplifying discriminant analysis and multivariate profile analysis. The introductions of these case studies outline contextual magnetic plasma fusion research. In the third part, an overview of statistical software is given and, in particular, SAS and S-PLUS are discussed. In the last chapter, several datasets with guided exercises, predominantly from the ASDEX Upgrade tokamak, are included and their physical background is concisely described. The book concludes with a list of essential keyword transl...
Problems and Recommendations for Rural Statistics and Survey Methods
Institute of Scientific and Technical Information of China (English)
Zhang, Chengjun
2014-01-01
With the continued deepening of reform and opening-up, the national economic system has changed from a planned economy to a market economy, and rural surveys and statistics remain in a difficult transition period. In this period, China needs to transform its original statistical mode to suit the market economic system. All levels of government must report and submit a large and increasing amount of statistical information. In addition, townships, villages, and counties face both old and new conflicts. These conflicts complicate the implementation of rural statistics and surveys and the development of rural statistical work, and they also prompt research and reflection on reforming rural statistical and survey methods.
Statistical Methods Used in Gifted Education Journals, 2006-2010
Warne, Russell T.; Lazo, Maria; Ramos, Tammy; Ritter, Nicola
2012-01-01
This article describes the statistical methods used in quantitative and mixed methods articles between 2006 and 2010 in five gifted education research journals. Results indicate that the most commonly used statistical methods are means (85.9% of articles), standard deviations (77.8%), Pearson's "r" (47.8%), chi-square (χ²) (32.2%), ANOVA (30.7%),…
Statistical methods for assessment of blend homogeneity
DEFF Research Database (Denmark)
Madsen, Camilla
2002-01-01
as powder blends there is no natural unit or amount to define a sample from the blend, and partly that current technology does not provide a method of universally collecting small representative samples from large static powder beds. In the thesis a number of methods to assess (in)homogeneity are presented...... of internal factors to the blend e.g. the particle size distribution. The relation between particle size distribution and the variation in drug content in blend and tablet samples is discussed. A central problem is to develop acceptance criteria for blends and tablet batches to decide whether the blend...... blend or batch. In the thesis it is shown how to link sampling result and acceptance criteria to the actual quality (homogeneity) of the blend or tablet batch. Also it is discussed how the assurance related to a specific acceptance criteria can be obtained from the corresponding OC-curve. Further...
Statistical methods for handling incomplete data
Kim, Jae Kwang
2013-01-01
"… this book nicely blends the theoretical material and its application through examples, and will be of interest to students and researchers as a textbook or a reference book. Extensive coverage of recent advances in handling missing data provides resources and guidelines for researchers and practitioners in implementing the methods in new settings. … I plan to use this as a textbook for my teaching and highly recommend it." -Biometrics, September 2014
Arvanitoyannis, Ioannis S; Vaitsi, Olga B
2007-01-01
Authenticity and traceability have been two of the most important issues in the food chain. Authenticity in particular, is closely related with both food quality and safety issues. Vegetables stand for a category of foods heavily affected by adulteration either in terms of geographic origin (national or international level) or production methods (organic or conventional production, fertilizers, pesticides, genetically modified vegetables). This review aims at addressing most of the currently applied methods for ensuring quality control of vegetables; a) instrumental: ion chromatography, high pressure liquid chromatography, atomic absorption spectrophotometry, electronic nose and mass spectroscopy and b) sensory analysis. The results of all the above mentioned methods were analyzed by means of multivariate analysis (principal component analysis, discriminant analysis, cluster analysis, canonical analysis, and factor analysis). All ensuing results and conclusions are summarized in eight comprehensive tables.
Alba, Vittorio; Bergamini, Carlo; Genghi, Rosalinda; Gasparro, Marica; Perniola, Rocco; Antonacci, Donato
2015-08-01
High estimated heritability values were recently revealed for mature leaf traits in grape (Vitis vinifera L.), thus redeeming ampelography in the era of molecular markers. The "Organisation Internationale de la Vigne et du Vin (OIV)" set a list of hundreds of descriptors for grapevine in order to standardize ampelographic and ampelometric scores. Therefore, selecting and reducing the number of OIV codes can be a major goal for leaner biodiversity assessment studies. Identifying the ampelometric traits associated with grape diversity allows the construction of classification trees with the chi-squared automatic interaction detection (CHAID) algorithm, a stepwise model-fitting method that produces a tree diagram in which, at each step, the sample pool is split based on the independent variables that differ statistically for the dependent variable. A collection of 100 table and wine grapevines (Vitis vinifera L.) was characterized and evaluated by means of six microsatellites and twenty-two ampelometric traits on mature leaves. Nine ampelometric traits were selected by principal component analysis and employed to build the classification trees based on the CHAID algorithm. The strategy can be an effective tool for grape biodiversity management, correct allocation, and identification of new grape genotypes, supplemented by a further microsatellite investigation only when unsolved cases occur, allowing faster and cheaper results.
Yuan, Ke-Hai
2008-01-01
In the literature of mean and covariance structure analysis, noncentral chi-square distribution is commonly used to describe the behavior of the likelihood ratio (LR) statistic under alternative hypothesis. Due to the inaccessibility of the rather technical literature for the distribution of the LR statistic, it is widely believed that the…
Multivariable modeling and multivariate analysis for the behavioral sciences
Everitt, Brian S
2009-01-01
Multivariable Modeling and Multivariate Analysis for the Behavioral Sciences shows students how to apply statistical methods to behavioral science data in a sensible manner. Assuming some familiarity with introductory statistics, the book analyzes a host of real-world data to provide useful answers to real-life issues. The author begins by exploring the types and design of behavioral studies. He also explains how models are used in the analysis of data. After describing graphical methods, such as scatterplot matrices, the text covers simple linear regression, locally weighted regression, multip
Chudoba, R.; Sadílek, V.; Rypl, R.; Vořechovský, M.
2013-02-01
This paper examines the feasibility of high-level Python based utilities for numerically intensive applications via an example of a multidimensional integration for the evaluation of the statistical characteristics of a random variable. We discuss the approaches to the implementation of mathematically formulated incremental expressions using high-level scripting code and low-level compiled code. Due to the dynamic typing of the Python language, components of the algorithm can be easily coded in a generic way as algorithmic templates. Using the Enthought Development Suite they can be effectively assembled into a flexible computational framework that can be configured to execute the code for arbitrary combinations of integration schemes and versions of instantiated code. The paper describes the development cycle using a simple running example involving averaging of a random two-parametric function that includes discontinuity. This example is also used to compare the performance of the available algorithmic and executional features. The implemented package including further examples and the results of performance studies have been made available via the free repository [1] and CPCP library. Program summary. Program title: spirrid. Catalogue identifier: AENL_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AENL_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Special licence provided by the author. No. of lines in distributed program, including test data, etc.: 10722. No. of bytes in distributed program, including test data, etc.: 157099. Distribution format: tar.gz. Programming language: Python and C. Computer: PC. Operating system: LINUX, UNIX, Windows. Classification: 4.13, 6.2. External routines: NumPy (http://numpy.scipy.org/), SciPy (http://www.scipy.com). Nature of problem: Evaluation of the statistical moments of a function of random variables. Solution method: Direct multidimensional
Statistical Method of Estimating Nigerian Hydrocarbon Reserves
Directory of Open Access Journals (Sweden)
Jeffrey O. Oseh
2015-01-01
Hydrocarbon reserves are basic to planning and investment decisions in the petroleum industry. Their proper estimation is therefore of considerable importance in oil and gas production. The estimation of hydrocarbon reserves in the Niger Delta Region of Nigeria has been very popular, and very successful, in the Nigerian oil and gas industry for the past 50 years. To fully estimate the hydrocarbon potential of the Niger Delta Region, a clear understanding of the reserve geology and production history is needed. Reserves estimation for most fields is often performed through Material Balance and Volumetric methods. Alternatively, a simple Estimation Model and Least Squares Regression may be useful or appropriate. This model is based on extrapolation of the additional reserve due to the exploratory drilling trend and the additional reserve factor due to revision of the existing fields. The Estimation Model, used alongside Linear Regression Analysis in this study, gives improved estimates for the fields considered and hence can be used in other Nigerian fields with recent production history.
A step beyond the Monte Carlo method in economics: Application of multivariate normal distribution
Kabaivanov, S.; Malechkova, A.; Marchev, A.; Milev, M.; Markovska, V.; Nikolova, K.
2015-11-01
In this paper we discuss the numerical algorithm of Milev-Tagliani [25] used for pricing of discrete double barrier options. The problem can be reduced to accurate valuation of an n-dimensional path integral with probability density function of a multivariate normal distribution. The efficient solution of this problem with the Milev-Tagliani algorithm is a step beyond the classical application of Monte Carlo for option pricing. We explore continuous and discrete monitoring of asset path pricing, compare the error of frequently applied quantitative methods such as the Monte Carlo method and finally analyze the accuracy of the Milev-Tagliani algorithm by presenting the profound research and important results of Honga, S. Leeb and T. Li [16].
Song, Biao; Lu, Dan; Peng, Ming; Li, Xia; Zou, Ye; Huang, Meizhen; Lu, Feng
2017-02-01
Raman spectroscopy is developed as a fast and non-destructive method for the discrimination and classification of hydroxypropyl methyl cellulose (HPMC) samples. 44 E series and 41 K series HPMC samples are measured by a self-developed portable Raman spectrometer (Hx-Raman), which is excited by a 785 nm diode laser and covers a spectral range of 200-2700 cm⁻¹ with a resolution (FWHM) of 6 cm⁻¹. Multivariate analysis is applied to discriminate the E series from the K series. Using principal components analysis (PCA) and Fisher discriminant analysis (FDA), a discrimination result with a sensitivity of 90.91% and a specificity of 95.12% is achieved. The corresponding receiver operating characteristic (ROC) is 0.99, indicating the accuracy of the predictive model. This result demonstrates the prospect of the portable Raman spectrometer for rapid, non-destructive classification and discrimination of E series and K series samples of HPMC.
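The reported sensitivity and specificity can be checked with a short calculation. The sketch below is only an illustration: the abstract does not give the confusion-matrix counts, so the values 40 of 44 and 39 of 41 are an inference that happens to reproduce the reported percentages, and treating the E series as the "positive" class is likewise an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of positives correctly identified
    specificity = tn / (tn + fp)  # share of negatives correctly identified
    return sensitivity, specificity


# Assumed counts (not stated in the abstract): 44 E-series samples taken as
# positives, 41 K-series samples taken as negatives.
sens, spec = sensitivity_specificity(tp=40, fn=4, tn=39, fp=2)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```

Under these assumed counts the two ratios round to the 90.91% and 95.12% quoted in the abstract.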
Yun, Yong-Huan; Li, Hong-Dong; Wood, Leslie R. E.; Fan, Wei; Wang, Jia-Jun; Cao, Dong-Sheng; Xu, Qing-Song; Liang, Yi-Zeng
2013-07-01
Wavelength selection is a critical step for producing better prediction performance when applied to spectral data. Considering the fact that vibrational and rotational spectra have continuous features of spectral bands, we propose a novel method of wavelength interval selection based on random frog, called interval random frog (iRF). To obtain all the possible continuous intervals, spectra are first divided into intervals by moving a window of fixed width over the whole spectra. These overlapping intervals are ranked by applying random frog coupled with PLS, and the optimal ones are chosen. This method has been applied to two near-infrared spectral datasets, displaying higher efficiency in wavelength interval selection than others. The source code of iRF can be freely downloaded for academic research at the website: http://code.google.com/p/multivariate-calibration/downloads/list.
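The first step described in this abstract, dividing a spectrum into overlapping intervals with a moving window, can be sketched as follows. This is a minimal illustration, not the authors' iRF code: the function name `moving_window_intervals`, the step size of one wavelength point, and the half-open index convention are all assumptions, and the ranking step with random frog and PLS is not shown.

```python
def moving_window_intervals(n_points, width):
    """List every contiguous interval of `width` points as (start, stop).

    Indices are half-open, so an interval covers points start..stop-1.
    The window is assumed to advance by one point at a time.
    """
    if width < 1 or width > n_points:
        raise ValueError("width must be between 1 and n_points")
    return [(start, start + width) for start in range(n_points - width + 1)]


# A hypothetical spectrum of 10 wavelength points and a window of 4 points.
intervals = moving_window_intervals(n_points=10, width=4)
print(len(intervals), intervals[0], intervals[-1])
```

With a step of one point, a spectrum of n points and a window of width w yields n - w + 1 candidate intervals, which are what a ranking procedure would then score.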
MULTIVARIATE FOURIER TRANSFORM METHODS OVER SIMPLEX AND SUPER-SIMPLEX DOMAINS
Institute of Scientific and Technical Information of China (English)
Jiachang Sun
2006-01-01
In this paper we extend the well-known Fourier method to some non-tensor-product domains in R^d, including the simplex and the so-called super-simplex, which consists of (d + 1)! simplices. As two examples, in the 2-D and 3-D cases a super-simplex is shown as a parallel hexagon and a parallel quadrilateral dodecahedron, respectively. We have extended most of the concepts and results of the traditional Fourier methods to multivariate cases, such as the Fourier basis system, Fourier series, the discrete Fourier transform (DFT) and its fast algorithm (FFT) on the super-simplex, as well as generalized sine and cosine transforms (DST, DCT) and related fast algorithms over a simplex. The relationship between the basic orthogonal system and the eigenfunctions of a Laplacian-like operator over these domains is explored.
UV-vis absorption spectroscopy and multivariate analysis as a method to discriminate tequila
Barbosa-García, O.; Ramos-Ortíz, G.; Maldonado, J. L.; Pichardo-Molina, J. L.; Meneses-Nava, M. A.; Landgrave, J. E. A.; Cervantes-Martínez, J.
2007-01-01
Based on the UV-vis absorption spectra of commercially bottled tequilas, and with the aid of multivariate analysis, it is proved that different brands of white tequila can be identified from such spectra, and that 100% agave and mixed tequilas can be discriminated as well. Our study was done with 60 tequilas, 58 of them purchased at liquor stores in various Mexican cities, and two directly acquired from a distillery. All the tequilas were of the "white" type, that is, no aged spirits were considered. For the purposes of discrimination and quality control of tequilas, the spectroscopic method that we present here offers an attractive alternative to the traditional methods, like gas chromatography, which is expensive and time-consuming.
Jang, Cheng-Shin
2013-05-01
Multivariate geostatistical approaches have been applied extensively in characterizing risks and uncertainty of pollutant concentrations exceeding anthropogenic regulatory limits. Spatially delineating an extent of contamination potential is considerably critical for regional groundwater resources protection and utilization. This study used multivariate indicator kriging (MVIK) to determine spatial patterns of contamination extents in groundwater for irrigation and made a predicted comparison between two types of MVIK, including MVIK of multiplying indicator variables (MVIK-M) and of averaging indicator variables (MVIK-A). A cross-validation procedure was adopted to examine the performance of predicted errors, and various probability thresholds used to calculate ratios of declared pollution area to total area were explored for the two MVIK methods. The assessed results reveal that the northern and central aquifers have excellent groundwater quality for irrigation use. Results obtained through a cross-validation procedure indicate that MVIK-M is more robust than MVIK-A. Furthermore, a low ratio of declared pollution area to total area in MVIK-A may result in an unrealistic and unreliable probability used to determine extents of pollutants. Therefore, this study suggests using MVIK-M to probabilistically determine extents of pollutants in groundwater.
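The difference between the two ways of combining indicator variables named in this abstract can be illustrated with a small sketch. It covers only the combination step, not the kriging itself, and it rests on assumptions the abstract does not state: the indicator is taken to be 1 when a concentration is within its regulatory limit and 0 otherwise, and the concentrations, thresholds, and the function name `combine_indicators` are hypothetical.

```python
import numpy as np


def combine_indicators(concentrations, thresholds):
    """Combine per-pollutant indicators by product and by average.

    concentrations: array of shape (n_sites, n_pollutants)
    thresholds:     array of shape (n_pollutants,)
    Assumed coding: indicator = 1 if the concentration is within the limit.
    """
    indicators = (concentrations <= thresholds).astype(float)
    multiplied = indicators.prod(axis=1)  # 1 only if every pollutant is within limit
    averaged = indicators.mean(axis=1)    # fraction of pollutants within limit
    return multiplied, averaged


# Hypothetical data: three sites, two pollutants.
concentrations = np.array([[1.0, 5.0],
                           [3.0, 5.0],
                           [3.0, 9.0]])
thresholds = np.array([2.0, 8.0])
multiplied, averaged = combine_indicators(concentrations, thresholds)
print(multiplied, averaged)
```

In this toy case the second site fails one limit and meets the other: the product drops to 0 while the average gives 0.5, which shows why averaged indicators can yield intermediate values that are harder to interpret as a probability of contamination.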
Approaches to sample size determination for multivariate data
Saccenti, Edoardo; Timmerman, Marieke E.
2016-01-01
Sample size determination is a fundamental step in the design of experiments. Methods for sample size determination are abundant for univariate analysis methods, but scarce in the multivariate case. Omics data are multivariate in nature and are commonly investigated using multivariate statistical
Scientific Method, Statistical Method and the Speed of Light
MacKay, R. J.; Oldford, R.W.
2000-01-01
What is “statistical method”? Is it the same as “scientific method”? This paper answers the first question by specifying the elements and procedures common to all statistical investigations and organizing these into a single structure. This structure is illustrated by careful examination of the first scientific study on the speed of light carried out by A. A. Michelson in 1879. Our answer to the second question is negative. To understand this a history on the speed of light ...
An Overview of Short-term Statistical Forecasting Methods
DEFF Research Database (Denmark)
Elias, Russell J.; Montgomery, Douglas C.; Kulahci, Murat
2006-01-01
An overview of statistical forecasting methodology is given, focusing on techniques appropriate to short- and medium-term forecasts. Topics include basic definitions and terminology, smoothing methods, ARIMA models, regression methods, dynamic regression models, and transfer functions. Techniques...
Online Statistics Labs in MSW Research Methods Courses: Reducing Reluctance toward Statistics
Elliott, William; Choi, Eunhee; Friedline, Terri
2013-01-01
This article presents results from an evaluation of an online statistics lab as part of a foundations research methods course for master's-level social work students. The article discusses factors that contribute to an environment in social work that fosters attitudes of reluctance toward learning and teaching statistics in research methods…
Yuan, Ke-Hai
2008-01-01
In the literature on mean and covariance structure analysis, the noncentral chi-square distribution is commonly used to describe the behavior of the likelihood ratio (LR) statistic under the alternative hypothesis. Because the rather technical literature on the distribution of the LR statistic is not easily accessible, it is widely believed that the noncentral chi-square distribution is justified by statistical theory. In fact, when the null hypothesis is not trivially violated, the noncentral chi-square distribution cannot describe the LR statistic well even when the data are normally distributed and the sample size is large. Using the one-dimensional case, this article provides the details showing that the LR statistic asymptotically follows a normal distribution, which also leads to an asymptotically correct confidence interval for the discrepancy between the null hypothesis/model and the population. For each one-dimensional result, the corresponding results in the higher dimensional case are pointed out and references are provided. Examples with real data illustrate the difference between the noncentral chi-square distribution and the normal distribution. Monte Carlo results compare the strength of the normal distribution against that of the noncentral chi-square distribution. The implications for data analysis are discussed wherever relevant. The development builds upon the concepts of basic calculus, linear algebra, and introductory probability and statistics. The aim is to provide the least technical material for quantitative graduate students in social science to understand the conditions and limitations of the noncentral chi-square distribution.
The use of Statistical Methods in Mechanical Engineering
Directory of Open Access Journals (Sweden)
Iram Saleem
2013-03-01
Statistics is an important tool for handling the vast amounts of data of the present era, because it can summarize information so that many conclusions can be drawn from it. The aim of this study is to examine the use of statistical methods in Mechanical Engineering (ME). We therefore selected research papers published in 2010 in well-reputed ME journals published by Taylor and Francis. More than 350 research papers were downloaded from journals such as Inverse Problems in Science and Engineering (IPSE), Machining Science and Technology (MST), Materials and Manufacturing Processes (MMP), Particulate Science and Technology (PST), and Research in Nondestructive Evaluation (RNE). We recorded the statistical techniques and methods used in each paper. We present the frequency distribution of the descriptive statistics and advanced statistical methods used in these five ME journals in 2010.
Multivariate meta-analysis: potential and promise.
Jackson, Dan; Riley, Richard; White, Ian R
2011-09-10
The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods, and discuss their advantages and disadvantages, we organized a one day 'Multivariate meta-analysis' event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various view points and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be even more carefully applied than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd.
A multivariate based event detection method and performance comparison with two baseline methods.
Liu, Shuming; Smith, Kate; Che, Han
2015-09-01
Early warning systems have been widely deployed to protect water systems from accidental and intentional contamination events. Conventional detection algorithms are often criticized for having high false positive rates and low true positive rates. This mainly stems from the inability of these methods to determine whether variation in sensor measurements is caused by equipment noise or by the presence of contamination. This paper presents a new detection method that identifies the existence of contamination by comparing Euclidean distances of correlation indicators, which are derived from the correlation coefficients of multiple water quality sensors. The performance of the proposed method was evaluated using data from a contaminant injection experiment and compared with two baseline detection methods. The results show that the proposed method can differentiate between fluctuations caused by equipment noise and those due to the presence of contamination. It yielded a higher probability of detection and a lower false alarm rate than the two baseline methods. With optimized parameter values, the proposed method can correctly detect 95% of all contamination events with a 2% false alarm rate.
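The idea of comparing Euclidean distances between sets of correlation coefficients can be sketched as follows. This is not the authors' algorithm: here the raw pairwise Pearson coefficients are used directly as the "correlation indicators", and the sensor names, readings, window length, and threshold are hypothetical values chosen only to make the example run.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_vector(window):
    """Pairwise correlations between all sensors, in sorted-name order."""
    names = sorted(window)
    return [pearson(window[a], window[b]) for a, b in combinations(names, 2)]


def euclidean_distance(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_event(baseline, current, threshold):
    """Flag an event when the correlation pattern moves away from baseline."""
    d = euclidean_distance(correlation_vector(baseline),
                           correlation_vector(current))
    return d > threshold


# Hypothetical sensor windows: conductivity reverses its usual relationship.
baseline = {"chlorine": [1, 2, 3, 4, 5],
            "conductivity": [2, 4, 6, 8, 10],
            "ph": [5, 4, 3, 2, 1]}
current = {"chlorine": [1, 2, 3, 4, 5],
           "conductivity": [10, 8, 6, 4, 2],
           "ph": [5, 4, 3, 2, 1]}
print(is_event(baseline, current, threshold=1.0))
```

The point of working with correlations rather than raw readings is that noise in a single sensor changes one signal, whereas a contaminant would be expected to shift the relationships among several signals at once.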
Multivariate analysis in thoracic research.
Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego
2015-03-01
Multivariate analysis is based on the observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. Multivariate methods emerged to analyze large databases and increasingly complex data. Because modeling is the best way to represent our knowledge of reality, multivariate statistical methods should be used. Multivariate methods are designed to analyze data sets simultaneously, i.e., to analyze different variables for each person or object studied. It must be kept in mind at all times that all variables must be treated so that they accurately reflect the reality of the problem addressed. There are different types of multivariate analysis, and each should be employed according to the type of variables to be analyzed: dependence, interdependence and structural methods. In conclusion, multivariate methods are ideal for analyzing large data sets and for finding cause-and-effect relationships between variables; a wide range of analysis types is available.
Achtemeier, Gary L.; Ochs, Harry T., III
1988-01-01
The variational method of undetermined multipliers is used to derive a multivariate model for objective analysis. The model is intended for the assimilation of 3-D fields of rawinsonde height, temperature and wind, and mean level temperature observed by satellite into a dynamically consistent data set. Relative measurement errors are taken into account. The dynamic equations are the two nonlinear horizontal momentum equations, the hydrostatic equation, and an integrated continuity equation. The model Euler-Lagrange equations are eleven linear and/or nonlinear partial differential and/or algebraic equations. A cyclical solution sequence is described. Other model features include a nonlinear terrain-following vertical coordinate that eliminates truncation error in the pressure gradient terms of the horizontal momentum equations and easily accommodates satellite observed mean layer temperatures in the middle and upper troposphere. A projection of the pressure gradient onto equivalent pressure surfaces removes most of the adverse impacts of the lower coordinate surface on the variational adjustment.
A Bayesian design space for analytical methods based on multivariate models and predictions.
Lebrun, Pierre; Boulanger, Bruno; Debrus, Benjamin; Lambert, Philippe; Hubert, Philippe
2013-01-01
The International Conference for Harmonization (ICH) has released regulatory guidelines for pharmaceutical development. In the document ICH Q8, the design space of a process is presented as the set of factor settings providing satisfactory results. However, ICH Q8 does not propose any practical methodology to define, derive, and compute design space. In parallel, in the last decades, it has been observed that the diversity and the quality of analytical methods have evolved exponentially, allowing substantial gains in selectivity and sensitivity. However, there is still a lack of a rationale toward the development of robust separation methods in a systematic way. Applying ICH Q8 to analytical methods provides a methodology for predicting a region of the space of factors in which results will be reliable. Combining design of experiments and Bayesian standard multivariate regression, an identified form of the predictive distribution of a new response vector has been identified and used, under noninformative as well as informative prior distributions of the parameters. From the responses and their predictive distribution, various critical quality attributes can be easily derived. This Bayesian framework was then extended to the multicriteria setting to estimate the predictive probability that several critical quality attributes will be jointly achieved in the future use of an analytical method. An example based on a high-performance liquid chromatography (HPLC) method is given. For this example, a constrained sampling scheme was applied to ensure the modeled responses have desirable properties.
Study on Teaching the Course of Multivariate Statistics in Geochemistry
Institute of Scientific and Technical Information of China (English)
龚庆杰
2012-01-01
Multivariate Statistics in Geochemistry is a core course for undergraduates majoring in geochemistry. The course aims to train students to use multivariate statistical methods to solve real research problems in geochemistry and the earth sciences. The history of the course at China University of Geosciences (Beijing) is described, and the course contents and their logical relationships are analyzed. A useful teaching method combines explanation of basic principles, software demonstrations of worked examples, and classroom presentations on solving research problems. Applying the methods to case studies clearly enhances students' ability to solve geochemical research problems independently.
Institute of Scientific and Technical Information of China (English)
(no author listed)
2007-01-01
Understanding how phytoplankton patterns change can be particularly useful in water quality improvement and management decisions. However, it is generally not easy to illustrate the interactions between phytoplankton biomass and related environmental variables, given their high spatial and temporal heterogeneity. To elucidate these relationships in a eutrophic shallow lake, Taihu Lake, a relatively long-term data set of biotic and abiotic water quality parameters was analyzed season by season using multivariate statistical analysis. The results indicate that water temperature and total phosphorus (TP) played governing roles in phytoplankton dynamics in most seasons (temperature in winter, spring and summer; TP in spring, summer and autumn). COD (chemical oxygen demand) and BOD (biological oxygen demand) showed significant positive relationships with phytoplankton biomass in spring, summer and autumn. However, a complex interplay was found between phytoplankton biomass and nitrogen: significant positive relationships occurred in spring and autumn, and negative ones in summer. As the predatory factor, zooplankton exerted significant grazing pressure on phytoplankton biomass during summer, as shown by the negative relationship between them in that season. Significant feedback effects of phytoplankton development were identified in summer and autumn, when significant relationships were observed between phytoplankton biomass and pH, Trans (transparency of water) and DO. The results indicate that interactions between phytoplankton biomass and related environmental variables are highly sensitive to seasonal periodicity. This improves understanding of the different roles of biotic and abiotic variables in phytoplankton variability and hence advances management methods for eutrophic lakes.
Lodola, Alessio; Sirirak, Jitnapa; Fey, Natalie; Rivara, Silvia; Mor, Marco; Mulholland, Adrian J
2010-09-14
The effects of structural fluctuations, due to protein dynamics, on enzyme activity are at the heart of current debates on enzyme catalysis. There is evidence that fatty acid amide hydrolase (FAAH) is an enzyme for which reaction proceeds via a high-energy, reactive conformation, distinct from the predominant enzyme-substrate complex (Lodola et al. Biophys. J. 2007, 92, L20-22). Identifying the structural causes of differences in reactivity between conformations in such complex systems is not trivial. Here, we show that multivariate analysis of key structural parameters can identify structural determinants of barrier height by analysis of multiple reaction paths. We apply a well-tested quantum mechanics/molecular mechanics (QM/MM) method to the first step of the acylation reaction between FAAH and oleamide substrate for 36 different starting structures. Geometrical parameters (consisting of the key bond distances that change during the reaction) were collected and used for principal component analysis (PCA), partial least-squares (PLS) regression analysis, and multiple linear regression (MLR) analysis. PCA indicates that different "families" of enzyme-substrate conformations arise from QM/MM molecular dynamics simulation and that rarely sampled, catalytically significant conformational states can be identified. PLS and MLR analyses allowed the construction of linear regression models, correlating the calculated activation barriers with simple geometrical descriptors. These analyses reveal the presence of two fully independent geometrical effects, explaining 78% of the variation in the activation barrier, which are directly correlated with transition-state stabilization (playing a major role in catalysis) and substrate binding. These results highlight the power of statistical approaches of this type in identifying crucial structural features that contribute to enzyme reactivity.
Statistical methods in longitudinal research principles and structuring change
von Eye, Alexander
1991-01-01
These edited volumes present new statistical methods in a way that bridges the gap between theoretical and applied statistics. The volumes cover general problems and issues and more specific topics concerning the structuring of change, the analysis of time series, and the analysis of categorical longitudinal data. The book targets students of development and change in a variety of fields - psychology, sociology, anthropology, education, medicine, psychiatry, economics, behavioural sciences, developmental psychology, ecology, plant physiology, and biometry - with basic training in statistics an
Institute of Scientific and Technical Information of China (English)
梁军; 钱积新
2003-01-01
Multivariate statistical process monitoring and control (MSPM&C) methods for chemical process monitoring with statistical projection techniques such as principal component analysis (PCA) and partial least squares (PLS) are surveyed in this paper. The four-step procedure of performing MSPM&C for chemical process, modeling of processes, detecting abnormal events or faults, identifying the variable(s) responsible for the faults and diagnosing the source cause for the abnormal behavior, is analyzed. Several main research directions of MSPM&C reported in the literature are discussed, such as multi-way principal component analysis (MPCA) for batch process, statistical monitoring and control for nonlinear process, dynamic PCA and dynamic PLS, and on-line quality control by inferential models. Industrial applications of MSPM&C to several typical chemical processes, such as chemical reactor, distillation column, polymerization process, petroleum refinery units, are summarized. Finally, some concluding remarks and future considerations are made.
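The fault-detection step of PCA-based monitoring can be sketched with NumPy. The sketch fits a PCA model to normal operating data and computes two commonly used monitoring statistics for a new sample: Hotelling's T² in the retained component space and the squared prediction error (SPE) in the residual space. The simulated process data, the choice of one retained component, and all function names are assumptions for illustration. Control limits are not computed here.

```python
import numpy as np


def fit_pca_monitor(X, n_components):
    """Fit a PCA monitoring model on normal operating data X (rows = samples)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                      # autoscale each variable
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                   # loadings (n_vars, n_components)
    lam = s[:n_components] ** 2 / (X.shape[0] - 1)  # variances of the scores
    return {"mean": mean, "std": std, "P": P, "lam": lam}


def monitor(model, x):
    """Return (T2, SPE) for one new sample x."""
    z = (x - model["mean"]) / model["std"]
    t = z @ model["P"]                        # scores of the new sample
    t2 = float(np.sum(t ** 2 / model["lam"]))
    residual = z - t @ model["P"].T           # part not explained by the model
    spe = float(residual @ residual)
    return t2, spe


# Simulated normal operation: three strongly correlated variables.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1,
                     x1 + 0.05 * rng.normal(size=200),
                     -x1 + 0.05 * rng.normal(size=200)])
model = fit_pca_monitor(X, n_components=1)

print(monitor(model, np.array([1.0, 1.0, -1.0])))   # follows the usual pattern
print(monitor(model, np.array([1.0, -1.0, -1.0])))  # breaks the correlation
```

A sample that breaks the usual correlation structure shows up mainly in the SPE, whereas an unusually large but structurally consistent excursion shows up in T², which is why the two statistics are typically charted together.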
Energy Technology Data Exchange (ETDEWEB)
Aguado Garcia, D.; Ferrer Riquelme, A. J.; Seco Torrecillas, A.; Ferrer Polo, J.
2006-07-01
Due to the increasingly stringent effluent quality requirements imposed by regulations, monitoring wastewater treatment plants (WWTP) has become extremely important for achieving efficient process operation. At modern WWTP, a large number of online process variables are collected, and these variables are usually highly correlated. Therefore, appropriate techniques are required to extract the information from the huge amount of collected data. In this work, the application of multivariate statistical projection techniques is presented as an effective strategy for monitoring a sequencing batch reactor (SBR) operated for enhanced biological phosphorus removal. (Author)
Energy Technology Data Exchange (ETDEWEB)
Alves, Luana F.N.; Sarkis, Jorge E.S.; Bordon, Isabela C.A.C., E-mail: ludemar1@hotmail.com, E-mail: jesarkis@ipen.br, E-mail: isabella.bordon@gmail.com [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil)
2015-07-01
Analysis of industrial lubricants is widely used for monitoring and predicting maintenance requirements in a broad range of mechanical systems. Laser induced breakdown spectroscopy (LIBS) has been used to evaluate the potential of the technique for the determination of metals in lubricating oils. Prior to quantitative analysis, the LIBS system was calibrated using standard samples containing the elements investigated (Cu, Cr, Fe, Pb, Mo and Mg). This study presents the usefulness of multivariate statistical techniques for evaluating and interpreting large, complex data sets in order to learn more about how the concentration of metals in lubricating oils is related to engine wear. (author)
Directory of Open Access Journals (Sweden)
Amin Hossein Morshedy
2017-07-01
Introduction: Nowadays, exploration of rare earth element (REE) resources is considered one of the strategic priorities, and these elements have a special position in advanced and intelligent industries (Castor and Hedrick, 2006). Significant resources of REEs are found in a wide range of geological settings, including primary deposits associated with igneous and hydrothermal processes (e.g. carbonatite, (per)alkaline igneous rocks, iron-oxide breccia complexes, skarns, fluorapatite veins and pegmatites), and secondary deposits concentrated by sedimentary processes and weathering (e.g. heavy-mineral sand deposits, fluviatile sandstones, unconformity-related uranium deposits, and lignites) (Jaireth et al., 2014). Recent studies in various parts of Iran have identified promising potential for these elements, including Central Iran, alkaline rocks in the Eslami Peninsula, iron and apatite in Hormuz Island, the Kahnouj titanium deposit, granitoid bodies in Yazd, Azerbaijan, and Mashhad and their associated dikes, and placers related to the Shemshak Formation in Marvast, Kharanagh, and Ardekan. These studies indicate high concentrations of REE in magmatogenic iron-apatite deposits in Central Iran and in placers in the Marvast area of Yazd (Ghorbani, 2013). Materials and methods: In the present study, the geochemical behavior of rare earth elements is modeled using multivariate statistical methods in the eastern part of the Marvast placer. Marvast is located 185 km south of the city of Yazd in central Iran between Yazd and Mehriz. This area lies within the southeastern part of the Sanandaj-Sirjan Zone (Alipour-Asll et al., 2012). Samples from 53 wells were analyzed for whole-rock trace-element concentrations (including REE) by inductively coupled plasma-mass spectrometry (ICP-MS) (GSI, 2004). Clustering techniques, as multivariate statistical analysis techniques, can be employed to find appropriate groups in data sets. One of the main objectives of data clustering
Herojeet, Rajkumar; Rishi, Madhuri S.; Lata, Renu; Dolma, Konchok
2017-09-01
The Sirsa River flows through the central part of the Nalagarh valley, which belongs to the rapidly industrializing belt of Baddi, Barotiwala and Nalagarh (BBN). Appraising surface water quality to ascertain its utility in such ecologically sensitive areas is urgently needed. The present study applies multivariate analysis, water utility classes and conventional graphical representation to reveal the hidden factors responsible for the deterioration of water quality and to determine the hydrochemical facies and the evolution processes of water types in the Nalagarh valley, India. The quality assessment is made by estimating pH, electrical conductivity (EC), total dissolved solids (TDS), total hardness (TH), major ions (Na⁺, K⁺, Ca²⁺, Mg²⁺, HCO₃⁻, Cl⁻, SO₄²⁻, NO₃⁻ and PO₄³⁻), dissolved oxygen (DO), biological oxygen demand (BOD) and total coliform (TC) to determine the water's suitability for drinking and domestic purposes. Parameters such as pH, TDS, TH, Ca²⁺, HCO₃⁻, Cl⁻, SO₄²⁻ and NO₃⁻ are within the desirable limits of the Bureau of Indian Standards (Indian Standard Drinking Water Specification (Second Edition) IS:10500. Indian Standard Institute, New Delhi, pp 1-18, 2012). Mg²⁺, Na⁺ and K⁺ in the pre-monsoon season, EC in the pre- and post-monsoon seasons at a few sites, and approximately 40% of the BOD and TC samples in both seasons exceed the permissible limits, indicating organic contamination from human activities. Water quality classification for designated use indicates that most surface water samples are not suitable as a drinking water source without conventional treatment. In the Piper trilinear and Chadha's diagrams, the majority of surface water samples for both seasons fall in the field of the Ca²⁺-Mg²⁺-HCO₃⁻ water type, indicating temporary hardness. PCA and CA reveal that the surface water chemistry is influenced by natural factors such as weathering of minerals and ion exchange processes, and by anthropogenic factors. Thus, the present paper illustrates the importance of
The estimation of the measurement results with using statistical methods
Velychko, O.; Gordiyenko, T.
2015-02-01
A number of international standards and guides describe various statistical methods that are applied to the management, control and improvement of processes for the purpose of analyzing technical measurement results. This work analyzes the international standards and guides on statistical methods for estimating measurement results and describes recommendations for applying them in laboratories. To support this analysis, cause-and-effect (Ishikawa) diagrams concerning the application of statistical methods to the estimation of measurement results are constructed.
Experimental Data Mining Techniques (Using Multiple Statistical Methods)
Directory of Open Access Journals (Sweden)
Mustafa Zaidi
2012-05-01
This paper discusses possible solutions to non-linear multivariable problems using experimental data mining techniques based on an orthogonal array. The Taguchi method is a very useful technique for reducing the time and cost of experiments, but it ignores all kinds of interaction effects. Its results are not very encouraging, which motivates modeling the non-linear multivariable laser cutting process by one-way and two-way analysis of variance and by linear and non-linear regression analysis. These techniques are used to explore better methods of analysis and to improve laser cutting quality by reducing process variations caused by controllable process parameters. The size of the data set causes difficulties in modeling and simulating the problem; for example, a decision tree is a useful technique but is not able to predict better results. The results of the analysis of variance are encouraging. Taguchi and regression methods normally optimize input process parameters for a single characteristic.
Song, Seung Yeob; Lee, Young Koung; Kim, In-Jung
2016-01-01
A high-throughput screening system for Citrus lines with higher sugar and acid contents was established using Fourier transform infrared (FT-IR) spectroscopy in combination with multivariate analysis. FT-IR spectra confirmed typical spectral differences in the frequency regions of 950-1100 cm⁻¹, 1300-1500 cm⁻¹, and 1500-1700 cm⁻¹. Principal component analysis (PCA) and subsequent partial least square-discriminant analysis (PLS-DA) were able to discriminate five Citrus lines into three separate clusters corresponding to their taxonomic relationships. Quantitative predictive models of the sugar and acid contents of Citrus fruits were established from FT-IR spectra using partial least square regression algorithms. The regression coefficients (R²) between predicted values and estimated sugar and acid content values were 0.99. These results demonstrate that, by using FT-IR spectra and applying quantitative prediction modeling to Citrus sugar and acid contents, excellent Citrus lines can be detected early and with greater accuracy.
Giorgi, Gabriele; Teunissen, Peter J. G.; Verhagen, Sandra; Buist, Peter J.
2010-07-01
GNSS (Global Navigation Satellite Systems)-based attitude determination is an important field of study, since it is a valuable technique for the orientation estimation of remote sensing platforms. To achieve highly accurate angular estimates, the precise GNSS carrier phase observables must be employed. However, in order to take full advantage of the high precision, the unknown integer ambiguities of the carrier phase observables need to be resolved. This contribution presents a GNSS carrier phase-based attitude determination method that determines the integer ambiguities and attitude in an integral manner, thereby fully exploiting the known body geometry of the multi-antennae configuration. It is shown that this integral approach aids the ambiguity resolution process tremendously and strongly improves the capacity of fixing the correct set of integer ambiguities. In this contribution, the challenging scenario of single-epoch, single-frequency attitude determination is addressed. This guarantees a total independence from carrier phase slips and losses of lock, and it also does not require any a priori motion model for the platform. The method presented is a multivariate constrained version of the popular LAMBDA method and it is tested on data collected during an airborne remote sensing campaign.
Dinç, Erdal; Ustündağ, Ozgür; Baleanu, Dumitru
2010-08-01
The sole use of pyridoxine hydrochloride during treatment of tuberculosis gives rise to pyridoxine deficiency. Therefore, a combination of pyridoxine hydrochloride and isoniazid is used in pharmaceutical dosage form in tuberculosis treatment to reduce this side effect. In this study, two chemometric methods, partial least squares (PLS) and principal component regression (PCR), were applied to the simultaneous determination of pyridoxine (PYR) and isoniazid (ISO) in their tablets. A concentration training set comprising 20 different binary mixtures of PYR and ISO was randomly prepared in 0.1 M HCl. Both multivariate calibration models were constructed using the relationships between the concentration data matrix and the absorbance data matrix in the spectral region 200-330 nm. The accuracy and precision of the proposed chemometric methods were validated by analyzing synthetic mixtures containing the investigated drugs. The recoveries obtained by applying the PCR and PLS calibrations to the artificial mixtures were between 100.0 and 100.7%. Applying the PLS and PCR methods to both artificial and commercial samples gave satisfactory results. These results strongly encourage the use of both methods for the quality control and routine analysis of marketed tablets containing PYR and ISO.
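A principal component regression calibration of this kind can be sketched with NumPy. The sketch mirrors only the general setup in the abstract (20 binary training mixtures, spectra over 200-330 nm); the Gaussian "pure component spectra", the concentration range, and the noise-free absorbances are invented for illustration and are not the authors' data. PLS is not shown.

```python
import numpy as np


def fit_pcr(A, C, n_components):
    """Principal component regression of concentrations C on absorbances A."""
    a_mean, c_mean = A.mean(axis=0), C.mean(axis=0)
    _, _, Vt = np.linalg.svd(A - a_mean, full_matrices=False)
    V = Vt[:n_components].T                 # spectral loadings
    T = (A - a_mean) @ V                    # scores of the training spectra
    B, *_ = np.linalg.lstsq(T, C - c_mean, rcond=None)
    return a_mean, c_mean, V, B


def predict_pcr(model, a):
    """Predict concentrations for one spectrum or a matrix of spectra."""
    a_mean, c_mean, V, B = model
    return (a - a_mean) @ V @ B + c_mean


# Hypothetical pure-component spectra on a 200-330 nm grid (assumed shapes).
wavelengths = np.linspace(200.0, 330.0, 66)
s_pyr = np.exp(-((wavelengths - 290.0) / 15.0) ** 2)
s_iso = np.exp(-((wavelengths - 265.0) / 20.0) ** 2)
S = np.vstack([s_pyr, s_iso])

# 20 random binary training mixtures; absorbance is assumed additive.
rng = np.random.default_rng(1)
C_train = rng.uniform(2.0, 20.0, size=(20, 2))
A_train = C_train @ S

model = fit_pcr(A_train, C_train, n_components=2)
c_true = np.array([8.0, 12.0])
print(predict_pcr(model, c_true @ S))       # compare with c_true
```

Because the simulated spectra are noise-free and exactly additive, two components are enough to recover the concentrations; with real spectra the number of components would normally be chosen by cross-validation.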
An Alternative Method for Computing Mean and Covariance Matrix of Some Multivariate Distributions
Radhakrishnan, R.; Choudhury, Askar
2009-01-01
This article considers the computation of the mean and covariance matrix of some multivariate distributions, in particular the multivariate normal and Wishart distributions. It involves a matrix transformation of the normal random vector into a random vector whose components are independent normal random variables, and then integrating…
2015-06-01
talented people supported its production. I am deeply indebted to my advisor, Professor Robert Koyak, for his patience, support, generosity... stupidity , stubbornness, and stumbling progress. You laid down your abilities, ambitions, and accolades but retained your beauty. I say with sincerity
2011-01-01
Measures of agro-ecosystem genetic variability are essential to sustain science-based actions and policies that tend to protect the ecosystem services they provide. To build the genetic variability datum it is necessary to deal with a large number of variables of different types. Molecular marker data are high-dimensional by nature, and additional types of information, such as morphological and physiological traits, are frequently obtained. This way, gene...
Directory of Open Access Journals (Sweden)
Abdullah S. Al-Farraj
2013-01-01
The aim of this research is to evaluate arsenic distribution and associated hydrogeochemical parameters in 27 randomly selected boreholes representing aquifers in the Al-Kharj geothermal fields of Saudi Arabia. Arsenic was detected at all sites, with 92.5% of boreholes yielding concentrations above the WHO permissible limit of 10 μg/L. The maximum concentration recorded was 122 μg/L (SD = 29 μg/L, skewness = 1.87). The groundwater types were mainly Ca²⁺-Mg²⁺-SO₄²⁻-Cl⁻ and Na⁺-Cl⁻-SO₄²⁻, accounting for 67% of the total composition. Principal component analysis (PCA) showed that the main source of arsenic release was geothermal in nature and was linked to processes similar to those involved in the release of boron. The PCA yielded five components, which accounted for 44.1%, 17.0%, 10.1%, 8.4%, and 6.5% of the total variance. The first component had positive loadings for arsenic and boron along with other hydrogeochemical parameters, indicating that the primary sources of As mobilization are regional geothermal systems and weathering of minerals. The remaining principal components indicated reductive dissolution of iron oxyhydroxides as a possible mechanism. Spatial evaluation of the PCA results indicated that this secondary mechanism of arsenic mobilization may be active and correlates positively with total organic carbon. The aquifers were found to be contaminated to a high degree with organic carbon, ranging from 0.57 mg/L to 21.42 mg/L, and showed high concentrations of NO₃⁻, ranging from 8.05 mg/L to 248.2 mg/L.
Directory of Open Access Journals (Sweden)
Shashank Vyas
2016-01-01
Integration of solar photovoltaic (PV) generation with power distribution networks leads to many operational challenges and complexities. Unintentional islanding is one of them, and it is of rising concern given the steady increase in grid-connected PV power. This paper builds on an exploratory study of unintentional islanding on a modeled radial feeder having large PV penetration. Dynamic simulations, also run in real time, resulted in exploration of unique potential causes of creation of accidental islands. The resulting voltage and current data underwent dimensionality reduction using principal component analysis (PCA), which formed the basis for the application of Q statistic control charts for detecting the anomalous currents that could island the system. To reduce the false alarm rate of anomaly detection, Kullback-Leibler (K-L) divergence was applied to the principal component projections, which showed that the Q statistic based approach alone is not reliable for detecting the symptoms liable to cause unintentional islanding. The obtained data were labeled and a K-nearest neighbor (K-NN) binomial classifier was then trained to identify and classify potential islanding precursors from other power system transients. The three-phase short-circuit fault case was successfully identified as statistically different from islanding symptoms.
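As a rough illustration of the Kullback-Leibler step mentioned in the abstract above, the sketch below computes the discrete K-L divergence between two histograms, for example binned principal component projections from a reference window and a test window. The function name `kl_divergence`, the `eps` floor, and the example histograms are assumptions made for illustration; the paper's own binning and thresholds are not given in the abstract.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Discrete Kullback-Leibler divergence D(p || q) in nats.

    p and q are non-negative histograms of equal length; they are
    normalized to sum to 1 here. eps (an assumed floor) avoids log(0)
    when a bin of q is empty.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sp, sq = sum(p), sum(q)
    total = 0.0
    for pi, qi in zip(p, q):
        pi, qi = pi / sp, qi / sq
        if pi > 0:  # bins with p = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total

# Hypothetical histograms of projected scores: reference vs. test window.
reference = [0.5, 0.5]
test_window = [0.9, 0.1]
divergence = kl_divergence(reference, test_window)
```

A divergence of zero means the two histograms are identical; larger values indicate that the test window departs from the reference distribution.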
Grade-Average Method: A Statistical Approach for Estimating ...
African Journals Online (AJOL)
Grade-Average Method: A Statistical Approach for Estimating Missing Value for Continuous Assessment Marks. ... Journal of the Nigerian Association of Mathematical Physics.
Methods of quantum field theory in statistical physics
Abrikosov, A A; Gorkov, L P; Silverman, Richard A
1975-01-01
This comprehensive introduction to the many-body theory was written by three renowned physicists and acclaimed by American Scientist as "a classic text on field theoretic methods in statistical physics."
Steganalytic method based on short and repeated sequence distance statistics
Institute of Scientific and Technical Information of China (English)
WANG GuoXin; PING XiJian; XU ManKun; ZHANG Tao; BAO XiRui
2008-01-01
According to the distribution characteristics of short and repeated sequences (SRS), a steganalytic method based on the correlation of image bit planes is proposed. Firstly, we provide the concept of SRS distance statistics and deduce its statistical distribution. Because the SRS distance statistics can effectively reflect the correlation of the sequence, SRS has statistical features when the image bit plane sequence equals the image width. Using this characteristic, the steganalytic method is realized by a distinct test of the Poisson distribution. Experimental results show good performance in detecting the LSB matching steganographic method in still images. Moreover, the proposed method is not designed for specific steganographic algorithms and has good generality.
Longitudinal data analysis a handbook of modern statistical methods
Fitzmaurice, Garrett; Verbeke, Geert; Molenberghs, Geert
2008-01-01
Although many books currently available describe statistical models and methods for analyzing longitudinal data, they do not highlight connections between various research threads in the statistical literature. Responding to this void, Longitudinal Data Analysis provides a clear, comprehensive, and unified overview of state-of-the-art theory and applications. It also focuses on the assorted challenges that arise in analyzing longitudinal data. After discussing historical aspects, leading researchers explore four broad themes: parametric modeling, nonparametric and semiparametric methods, joint
Energy Technology Data Exchange (ETDEWEB)
Papachristodoulou, Christina [Nuclear Physics Laboratory, Department of Physics, University of Ioannina, 451 10 Ioannina (Greece); Oikonomou, Artemios [Composite Materials Laboratory, Department of Materials' Science and Engineering, University of Ioannina, 451 10 Ioannina (Greece); Ioannides, Kostas [Nuclear Physics Laboratory, Department of Physics, University of Ioannina, 451 10 Ioannina (Greece); Gravani, Konstantina [Archaeology Section, Department of History-Archaeology, University of Ioannina, 451 10 Ioannina (Greece)
2006-07-28
Energy-dispersive X-ray fluorescence spectroscopy was used to determine the composition of 64 potsherds from the Hellenistic settlement of Orraon, in northwestern Greece. Data classification by principal components analysis revealed four distinct groups of pottery, pointing to different local production practices rather than different provenance. The interpretation of statistical grouping was corroborated by a complementary X-ray diffraction analysis. Compositional and mineralogical data, combined with archaeological and materials' science criteria, allowed addressing various aspects of pottery making, such as selection of raw clays, tempers and firing conditions.
Multivariate optimization of a solar water heating system using the Simplex method
Michelson, E
1982-01-01
Two Simplex computer library packages for multivariate optimization have been tested on an hour-by-hour simulation of a solar water heating system. The two packages are MINUITS, written at CERN (Geneva), and the E04CCF routine, which is part of the UK Numerical Algorithms Group Library. Technical and economic optima have been derived for three of the following variables simultaneously: collector area, tilt, azimuth, and store volume. The two packages give the same results. The meteorological data used were one (composite) year for Hamburg (Germany) and 1964 for Kew (UK). The Hamburg data were also condensed to form a year consisting of 60 averaged days. The optima derived with the 60-day year were very close to those obtained with the 365-day year. The Simplex method, which is a direct search method, is known to be very robust. It is particularly suited to hour-by-hour simulations of solar heating systems since the function being minimized is not monotonically decreasing towards the minimum in sufficient sign...
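The direct-search Simplex idea described above can be sketched with a simplified Nelder-Mead routine. This is a minimal illustration in pure Python, not the MINUITS or E04CCF implementation: the function name `nelder_mead`, its parameters, the use of inside contraction only, and the quadratic test objective are all assumptions standing in for the solar-system simulation.

```python
def nelder_mead(f, x0, step=1.0, tol=1e-12, max_iter=2000):
    """Simplified Nelder-Mead direct search; returns (best_point, best_value)."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if abs(values[-1] - values[0]) < tol:
            break
        # Centroid of all vertices except the worst one.
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t):
            # Point on the line through worst and centroid; t=1 is the reflection.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)
        fr = f(xr)
        if fr < values[0]:
            # Reflection is the new best: try expanding further.
            xe = along(2.0)
            fe = f(xe)
            if fe < fr:
                simplex[-1], values[-1] = xe, fe
            else:
                simplex[-1], values[-1] = xr, fr
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            # Inside contraction: midpoint between centroid and worst vertex.
            xc = along(-0.5)
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink every vertex halfway towards the best one.
                best = simplex[0]
                simplex = [best] + [
                    [(b + x) / 2 for b, x in zip(best, v)] for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    best_index = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_index], values[best_index]

# Hypothetical smooth objective standing in for the simulation cost.
point, value = nelder_mead(lambda v: (v[0] - 3) ** 2 + (v[1] + 1) ** 2, [0.0, 0.0])
```

Because the routine uses only function values and no derivatives, it can be applied to objectives that come out of a simulation, which is the property the abstract emphasizes.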
Complex Data Modeling and Computationally Intensive Statistical Methods
Mantovan, Pietro
2010-01-01
Recent years have seen the advent and development of many devices able to record and store an ever-increasing amount of complex and high-dimensional data: 3D images generated by medical scanners or satellite remote sensing, DNA microarrays, real-time financial data, and system control datasets. The analysis of these data poses challenging new problems and requires the development of novel statistical models and computational methods, fueling many fascinating and fast-growing research areas of modern statistics. The book offers a wide variety of statistical methods and is addressed to statistici
A Circular Statistical Method for Extracting Rotation Measures
Indian Academy of Sciences (India)
S. Sarala; Pankaj Jain
2002-03-01
We propose a new method for the extraction of Rotation Measures from spectral polarization data. The method is based on maximum likelihood analysis and takes into account the circular nature of the polarization data. The method is unbiased and statistically more efficient than the standard χ² procedure.
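To show why the circular nature of polarization angles matters, the sketch below averages orientation angles that are defined only modulo π by doubling them before taking the vector mean. It is an illustration of the general circular-statistics idea only, not the authors' maximum likelihood estimator; the function name `axial_mean` and the sample angles are assumptions.

```python
import math

def axial_mean(angles_rad):
    """Mean of orientation angles defined modulo pi (e.g. polarization angles).

    Angles are doubled so that theta and theta + pi map to the same
    direction, averaged as unit vectors, then halved again.
    """
    s = sum(math.sin(2 * a) for a in angles_rad)
    c = sum(math.cos(2 * a) for a in angles_rad)
    return (0.5 * math.atan2(s, c)) % math.pi

# Two nearly identical orientations that straddle the 0 / pi wrap-around.
angles = [0.05, math.pi - 0.05]
circular = axial_mean(angles)          # close to 0 (equivalently pi)
arithmetic = sum(angles) / len(angles)  # pi / 2, i.e. perpendicular
```

The arithmetic mean of the two sample angles points at right angles to both measurements, whereas the circular mean stays aligned with them.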
Willard, Melissa A Bodnar; McGuffin, Victoria L; Smith, Ruth Waddell
2012-01-01
Salvia divinorum is a plant material that is of forensic interest due to the hallucinogenic nature of the active ingredient, salvinorin A. In this study, S. divinorum was extracted and spiked onto four different plant materials (S. divinorum, Salvia officinalis, Cannabis sativa, and Nicotiana tabacum) to simulate an adulterated sample that might be encountered in a forensic laboratory. The adulterated samples were extracted and analyzed by gas chromatography-mass spectrometry, and the resulting total ion chromatograms were subjected to a series of pretreatment procedures that were used to minimize non-chemical sources of variance in the data set. The data were then analyzed using principal components analysis (PCA) to investigate association of the adulterated extracts to unadulterated S. divinorum. While association was possible based on visual assessment of the PCA scores plot, additional procedures including Euclidean distance measurement, hierarchical cluster analysis, Student's t tests, Wilcoxon rank-sum tests, and Pearson product moment correlation were also applied to the PCA scores to provide a statistical evaluation of the association observed. The advantages and limitations of each statistical procedure in a forensic context were compared and are presented herein.
Problem of mixtures with known compositions and IRONFLEA method for multivariate curve resolution.
Zyrianov, Yegor
2007-10-17
A special case of gray spectral data systems [(a) F.-T. Chau, Y.-Z. Liang, J. Gao, X.-G. Shao (Eds.), Chemometrics: From Basics to Wavelet Transform, Chemical Analysis Series, vol. 164, John Wiley & Sons, Inc., 2004; (b) Y.Z. Liang, O.M. Kvalheim, R. Manne, Chemom. Intell. Lab. Syst. 18 (1993) 235-250] is discussed here and the least-squares method for the multivariate curve resolution (MCR) named IRONFLEA is proposed. The system under consideration is the bilinear spectral data of the samples with known chemical compositions and unknown concentration matrix. If the spectra of samples (A(i)) and (Q+A(i)) (i = 1, ..., n, n ≥ 2) are available, then the spectrum and the concentrations of Q could be found and the solution is unique. A practical chemical model for this problem could be mixtures, polymers, peptides, oligosaccharides, or supramolecular formations made of a limited number of monomeric components. In the cases of polymeric or oligomeric samples the spectral contributions and the concentrations of the particular monomeric units are extracted. The method is capable of extracting chemically meaningful spectra of components. The method is implemented in SAS IML code and tested for the deconvolution of spectra of polymers made of styrene derivatives with known monomeric compositions [(a) H. Fenniri, L. Ding, A.E. Ribbe, Y. Zyrianov, J. Am. Chem. Soc. 123 (2001) 8151-8152; (b) H. Fenniri, S. Chun, L. Ding, Y. Zyrianov, K. Hallenga, J. Am. Chem. Soc. 125 (2003) 10546-10560]. The method performs calculations fast enough to allow the incorporation of a leave-one-out outlier removal procedure.
Statistical Methods for Single-Particle Electron Cryomicroscopy
DEFF Research Database (Denmark)
Jensen, Katrine Hommelhoff
from the noisy, randomly oriented projection images. Many statistical approaches to SPR have been proposed in the past. Typically, due to the computation time complexity, they rely on approximated maximum likelihood (ML) or maximum a posteriori (MAP) estimate of the structure. All methods presented...... between a MAP approach for estimating the protein structure. The resulting method is statistically optimal under the assumption of the uniform prior in the space of rotations. The marginal posterior is constructed by integrating over the view orientations and maximised by the expectation-maximisation (EM...... in this thesis attempt to solve a specific part of the reconstruction problem in a statistically sound manner. Firstly, we propose two methods for solving the problems (1) and (2). They can ultimately be extended and combined into a statistically sound solution to the full SPR problem. We use Bayesian...
[Evaluation of using statistical methods in selected national medical journals].
Sych, Z
1996-01-01
The paper evaluates how frequently statistical methods were applied in works published in six selected national medical journals in the years 1988-1992. The following journals were chosen for analysis: Klinika Oczna, Medycyna Pracy, Pediatria Polska, Polski Tygodnik Lekarski, Roczniki Państwowego Zakładu Higieny, Zdrowie Publiczne. A number of works matching the average in the remaining medical journals was randomly selected from the respective volumes of Pol. Tyg. Lek. The studies did not include works in which no statistical analysis was implemented, whether in national or international publications. That exemption was also extended to review papers, casuistic papers, book reviews, handbooks, monographs, reports from scientific congresses, as well as papers on historical topics. The number of works was determined in each volume. Next, the mode of obtaining a suitable sample in the respective studies was analyzed, distinguishing two categories: random and target selection. Attention was also paid to the presence of a control sample in the individual works, and to the existence of sample characteristics, in three categories: complete, partial and lacking. In evaluating the analyzed works an effort was made to present the results of studies in tables and figures (Tab. 1, 3). The rate of employing statistical methods in the analyzed works was established for the relevant volumes of the six selected national medical journals for the years 1988-1992, simultaneously determining the number of works in which no statistical methods were used. Concurrently the frequency of applying the individual statistical methods was analyzed in the scrutinized works. Prominence was given to fundamental statistical methods in the field of descriptive statistics (measures of position, measures of dispersion) as well as
Analysis of Statistical Methods Currently used in Toxicology Journals.
Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min
2014-09-01
Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and on sound statistical grounds. The purpose of this paper is to describe the statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described the methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had a sample size of less than 10, with the median and the mode being 6 and 3 & 6, respectively. The mean (105/113, 93%) was predominantly used to measure central tendency, and the standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies justified why the methods were selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being most popular (52/93, 56%), yet few studies conducted either a normality or an equal variance test. These results suggest that more consistent and appropriate use of statistical methods is necessary, which may enhance the role of toxicology in public health.
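The two dispersion measures counted in the abstract above, standard deviation and standard error of the mean, differ by a factor of the square root of the sample size. A minimal sketch using the Python standard library follows; the function name `describe` and the six-value sample (matching the median sample size reported above) are illustrative assumptions.

```python
import math
import statistics

def describe(sample):
    """Return n, mean, sample standard deviation, and standard error of the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample SD with n - 1 in the denominator
    sem = sd / math.sqrt(n)         # SEM shrinks as n grows; SD does not
    return {"n": n, "mean": mean, "sd": sd, "sem": sem}

# Hypothetical endpoint measured in six animals.
summary = describe([1, 2, 3, 4, 5, 6])
```

Reporting the SEM in place of the SD makes the spread of the data look smaller, which is one reason the choice of dispersion measure needs to be justified.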
Chattopadhyay, Goutami; Jain, Rajni
2009-01-01
In this paper, the complexities in the relationship between rainfall and sea surface temperature (SST) anomalies during the winter monsoon (November-January) over India were evaluated statistically using scatter plot matrices and autocorrelation functions. Linear as well as polynomial trend equations were obtained and it was observed that the coefficient of determination for the linear trend was very low and it remained low even when a polynomial trend of degree six was used. An exponential regression equation and an artificial neural network with extensive variable selection were generated to forecast the average winter monsoon rainfall of a given year using the rainfall amounts and the sea surface temperature anomalies in the winter monsoon months of the previous year as predictors. The regression coefficients for the multiple exponential regression equation were generated using the Levenberg-Marquardt algorithm. The artificial neural network was generated in the form of a multilayer perceptron with sigmoid non-l...
Jackson, Dan; Bujkiewicz, Sylwia; Law, Martin; Riley, Richard D; White, Ian R
2017-08-14
Random-effects meta-analyses are very commonly used in medical statistics. Recent methodological developments include multivariate (multiple outcomes) and network (multiple treatments) meta-analysis. Here, we provide a new model and corresponding estimation procedure for multivariate network meta-analysis, so that multiple outcomes and treatments can be included in a single analysis. Our new multivariate model is a direct extension of a univariate model for network meta-analysis that has recently been proposed. We allow two types of unknown variance parameters in our model, which represent between-study heterogeneity and inconsistency. Inconsistency arises when different forms of direct and indirect evidence are not in agreement, even having taken between-study heterogeneity into account. However, the consistency assumption is often assumed in practice and so we also explain how to fit a reduced model which makes this assumption. Our estimation method extends several other commonly used methods for meta-analysis, including the method proposed by DerSimonian and Laird. We investigate the use of our proposed methods in the context of both a simulation study and a real example. © 2017, The Authors Biometrics published by Wiley Periodicals, Inc. on behalf of International Biometric Society.
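For orientation, the classical univariate DerSimonian and Laird estimator that the abstract says is being extended can be sketched as follows. This covers only the single-outcome, pairwise case and is not the authors' multivariate network model; the function name `dersimonian_laird` and the example inputs are assumptions.

```python
def dersimonian_laird(effects, variances):
    """Univariate random-effects pooling with the DerSimonian-Laird
    moment estimator of the between-study variance tau^2.

    effects: per-study effect estimates; variances: their within-study variances.
    Returns (pooled_effect, standard_error, tau_squared).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q heterogeneity statistic.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, se, tau2

# Two hypothetical studies with equal within-study variance.
pooled, se, tau2 = dersimonian_laird([0.0, 2.0], [0.5, 0.5])
```

When Q does not exceed its degrees of freedom, the estimate of tau^2 is truncated to zero and the result coincides with the fixed-effect analysis.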
ASSESSING CONVERGENCE OF THE MARKOV CHAIN MONTE CARLO METHOD IN MULTIVARIATE CASE
Directory of Open Access Journals (Sweden)
Daniel Furtado Ferreira
2012-01-01
The formal convergence diagnosis of the Markov Chain Monte Carlo (MCMC) method is made using univariate and multivariate criteria. In 1998, a multivariate extension of the univariate criterion of multiple sequences was proposed. However, due to some problems with that multivariate criterion, an alternative form of calculation was proposed, in addition to two new alternatives for multivariate convergence criteria. In this study, two models were used, one related to time series with two interventions and ARMA(2, 2) error and another related to a trivariate normal distribution, considering three different cases for the covariance matrix. In both cases, the Gibbs sampler and the proposed criteria to monitor the convergence were used. Results revealed the proposed criteria to be adequate, besides being easy to implement.
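The univariate multiple-sequence criterion is commonly computed as the potential scale reduction factor associated with Gelman and Rubin; it is assumed here that this is the criterion the abstract refers to. The sketch below covers the univariate case only, not the multivariate alternatives proposed in the paper, and the function name `gelman_rubin` and the toy chains are illustrative.

```python
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor for several equal-length chains
    of a single scalar parameter."""
    n = len(chains[0])
    chain_means = [statistics.mean(c) for c in chains]
    # W: mean within-chain variance; B: between-chain variance scaled by n.
    w = statistics.mean([statistics.variance(c) for c in chains])
    b = n * statistics.variance(chain_means)
    pooled_var = (n - 1) / n * w + b / n
    return (pooled_var / w) ** 0.5

# Toy chains: two that agree, and two stuck in different regions.
agreeing = gelman_rubin([[1, 2, 3, 4], [1, 2, 3, 4]])
disagreeing = gelman_rubin([[1, 2, 3, 4], [11, 12, 13, 14]])
```

Values near 1 suggest the chains are sampling the same distribution, while values well above 1 indicate that the chains have not yet mixed.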
Directory of Open Access Journals (Sweden)
Pao-Shin Chu
2007-01-01
In this study, a multivariate linear regression model is applied to predict the seasonal tropical cyclone (TC) count in the vicinity of Taiwan using large-scale climate variables available from the preceding May. Here the season encompasses the five-month period from June through October, when typhoons are most active in the study domain. The model is based on the least absolute deviation so that regression estimates are more resistant (i.e., not unduly influenced by outliers) than those derived from the ordinary least squares method. Through lagged correlation analysis, five parameters (sea surface temperature, sea level pressure, precipitable water, low-level relative vorticity, and vertical wind shear) in key locations of the tropical western North Pacific are identified as predictor datasets. Results from cross-validation suggest that the statistical model is skillful in predicting TC activity, with a correlation coefficient of 0.63 for 1970 - 2003. If more recent data are included, the correlation coefficient reaches 0.69 for 1970 - 2006. The relative importance of each predictor variable is evaluated. For predicting higher than normal seasonal TC activity, warmer sea surface temperatures, a moist troposphere, and the presence of a low-level cyclonic circulation coupled with low-latitude westerlies in the Philippine Sea in the antecedent May appear to be important.
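The resistance of least absolute deviation (LAD) regression to outliers can be seen in a one-predictor sketch. It relies on the known property that some optimal LAD line passes through at least two data points, so a brute-force search over point pairs is exact for small samples. This is not the multivariate model of the study; the function name `lad_line` and the data are hypothetical.

```python
from itertools import combinations

def lad_line(xs, ys):
    """Exact least-absolute-deviation line for one predictor.

    Tries every line through two data points and keeps the one with the
    smallest sum of absolute residuals. Cubic cost, so small samples only.
    Returns (intercept, slope).
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, not a valid regression fit
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        cost = sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))
        if best is None or cost < best[0]:
            best = (cost, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]

# Five points on y = 2x + 1 plus one gross outlier at x = 5.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 100]
intercept, slope = lad_line(xs, ys)
```

The fitted line follows the five consistent points and ignores the outlier, which an ordinary least squares fit would not do.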
Pérez-Magariño, S; Ortega-Heras, M; González-San José, M L; Boger, Z
2004-04-19
Classical multivariate analysis techniques, such as factor analysis and stepwise linear discriminant analysis, and the artificial neural network (ANN) method have been applied to the classification of Spanish denomination of origin (DO) rosé wines according to their geographical origin. Seventy commercial rosé wines from four different Spanish DOs (Ribera del Duero, Rioja, Valdepeñas and La Mancha) and two successive vintages were studied. Nineteen different variables were measured in these wines. The stepwise linear discriminant analysis (SLDA) model selected 10 variables, obtaining a global percentage of correct classification of 98.8% and of global prediction of 97.3%. The ANN model selected seven variables, five of which were also selected by the SLDA model, and it gave 100% correct classification for training and prediction. Both models can therefore be considered satisfactory and acceptable, and the selected variables are useful to classify and differentiate these wines by their origin. Furthermore, the casual index analysis gave information that can be easily explained from an enological point of view.
Selection of mango rosa genotypes in a breeding population using the multivariate-biplot method
Directory of Open Access Journals (Sweden)
Maria Clideana Cabral Maia
Mango (Mangifera indica L.) trees stand out among the main fruit trees cultivated in Brazil. The mango rosa fruit is a very popular local variety (landrace), especially because of its superior technological characteristics such as high contents of Vitamin C and soluble solids (SS), as well as attractive taste and color. The objective of this study was to select a breeding population of mango rosa (polyclonal variety; ≥5 individuals) that can simultaneously meet the fresh and processed fruit markets, using the multivariate method of principal components and the biplot graphic. The principal components, biplot graphic, and phenotype correlations were obtained using the R (2012) software. Pulp percentage and the pulp, skin, and seed mass variables can be indirectly selected using the smallest fruit diameter, which allowed an easier measurement. The P23R AREA3, P30R AREA3, and P32R AREA3 genotypes are selection candidates due to the presence of alleles, which are important agro-technological traits for mango breeding. This study showed that the biplot analysis is a valuable tool for decision making and visualization of interrelationships between variables and genotypes, facilitating the mango selection process.
Oxygen Abundance Methods in SDSS: View from Modern Statistics
Indian Academy of Sciences (India)
Fei Shi; Gang Zhao; James Wicker
2010-09-01
Our purpose is to find which is the most reliable one among various oxygen abundance determination methods. We will test the validity of several different oxygen abundance determination methods using methods of modern statistics. These methods include Bayesian analysis and information scoring. We will analyze a sample of ∼ 6000 HII galaxies from the Sloan Digital Sky Survey (SDSS) spectroscopic observations data release four. All methods that we used drew the same conclusion that the method is a more reliable oxygen abundance determination method than the Bayesian metallicity method under the existing telescope ability. The ratios of the likelihoods between the different kinds of methods tell us that the , , and 32 methods are consistent with each other because the and 32 methods are calibrated by method. The Bayesian and 23 methods are consistent with each other because both are calibrated by a galaxy model. In either case, the 2 method is an unreliable method.
Karunathilaka, Sanjeewa R; Kia, Ali-Reza Fardin; Srigley, Cynthia; Chung, Jin Kyu; Mossoba, Magdi M
2016-10-01
A rapid tool for evaluating authenticity was developed and applied to the screening of extra virgin olive oil (EVOO) retail products by using Fourier-transform near infrared (FT-NIR) spectroscopy in combination with univariate and multivariate data analysis methods. Using disposable glass tubes, spectra for 62 reference EVOO, 10 edible oil adulterants, 20 blends consisting of EVOO spiked with adulterants, 88 retail EVOO products and other test samples were rapidly measured in the transmission mode without any sample preparation. The univariate conformity index (CI) and the multivariate supervised soft independent modeling of class analogy (SIMCA) classification tool were used to analyze the various olive oil products which were tested for authenticity against a library of reference EVOO. Better discrimination between the authentic EVOO and some commercial EVOO products was observed with SIMCA than with CI analysis. Approximately 61% of all EVOO commercial products were flagged by SIMCA analysis, suggesting that further analysis be performed to identify quality issues and/or potential adulterants. Due to its simplicity and speed, FT-NIR spectroscopy in combination with multivariate data analysis can be used as a complementary tool to conventional official methods of analysis to rapidly flag EVOO products that may not belong to the class of authentic EVOO. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
Liou, Jyun-you; Smith, Elliot H.; Bateman, Lisa M.; McKhann, Guy M., II; Goodman, Robert R.; Greger, Bradley; Davis, Tyler S.; Kellis, Spencer S.; House, Paul A.; Schevon, Catherine A.
2017-08-01
Objective. Epileptiform discharges, an electrophysiological hallmark of seizures, can propagate across cortical tissue in a manner similar to traveling waves. Recent work has focused attention on the origination and propagation patterns of these discharges, yielding important clues to their source location and mechanism of travel. However, systematic studies of methods for measuring propagation are lacking. Approach. We analyzed epileptiform discharges in microelectrode array recordings of human seizures. The array records multiunit activity and local field potentials at 400 micron spatial resolution, from a small cortical site free of obstructions. We evaluated several computationally efficient statistical methods for calculating traveling wave velocity, benchmarking them to analyses of associated neuronal burst firing. Main results. Over 90% of discharges met statistical criteria for propagation across the sampled cortical territory. Detection rate, direction and speed estimates derived from a multiunit estimator were compared to four field potential-based estimators: negative peak, maximum descent, high gamma power, and cross-correlation. Interestingly, the methods that were computationally simplest and most efficient (negative peak and maximal descent) offer non-inferior results in predicting neuronal traveling wave velocities compared to the other two, more complex methods. Moreover, the negative peak and maximal descent methods proved to be more robust against reduced spatial sampling challenges. Using least absolute deviation in place of least squares error minimized the impact of outliers, and reduced the discrepancies between local field potential-based and multiunit estimators. Significance. Our findings suggest that ictal epileptiform discharges typically take the form of exceptionally strong, rapidly traveling waves, with propagation detectable across millimeter distances. The sequential activation of neurons in space can be inferred from clinically
El Alfy, Mohamed; Lashin, Aref; Abdalla, Fathy; Al-Bassam, Abdulaziz
2017-10-01
Rapid economic expansion poses serious problems for groundwater resources in arid areas, which typically have high rates of groundwater depletion. In this study, hydrochemical investigations integrating chemical and statistical analyses are conducted to assess the factors controlling hydrochemistry and potential pollution in an arid region. Fifty-four groundwater samples were collected from the Dhurma aquifer in Saudi Arabia, and twenty-one physicochemical variables were examined for each sample. Spatial patterns of salinity and nitrate were mapped using fitted variograms. The nitrate spatial distribution shows that nitrate pollution is a persistent problem affecting a wide area of the aquifer. The hydrochemical investigations and cluster analysis reveal four significant clusters of groundwater zones. Five main factors were extracted, which explain >77% of the total data variance. These factors indicated that the chemical characteristics of the groundwater were influenced by rock-water interactions and anthropogenic factors. The identified clusters and factors were validated with hydrochemical investigations. The geogenic factors include the dissolution of various minerals (calcite, aragonite, gypsum, anhydrite, halite and fluorite) and ion exchange processes. The anthropogenic factors include the impact of irrigation return flows and the application of potassium, nitrate, and phosphate fertilizers. Over time, these anthropogenic factors will most likely contribute to further declines in groundwater quality. Copyright © 2017 Elsevier Ltd. All rights reserved.
Narayanan, Roshni; Nugent, Rebecca; Nugent, Kenneth
2015-10-01
Accreditation Council for Graduate Medical Education guidelines require internal medicine residents to develop skills in the interpretation of medical literature and to understand the principles of research. A necessary component is the ability to understand the statistical methods used and their results, material that is not an in-depth focus of most medical school curricula and residency programs. Given the breadth and depth of the current medical literature and an increasing emphasis on complex, sophisticated statistical analyses, the statistical foundation and education necessary for residents are uncertain. We reviewed the statistical methods and terms used in 49 articles discussed at the journal club in the Department of Internal Medicine residency program at Texas Tech University between January 1, 2013 and June 30, 2013. We collected information on the study type and on the statistical methods used for summarizing and comparing samples, determining the relations between independent variables and dependent variables, and estimating models. We then identified the typical statistics education level at which each term or method is learned. A total of 14 articles came from the Journal of the American Medical Association Internal Medicine, 11 from the New England Journal of Medicine, 6 from the Annals of Internal Medicine, 5 from the Journal of the American Medical Association, and 13 from other journals. Twenty reported randomized controlled trials. Summary statistics included mean values (39 articles), category counts (38), and medians (28). Group comparisons were based on t tests (14 articles), χ2 tests (21), and nonparametric ranking tests (10). The relations between dependent and independent variables were analyzed with simple regression (6 articles), multivariate regression (11), and logistic regression (8). Nine studies reported odds ratios with 95% confidence intervals, and seven analyzed test performance using sensitivity and specificity calculations
Brief guidelines for methods and statistics in medical research
Ab Rahman, Jamalludin
2015-01-01
This book serves as a practical guide to methods and statistics in medical research. It includes step-by-step instructions on using SPSS software for statistical analysis, as well as relevant examples to help those readers who are new to research in health and medical fields. Simple texts and diagrams are provided to help explain the concepts covered, and print screens for the statistical steps and the SPSS outputs are provided, together with interpretations and examples of how to report on findings. Brief Guidelines for Methods and Statistics in Medical Research offers a valuable quick reference guide for healthcare students and practitioners conducting research in health related fields, written in an accessible style.
Statistical Methods for Characterizing Variability in Stellar Spectra
Cisewski, Jessi; Yale Astrostatistics
2017-01-01
Recent years have seen a proliferation in the number of exoplanets discovered. One technique for uncovering exoplanets relies on the detection of subtle shifts in the stellar spectra due to the Doppler effect caused by an orbiting object. However, stellar activity can cause distortions in the spectra that mimic the imprint of an orbiting exoplanet. The collection of stellar spectra potentially contains more information than is traditionally used for estimating its radial velocity curve. I will discuss some statistical methods that can be used for characterizing the sources of variability in the spectra. Statistical assessment of stellar spectra is a focus of the Statistical and Applied Mathematical Sciences Institute (SAMSI)'s yearlong program on Statistical, Mathematical and Computational Methods for Astronomy's Working Group IV (Astrophysical Populations).
Fundamentals of modern statistical methods substantially improving power and accuracy
Wilcox, Rand R
2001-01-01
Conventional statistical methods have a very serious flaw. They routinely miss differences among groups or associations among variables that are detected by more modern techniques, even under very small departures from normality. Hundreds of journal articles have described the reasons standard techniques can be unsatisfactory, but simple, intuitive explanations are generally unavailable. Improved methods have been derived, but they are far from obvious or intuitive based on the training most researchers receive. Situations arise where even highly nonsignificant results become significant when analyzed with more modern methods. Without assuming any prior training in statistics, Part I of this book describes basic statistical principles from a point of view that makes their shortcomings intuitive and easy to understand. The emphasis is on verbal and graphical descriptions of concepts. Part II describes modern methods that address the problems covered in Part I. Using data from actual studies, many examples are include...
Complexity of software trustworthiness and its dynamical statistical analysis methods
Institute of Scientific and Technical Information of China (English)
ZHENG ZhiMing; MA ShiLong; LI Wei; JIANG Xin; WEI Wei; MA LiLi; TANG ShaoTing
2009-01-01
Developing trusted software has become an important trend and a natural choice in the development of software technology and applications. At present, the methods of measuring and assessing software trustworthiness cannot guarantee safe and reliable operation of software systems completely and effectively. Based on the study of dynamical systems, this paper interprets the characteristics of the behaviors of software systems and the basic scientific problems of software trustworthiness complexity, analyzes the characteristics of the complexity of software trustworthiness, and proposes to study software trustworthiness measurement in terms of that complexity. Using dynamical statistical analysis methods, the paper advances an invariant-measure-based assessment method of software trustworthiness by statistical indices, and thereby provides a dynamical criterion for the untrustworthiness of software systems. The feasibility of the proposed dynamical statistical analysis method in software trustworthiness measurement is demonstrated by an example, using numerical simulations and theoretical analysis.
Statistical Methods for Quantitatively Detecting Fungal Disease from Fruits’ Images
Jagadeesh D. Pujari; Yakkundimath, Rajesh Siddaramayya; Byadgi, Abdulmunaf Syedhusain
2013-01-01
In this paper we propose statistical methods for detecting fungal disease in fruits and classifying it by disease severity level. Most fruit diseases are caused by bacteria, fungi, viruses, etc., of which fungi are responsible for a large number. In this study, images of fruits affected by different fungal symptoms are collected and categorized based on disease severity. Statistical features like block wise, gray level co-occurrence matrix (GLCM), gray level runlength matr...
Kauer, Agnes; Dorigo, Wouter; Bauer-Marschallinger, Bernhard
2017-04-01
Global warming is expected to change ocean-atmosphere oscillation patterns, e.g. the El Niño Southern Oscillation, and may thus have a substantial impact on water resources over land. Yet, the link between climate oscillations and terrestrial hydrology has large uncertainties. In particular, the climate in the Mediterranean basin is expected to be sensitive to global warming, which may aggravate insufficient and irregular water supply and lead to more frequent and intense droughts and heavy precipitation events. The ever increasing need for water in tourism and agriculture reinforces the problem. Therefore, the monitoring and better understanding of the hydrological cycle are crucial for this area. This study seeks to quantify the effect of regional climate modes, e.g. the Northern Atlantic Oscillation (NAO), on the hydrological cycle in the Mediterranean. We apply Empirical Orthogonal Functions (EOF) to a wide range of hydrological datasets to extract the major modes of variation over the study period. We use more than ten datasets describing precipitation, soil moisture, evapotranspiration, and changes in water mass, with study periods ranging from one to three decades depending on the dataset. The resulting EOFs are then examined for correlations with regional climate modes using Spearman rank correlation analysis. This is done for the entire time span of the EOFs and for monthly and seasonally sampled data. We find relationships between the hydrological datasets and the climate modes NAO, Arctic Oscillation (AO), Eastern Atlantic (EA), and Tropical Northern Atlantic (TNA). Analyses of monthly and seasonally sampled data reveal high correlations, especially in the winter months. However, the spatial extent of the data cube considered for the analyses has a large impact on the results. Our statistical analyses suggest an impact of regional climate modes on the hydrological cycle in the Mediterranean area and may provide valuable input for evaluating process
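The Spearman rank correlation step named in the abstract can be sketched as follows. This is an illustrative implementation only, not the authors' code. The function names and the two short series (standing in for an EOF principal-component time series and a climate index) are hypothetical. Tied values receive the average of the ranks they span, and the coefficient is the Pearson correlation of the two rank vectors.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical values: a principal-component series and a climate index.
pc_series = [0.3, -1.1, 0.8, 1.9, -0.4, 0.1, 2.2, -0.9]
climate_index = [0.5, -0.7, 0.2, 1.4, -1.0, 0.4, 1.1, -0.2]
rho = spearman(pc_series, climate_index)
print(f"Spearman rho = {rho:.3f}")
```

Because only ranks enter the calculation, the coefficient is insensitive to monotone rescaling of either series, which is one reason it suits indices with different units and distributions.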
Hierarchical modelling for the environmental sciences statistical methods and applications
Clark, James S
2006-01-01
New statistical tools are changing the way in which scientists analyze and interpret data and models. Hierarchical Bayes and Markov Chain Monte Carlo methods for analysis provide a consistent framework for inference and prediction where information is heterogeneous and uncertain, processes are complicated, and responses depend on scale. Nowhere are these methods more promising than in the environmental sciences.
The Metropolis Monte Carlo Method in Statistical Physics
Landau, David P.
2003-11-01
A brief overview is given of some of the advances in statistical physics that have been made using the Metropolis Monte Carlo method. By complementing theory and experiment, these have increased our understanding of phase transitions and other phenomena in condensed matter systems. A brief description of a new method, commonly known as "Wang-Landau sampling," will also be presented.
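As a rough illustration of the Metropolis method mentioned above, the sketch below applies single-spin-flip updates to a two-dimensional Ising model with periodic boundaries. The choice of model, the lattice size, the temperatures, and the function name are assumptions made for this example; the abstract does not specify them. Units are chosen so that the coupling constant and Boltzmann's constant equal 1. Wang-Landau sampling is not shown.

```python
import math
import random


def metropolis_ising(size, temperature, sweeps, seed=0):
    """Return |magnetization| per spin after Metropolis single-spin updates."""
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]  # start from the all-up state
    for _ in range(sweeps * size * size):
        i = rng.randrange(size)
        j = rng.randrange(size)
        # Sum of the four nearest neighbours with periodic boundaries.
        neighbours = (
            spins[(i + 1) % size][j]
            + spins[(i - 1) % size][j]
            + spins[i][(j + 1) % size]
            + spins[i][(j - 1) % size]
        )
        # Energy change if spin (i, j) is flipped, for E = -sum s_i s_j.
        delta_e = 2.0 * spins[i][j] * neighbours
        # Metropolis rule: always accept downhill moves, accept uphill
        # moves with probability exp(-delta_e / T).
        if delta_e <= 0.0 or rng.random() < math.exp(-delta_e / temperature):
            spins[i][j] = -spins[i][j]
    total = sum(sum(row) for row in spins)
    return abs(total) / (size * size)


cold = metropolis_ising(size=8, temperature=0.5, sweeps=20)
hot = metropolis_ising(size=8, temperature=10.0, sweeps=20)
print(f"|m| at T=0.5: {cold:.3f}")
print(f"|m| at T=10:  {hot:.3f}")
```

Well below the critical temperature the lattice should stay almost fully ordered, while at high temperature the magnetization should fall toward zero. This is the kind of phase-transition behavior the overview refers to.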
Descriptive and inferential statistical methods used in burns research.
Al-Benna, Sammy; Al-Ajam, Yazan; Way, Benjamin; Steinstraesser, Lars
2010-05-01
Burns research articles utilise a variety of descriptive and inferential methods to present and analyse data. The aim of this study was to determine the descriptive methods (e.g. mean, median, SD, range, etc.) and survey the use of inferential methods (statistical tests) used in articles in the journal Burns. This study defined its population as all original articles published in the journal Burns in 2007. Letters to the editor, brief reports, reviews, and case reports were excluded. Study characteristics, use of descriptive statistics and the number and types of statistical methods employed were evaluated. Of the 51 articles analysed, 11 (22%) were randomised controlled trials, 18 (35%) were cohort studies, 11 (22%) were case control studies and 11 (22%) were case series. The study design and objectives were defined in all articles. All articles made use of continuous and descriptive data. Inferential statistics were used in 49 (96%) articles. Data dispersion was calculated by standard deviation in 30 (59%). Standard error of the mean was quoted in 19 (37%). The statistical software product was named in 33 (65%). Of the 49 articles that used inferential statistics, the tests were named in 47 (96%). The 6 most common tests used (Student's t-test (53%), analysis of variance/covariance (33%), χ² test (27%), Wilcoxon & Mann-Whitney tests (22%), Fisher's exact test (12%)) accounted for the majority (72%) of statistical methods employed. A specified significance level was named in 43 (88%) and the exact significance levels were reported in 28 (57%). Descriptive analysis and basic statistical techniques account for most of the statistical tests reported. This information should prove useful in deciding which tests should be emphasised in educating burn care professionals. These results highlight the need for burn care professionals to have a sound understanding of basic statistics, which is crucial in interpreting and reporting data. Advice should be sought from professionals
Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.
2017-05-01
The braking system is one of the most important and complex subsystems of railway vehicles, especially when it comes to safety. Therefore, installing efficient, safe brakes on modern railway vehicles is essential. Attention is currently devoted to solving problems connected with the use of high-performance brake materials and their impact on the thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, because the interaction between the brake block and the wheel produces complex thermo-mechanical phenomena. In this work, the investigated subjects are cast-iron brake shoes, which are still widely used on freight wagons. Cast-iron brake shoes with lamellar graphite and a high phosphorus content (0.8-1.1%) therefore need special investigation. In order to establish the optimal condition for the cast-iron brake shoes, we propose a mathematical modelling study using statistical analysis and multiple regression equations. Multivariate research is important in cast-iron brake shoe manufacturing, because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulty comprehending many dimensions at one time. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. In order to establish the multiple correlation between the hardness of the cast-iron brake shoes and the elements of their chemical composition, several types of regression equation models have been proposed. Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representations of the regression equations in order to explain the interaction of the variables and locate the optimal level of each variable for
Riley, P.; Richardson, I. G.
2012-01-01
In-situ measurements of interplanetary coronal mass ejections (ICMEs) display a wide range of properties. A distinct subset, "magnetic clouds" (MCs), are readily identifiable by a smooth rotation in an enhanced magnetic field, together with an unusually low solar wind proton temperature. In this study, we analyze Ulysses spacecraft measurements to systematically investigate five possible explanations for why some ICMEs are observed to be MCs and others are not: i) an observational selection effect; that is, all ICMEs do in fact contain MCs, but the trajectory of the spacecraft through the ICME determines whether the MC is actually encountered; ii) interactions of an erupting flux rope (FR) with itself or between neighboring FRs, which produce complex structures in which the coherent magnetic structure has been destroyed; iii) an evolutionary process, such as relaxation to a low plasma-beta state that leads to the formation of an MC; iv) the existence of two (or more) intrinsic initiation mechanisms, some of which produce MCs and some that do not; or v) MCs are just an easily identifiable limit in an otherwise continuous spectrum of structures. We apply quantitative statistical models to assess these ideas. In particular, we use the Akaike information criterion (AIC) to rank the candidate models and a Gaussian mixture model (GMM) to uncover any intrinsic clustering of the data. Using a logistic regression, we find that plasma-beta, CME width, and the ratio O⁷⁺/O⁶⁺ are the most significant predictor variables for the presence of an MC. Moreover, the propensity for an event to be identified as an MC decreases with heliocentric distance. These results tend to refute ideas ii) and iii). GMM clustering analysis further identifies three distinct groups of ICMEs, two of which match (at the 86% level) with events independently identified as MCs, and a third that matches with non-MCs (68% overlap). Thus, idea v) is not supported. Choosing between ideas i) and
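The AIC ranking mentioned in the abstract can be sketched in a few lines. The model names, log-likelihoods, and parameter counts below are invented purely for illustration and are not taken from the study. The criterion is AIC = 2k - 2 ln L, where k is the number of fitted parameters and L the maximized likelihood; lower values indicate a better trade-off between fit and complexity.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {
    "model_a": (-120.0, 2),
    "model_b": (-115.0, 4),
    "model_c": (-114.5, 7),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
ranking = sorted(scores, key=scores.get)  # best (lowest AIC) first

# Differences from the best model are what is usually interpreted.
delta = {name: scores[name] - scores[ranking[0]] for name in ranking}
for name in ranking:
    print(f"{name}: AIC = {scores[name]:.1f}, delta = {delta[name]:.1f}")
```

In this made-up case the most complex model has the highest likelihood but is not ranked first, because the parameter penalty outweighs its small gain in fit.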