Classification of Specialized Farms Applying Multivariate Statistical Methods
Directory of Open Access Journals (Sweden)
Zuzana Hloušková
2017-01-01
The paper applies advanced multivariate statistical methods to classify cattle-breeding farming enterprises by their economic size. The advantage of the model is its ability to use a few selected indicators, compared with the complex methodology of the current classification model, which requires knowledge of the detailed structure of herd turnover and of the cultivated crops. The output of the paper is intended for farm structure research focused on the future development of Czech agriculture. As the data source, the 2014 farming enterprises database from the FADN CZ system has been used. The proposed predictive model exploits knowledge of the actual size classes of the farms tested. Outcomes of the multifactor classification by linear discriminant analysis supported assigning farming enterprises to the group of Small farms (98 % assigned correctly) and to the Large and Very Large enterprises (100 % assigned correctly). The Medium-size farms were correctly assigned at only 58.11 %. Partial shortcomings of the presented procedure were found when discriminating between Medium and Small farms.
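To make the approach concrete, here is a minimal sketch of linear discriminant classification into size classes. The indicator names and the synthetic data are assumptions for illustration, not the FADN CZ variables or thresholds used in the paper.

```python
# Sketch: classify farms into size classes with linear discriminant analysis.
# Indicators and data are hypothetical, not the paper's FADN CZ variables.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300
# Three hypothetical indicators per farm, e.g. herd size, utilised area, output.
X = rng.lognormal(mean=2.0, sigma=1.0, size=(n, 3))
# Hypothetical size classes derived from a latent "economic size" score.
score = X @ np.array([0.5, 0.3, 0.2])
y = np.digitize(score, np.quantile(score, [0.5, 0.9]))  # small / medium / large

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
print("test accuracy:", lda.score(X_test, y_test))
```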
Statistical methods for segmentation and classification of images
DEFF Research Database (Denmark)
Rosholm, Anders
1997-01-01
The central matter of the present thesis is Bayesian statistical inference applied to classification of images. An initial review of Markov Random Fields relates to the modeling aspect of the indicated main subject. In that connection, emphasis is put on the relatively unknown sub-class of Pickard...... with a Pickard Random Field modeling of a considered (categorical) image phenomenon. An extension of the fast PRF-based classification technique is presented. The modification introduces auto-correlation into the model of an involved noise process, which previously has been assumed independent. The suitability...... of the extended model is documented by tests on controlled image data containing auto-correlated noise....
Cameron, Enrico; Pilla, Giorgio; Stella, Fabio A.
2018-01-01
The application of statistical classification methods is investigated—in comparison also to spatial interpolation methods—for predicting the acceptability of well-water quality in a situation where an effective quantitative model of the hydrogeological system under consideration cannot be developed. In the example area in northern Italy, in particular, the aquifer is locally affected by saline water and the concentration of chloride is the main indicator of both saltwater occurrence and groundwater quality. The goal is to predict if the chloride concentration in a water well will exceed the allowable concentration so that the water is unfit for the intended use. A statistical classification algorithm achieved the best predictive performances and the results of the study show that statistical classification methods provide further tools for dealing with groundwater quality problems concerning hydrogeological systems that are too difficult to describe analytically or to simulate effectively.
Statistical methods of discrimination and classification advances in theory and applications
Choi, Sung C
1986-01-01
Statistical Methods of Discrimination and Classification: Advances in Theory and Applications is a collection of papers that tackles the multivariate problems of discriminating and classifying subjects into exclusive populations. The book presents 13 papers that cover advances in the statistical procedures of discrimination and classification. The studies in the text primarily focus on various methods of discriminating and classifying variables, such as multiple discriminant analysis in the presence of mixed continuous and categorical data; choice of the smoothing parameter and efficiency o...
Directory of Open Access Journals (Sweden)
Emmanouil Styvaktakis
2007-01-01
This paper presents the two main types of classification methods for power quality disturbances based on underlying causes: deterministic classification, with an expert system as an example, and statistical classification, with support vector machines (a novel method) as an example. An expert system is suitable when one has a limited amount of data and sufficient power-system expert knowledge; however, its application requires a set of threshold values. Statistical methods are suitable when a large amount of data is available for training. Two issues important to the effectiveness of a classifier, data segmentation and feature extraction, are discussed. Segmentation is a preprocessing step that partitions a recorded data sequence into segments, each representing a duration containing either an event or a transition between two events. Feature extraction is then applied to each segment individually. Some useful features and their effectiveness are discussed, and experimental results are included to demonstrate the effectiveness of both systems. Finally, conclusions are given together with a discussion of some future research directions.
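A minimal sketch of the statistical branch described above: per-segment features are extracted and fed to a support vector machine. The feature names and labels are invented stand-ins, not the feature set used in the paper.

```python
# Sketch: SVM classification of per-segment disturbance features.
# Features and labels are synthetic, for illustration only.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Hypothetical per-segment features: RMS voltage drop, duration, distortion.
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # e.g. sag vs. interruption

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```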
Szulc, Stefan
1965-01-01
Statistical Methods provides a discussion of the principles of the organization and technique of research, with emphasis on its application to the problems in social statistics. This book discusses branch statistics, which aims to develop practical ways of collecting and processing numerical data and to adapt general statistical methods to the objectives in a given field.Organized into five parts encompassing 22 chapters, this book begins with an overview of how to organize the collection of such information on individual units, primarily as accomplished by government agencies. This text then
DNA barcode analysis: a comparison of phylogenetic and statistical classification methods.
Austerlitz, Frederic; David, Olivier; Schaeffer, Brigitte; Bleakley, Kevin; Olteanu, Madalina; Leblois, Raphael; Veuille, Michel; Laredo, Catherine
2009-11-10
DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignation methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was molecular diversity of the data. Addition of genetically independent loci--nuclear genes--improved the predictive performance of most methods. The study implies that taxonomists can influence the quality of their analyses either by choosing a method best-adapted to the configuration of their sample, or, given a certain method, increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
DNA barcode analysis: a comparison of phylogenetic and statistical classification methods
Directory of Open Access Journals (Sweden)
Leblois Raphael
2009-11-01
Background: DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignation methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. Results: No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was the molecular diversity of the data. Addition of genetically independent loci (nuclear genes) improved the predictive performance of most methods. Conclusion: The study implies that taxonomists can influence the quality of their analyses either by choosing a method best adapted to the configuration of their sample, or, given a certain method, by increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
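A minimal sketch of the "one nearest neighbour" assignment that the study found most reliable: a query sequence receives the species label of the closest reference sequence. The toy sequences and the plain Hamming distance on pre-aligned strings are assumptions for illustration.

```python
# Sketch: 1-NN species assignment for DNA barcodes.
# Toy aligned sequences; real barcodes would need alignment first.
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

reference = [
    ("ACGTACGTAC", "species_A"),
    ("ACGTTCGTAC", "species_A"),
    ("TGCAAGGTTC", "species_B"),
]

def one_nn(query: str) -> str:
    """Assign the label of the nearest reference sequence."""
    return min(reference, key=lambda rec: hamming(query, rec[0]))[1]

print(one_nn("ACGTACGTTC"))  # -> species_A
```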
Directory of Open Access Journals (Sweden)
Korepanov Oleksiy S.
2017-12-01
The aim of the article is to study the labor market in Ukraine in the regional context using cluster analysis methods. The current state of the labor market in the regions of Ukraine is analyzed, and a system of statistical indicators that influence the state and development of this market is formed. The expediency of using cluster analysis for grouping regions according to the level of development of the labor market is substantiated. The essence of cluster analysis is revealed; its main goal and the key tasks that such analysis can solve are presented, and the basic stages of the analysis are considered. The main clustering methods are described and, based on the simulation results, the advantages and disadvantages of each method are discussed. The regions of Ukraine are clustered by the level of labor market development using different methods of cluster analysis, conclusions on the results of the calculations are presented, and the main directions for further research are outlined.
Guo, Hongling; Yin, Baohua; Zhang, Jie; Quan, Yangke; Shi, Gaojun
2016-09-01
Counterfeiting of banknotes is a crime and seriously harmful to the economy. Examination of the paper, ink and toners used to make counterfeit banknotes can provide useful information for classifying and linking different cases in which the suspects used the same raw materials. In this paper, 21 paper samples of counterfeit banknotes seized from 13 cases were analyzed by wavelength-dispersive X-ray fluorescence (XRF). After measuring the elemental composition of the paper semi-quantitatively, the normalized weight-percentage data of 10 elements were processed by the multivariate statistical methods of cluster analysis and principal component analysis. The paper samples fell mainly into 3 groups, and nine separate cases were successfully linked. This demonstrates that elemental composition measured by XRF is a useful way to compare and classify papers used in different cases.
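A minimal sketch of the multivariate workflow described above: normalized element weight percentages are clustered to group samples and link cases, with PCA scores available for visual inspection. The element values here are invented, not the casework data.

```python
# Sketch: cluster normalized elemental compositions to link cases.
# Element data are random placeholders for the 10 measured elements.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Rows: 21 paper samples; columns: normalized wt% of 10 elements.
X = np.abs(rng.normal(size=(21, 10)))
X = X / X.sum(axis=1, keepdims=True)  # normalize each sample to 100 %

Z = linkage(X, method="ward")
groups = fcluster(Z, t=3, criterion="maxclust")  # force 3 groups as in the paper
scores = PCA(n_components=2).fit_transform(X)    # 2-D scores for plotting
print(groups)
```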
A statistical approach to root system classification.
Directory of Open Access Journals (Sweden)
Gernot eBodner
2013-08-01
Plant root systems have a key role in ecology and agronomy. In spite of a fast increase in root studies, there is still no classification that allows distinguishing the distinctive characteristics within the diversity of rooting strategies. Our hypothesis is that a multivariate approach used for plant functional type identification in ecology can be applied to the classification of root systems. We demonstrate that combining principal component and cluster analysis yields a meaningful classification of rooting types based on morphological traits. The classification method presented is based on a data-defined statistical procedure without a priori decisions on the classifiers. Biplot inspection is used to determine key traits and to ensure stability in the cluster-based grouping. The classification method is exemplified with simulated root architectures and morphological field data. Simulated root architectures showed that morphological attributes with spatial distribution parameters capture the most distinctive features within root system diversity. While developmental type (tap- vs. shoot-borne systems) is a strong but coarse classifier, topological traits provide the most detailed differentiation among distinctive groups. The adequacy of commonly available morphological traits for classification is supported by the field data. Three rooting types emerged from the measured data, distinguished by diameter/weight, density and spatial distribution, respectively. Similarity of root systems within distinctive groups was the joint result of phylogenetic relation and environmental as well as human selection pressure. We conclude that the data-defined classification is appropriate for integrating knowledge obtained with different root measurement methods and at various scales. Currently, root morphology is the most promising basis for classification due to widely used common measurement protocols. To capture details of root diversity, efforts in architectural measurement...
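A minimal sketch of the data-defined procedure described above, assuming hypothetical trait names and values: standardize morphological traits, project with PCA, and group with hierarchical clustering, without choosing classifiers a priori.

```python
# Sketch: PCA followed by hierarchical clustering of root traits.
# Trait names and values are hypothetical illustrations.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
# Columns: e.g. mean diameter, root weight, length density, depth ratio.
traits = rng.normal(size=(60, 4))

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(traits))
rooting_type = fcluster(linkage(scores, method="ward"), t=3, criterion="maxclust")
print(rooting_type)
```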
Texture classification by texton: statistical versus binary.
Directory of Open Access Journals (Sweden)
Zhenhua Guo
Using statistical textons for texture classification has shown great success recently. The maximal response 8 (Statistical_MR8), image patch (Statistical_Joint) and locally invariant fractal (Statistical_Fractal) are typical statistical texton algorithms and state-of-the-art texture classification methods. However, these methods have two limitations. First, they need a training stage to build a texton library, so the recognition accuracy is highly dependent on the training samples; second, during feature extraction, a local feature is assigned to a texton by searching for the nearest texton in the whole library, which is time consuming when the library is large and the feature dimension is high. To address these two issues, three binary texton counterpart methods are proposed in this paper: Binary_MR8, Binary_Joint, and Binary_Fractal. These methods do not require any training step but encode local features into a binary representation directly. Experimental results on the CUReT, UIUC and KTH-TIPS databases show that binary textons achieve sound results with fast feature extraction, especially when the images are not large and their quality is not poor.
A Classification of Statistics Courses (A Framework for Studying Statistical Education)
Turner, J. C.
1976-01-01
A classification of statistics courses is presented, with main categories of "course type," "methods of presentation," "objectives," and "syllabus." Examples and suggestions for uses of the classification are given. (DT)
14 CFR Section 19 - Uniform Classification of Operating Statistics
2010-01-01
Aeronautics and Space; Office of the Secretary, Department of Transportation; Air Carriers Operating Statistics Classifications; Section 19: Uniform Classification of Operating Statistics.
Statistical Emulator for Expensive Classification Simulators
Ross, Jerret; Samareh, Jamshid A.
2016-01-01
Expensive simulators prevent meaningful analysis from being performed on the phenomena they model. To get around this problem, the concept of using a statistical emulator as a surrogate representation of the simulator was introduced in the 1980s. Simulators have since become more and more complex, and as a result a single run can be very expensive, taking days, weeks or even months. Many techniques, termed criteria, have been introduced that sequentially select the next best (most informative to the emulator) point to run on the simulator. These criteria allow the creation of an emulator from only a small number of simulator runs. We follow and extend this framework to expensive classification simulators.
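A minimal sketch of one such sequential criterion for a classification emulator: from a pool of cheap candidate inputs, run the simulator next where the current emulator is least certain. The probabilistic classifier, the toy simulator, and the uncertainty criterion are assumptions; the paper's specific criteria may differ.

```python
# Sketch: sequential design for a classification emulator via
# uncertainty sampling. Simulator and criterion are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def expensive_simulator(x):
    """Placeholder for the costly code: class 1 outside the unit circle."""
    return int(x[0] ** 2 + x[1] ** 2 > 1.0)

rng = np.random.default_rng(4)
# Seed runs: two fixed points guarantee both classes, plus random points.
X_run = np.vstack([[[0.0, 0.0], [1.5, 1.5]], rng.uniform(-2, 2, size=(8, 2))])
y_run = np.array([expensive_simulator(x) for x in X_run])
pool = rng.uniform(-2, 2, size=(500, 2))  # cheap candidate inputs

for _ in range(5):  # five sequential simulator runs
    emulator = RandomForestClassifier(random_state=0).fit(X_run, y_run)
    proba = emulator.predict_proba(pool)[:, 0]
    idx = np.argmin(np.abs(proba - 0.5))  # most uncertain candidate
    nxt = pool[idx]
    pool = np.delete(pool, idx, axis=0)
    X_run = np.vstack([X_run, nxt])
    y_run = np.append(y_run, expensive_simulator(nxt))

print(len(X_run), "simulator runs used")
```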
Classification, (big) data analysis and statistical learning
Conversano, Claudio; Vichi, Maurizio
2018-01-01
This edited book focuses on the latest developments in classification, statistical learning, data analysis and related areas of data science, including statistical analysis of large datasets, big data analytics, time series clustering, integration of data from different sources, as well as social networks. It covers both methodological aspects as well as applications to a wide range of areas such as economics, marketing, education, social sciences, medicine, environmental sciences and the pharmaceutical industry. In addition, it describes the basic features of the software behind the data analysis results, and provides links to the corresponding codes and data sets where necessary. This book is intended for researchers and practitioners who are interested in the latest developments and applications in the field. The peer-reviewed contributions were presented at the 10th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in Santa Margherita di Pul...
Sparse Classification - Methods & Applications
DEFF Research Database (Denmark)
Einarsson, Gudmundur
for analysing such data carry the potential to revolutionize tasks such as medical diagnostics where often decisions need to be based on only a few high-dimensional observations. This explosion in data dimensionality has sparked the development of novel statistical methods. In contrast, classical statistics...
Methods of statistical physics
Akhiezer, Aleksandr I
1981-01-01
Methods of Statistical Physics is an exposition of the tools of statistical mechanics, which evaluates the kinetic equations of classical and quantized systems. The book also analyzes the equations of macroscopic physics, such as the equations of hydrodynamics for normal and superfluid liquids and macroscopic electrodynamics. The text gives particular attention to the study of quantum systems. This study begins with a discussion of problems of quantum statistics with a detailed description of the basics of quantum mechanics along with the theory of measurement. An analysis of the asymptotic be
Statistical methods for forecasting
Abraham, Bovas
2009-01-01
The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover volumes, Wiley hopes to extend the lives of these works by making them available to future generations of statisticians, mathematicians, and scientists."This book, it must be said, lives up to the words on its advertising cover: ''Bridging the gap between introductory, descriptive approaches and highly advanced theoretical treatises, it provides a practical, intermediate level discussion of a variety of forecasting tools, and explains how they relate to one another, both in theory and practice.'' It does just that!"-Journal of the Royal Statistical Society"A well-written work that deals with statistical methods and models that can be used to produce short-term forecasts, this book has wide-ranging applications. It could be used in the context of a study of regression, forecasting, and time series ...
Methods for data classification
Garrity, George [Okemos, MI; Lilburn, Timothy G [Front Royal, VA
2011-10-11
The present invention provides methods for classifying data and uncovering and correcting annotation errors. In particular, the present invention provides a self-organizing, self-correcting algorithm for use in classifying data. Additionally, the present invention provides a method for classifying biological taxa.
Understanding advanced statistical methods
Westfall, Peter
2013-01-01
Introduction: Probability, Statistics, and Science; Reality, Nature, Science, and Models; Statistical Processes: Nature, Design and Measurement, and Data; Models; Deterministic Models; Variability; Parameters; Purely Probabilistic Statistical Models; Statistical Models with Both Deterministic and Probabilistic Components; Statistical Inference; Good and Bad Models; Uses of Probability Models. Random Variables and Their Probability Distributions: Introduction; Types of Random Variables: Nominal, Ordinal, and Continuous; Discrete Probability Distribution Functions; Continuous Probability Distribution Functions; Some Calculus - Derivatives and Least Squares; More Calculus - Integrals and Cumulative Distribution Functions. Probability Calculation and Simulation: Introduction; Analytic Calculations, Discrete and Continuous Cases; Simulation-Based Approximation; Generating Random Numbers. Identifying Distributions: Introduction; Identifying Distributions from Theory Alone; Using Data: Estimating Distributions via the Histogram; Quantiles: Theoretical and Data-Based Estimate...
Statistic methods for searching inundated radioactive entities
International Nuclear Information System (INIS)
Dubasov, Yu.V.; Krivokhatskij, A.S.; Khramov, N.N.
1993-01-01
The problem of searching for a flooded radioactive object in a given area was considered, and various models of plotting the search route are discussed. It is shown that a spiral route through random points from the centre of the examined area is the most efficient one. The conclusion is made that, when searching for flooded radioactive objects, it is advisable to use multidimensional statistical methods of classification.
Statistical methods and materials characterisation
International Nuclear Information System (INIS)
Wallin, K.R.W.
2010-01-01
Statistics is a wide mathematical area covering a myriad of analysis and estimation options, some of which suit special cases better than others. A comprehensive coverage of the whole area of statistics would be an enormous effort and would also be outside the capabilities of this author. This text is therefore not intended to be a textbook on statistical methods for general data analysis and decision making. Instead, it highlights a particular statistical case applicable to the characterization of mechanical materials. The methods presented here do not in any way rule out other statistical methods for analyzing mechanical property data. (orig.)
Statistical fingerprinting for malware detection and classification
Prowell, Stacy J.; Rathgeb, Christopher T.
2015-09-15
A system detects malware in a computing architecture with an unknown pedigree. The system includes a first computing device having a known pedigree and operating free of malware. The first computing device executes a series of instrumented functions that, when executed, provide a statistical baseline that is representative of the time it takes the software application to run on a computing device having a known pedigree. A second computing device executes a second series of instrumented functions that, when executed, provides an actual time that is representative of the time the known software application runs on the second computing device. The system detects malware when there is a difference in execution times between the first and the second computing devices.
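A minimal sketch of the timing-baseline idea described above: time the same instrumented function on a trusted machine and on a machine of unknown pedigree, then flag a significant shift. The simple z-score test and threshold are illustrative assumptions, not the patented system's actual statistic.

```python
# Sketch: detect anomalies by comparing execution-time distributions
# of an instrumented function against a known-good baseline.
import time
import statistics

def instrumented_function():
    """Stand-in for one of the system's instrumented functions."""
    return sum(i * i for i in range(50_000))

def time_runs(n=30):
    """Collect n wall-clock timings of the instrumented function."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        instrumented_function()
        samples.append(time.perf_counter() - t0)
    return samples

baseline = time_runs()  # would be collected on the trusted machine
observed = time_runs()  # would be collected on the suspect machine
mu, sigma = statistics.mean(baseline), statistics.stdev(baseline)
z = (statistics.mean(observed) - mu) / sigma
print("suspicious" if abs(z) > 3 else "consistent with baseline")
```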
[Classification of local anesthesia methods].
Petricas, A Zh; Medvedev, D V; Olkhovskaya, E B
The traditional classification of dental local anesthesia methods should be modified. In this paper we show that the vascular mechanism is the leading component of spongy injection. It is necessary to take into account the high effectiveness and relative safety of spongy anesthesia, as well as its versatility, ease of implementation and growing prevalence in the world. The essence of the proposed modification is to divide the methods into diffusive (including surface, infiltration and conduction anesthesia) and vascular-diffusive (including intraosseous, intraligamentary, intraseptal and intrapulpal anesthesia). For the last four methods the common term «spongy (intraosseous) anesthesia» may be used.
Statistical methods in quality assurance
International Nuclear Information System (INIS)
Eckhard, W.
1980-01-01
During the different phases of a production process - planning, development and design, manufacturing, assembling, etc. - most decisions rest on a basis of statistics: the collection, analysis and interpretation of data. Statistical methods can be thought of as a kit of tools for solving problems in the quality functions of the quality loop, with respect to producing quality products and reducing quality costs. Various statistical methods are presented, and typical examples of their practical application are demonstrated. (RW)
Statistical methods for quality improvement
National Research Council Canada - National Science Library
Ryan, Thomas P
2011-01-01
...."-TechnometricsThis new edition continues to provide the most current, proven statistical methods for quality control and quality improvementThe use of quantitative methods offers numerous benefits...
International Nuclear Information System (INIS)
Bakraji, E.H.; Ahmad, M.; Salman, N.; Haloum, D.; Boutros, N.; Abboud, R.
2011-01-01
Thermoluminescence (TL) dating and Proton Induced X-ray Emission (PIXE) techniques were used to study archaeological pottery fragments from the Tell Saka site, located 25 km south-east of Damascus, Syria. Four samples were chosen randomly from the site, two from the third level and two from the fourth level, for dating by the TL technique, and the results were in good agreement with the dates assigned by archaeologists. Twenty-eight sherds were analyzed by the PIXE technique, using the 3 MV tandem accelerator in Damascus, in order to identify and characterize the elemental composition of pottery excavated from the third and fourth levels. The analysis provided almost 20 elements (Na, Mg, Al, Si, P, S, K, Ca, Ti, Mn, Fe, Co, Ni, Cu, Zn, Rb, Sr, Y, Zr, Nb). However, only the 14 elements K, Ca, Ti, Mn, Fe, Co, Ni, Cu, Zn, Rb, Sr, Y, Zr and Nb were chosen for statistical analysis and were processed using two multivariate statistical methods, cluster and factor analysis. The studied pottery was classified into two well-defined groups. (author)
Statistical methods for ranking data
Alvo, Mayer
2014-01-01
This book introduces advanced undergraduate, graduate students and practitioners to statistical methods for ranking data. An important aspect of nonparametric statistics is oriented towards the use of ranking data. Rank correlation is defined through the notion of distance functions and the notion of compatibility is introduced to deal with incomplete data. Ranking data are also modeled using a variety of modern tools such as CART, MCMC, EM algorithm and factor analysis. This book deals with statistical methods used for analyzing such data and provides a novel and unifying approach for hypotheses testing. The techniques described in the book are illustrated with examples and the statistical software is provided on the authors’ website.
Statistical Methods in Integrative Genomics
Richardson, Sylvia; Tseng, George C.; Sun, Wei
2016-01-01
Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions. PMID:27482531
Statistical methods in nonlinear dynamics
Indian Academy of Sciences (India)
Sensitivity to initial conditions in nonlinear dynamical systems leads to exponential divergence of trajectories that are initially arbitrarily close, and hence to unpredictability. Statistical methods have been found to be helpful in extracting useful information about such systems. In this paper, we review briefly some statistical ...
Statistical Methods in Psychology Journals.
Wilkinson, Leland
1999-01-01
Proposes guidelines for revising the American Psychological Association (APA) publication manual or other APA materials to clarify the application of statistics in research reports. The guidelines are intended to induce authors and editors to recognize the thoughtless application of statistical methods. Contains 54 references. (SLD)
Statistical methods for physical science
Stanford, John L
1994-01-01
This volume of Methods of Experimental Physics provides an extensive introduction to probability and statistics in many areas of the physical sciences, with an emphasis on the emerging area of spatial statistics. The scope of topics covered is wide-ranging: the text discusses a variety of the most commonly used classical methods and addresses newer methods that are applicable or potentially important. The chapter authors motivate readers with their insightful discussions, augmenting their material with key features: examines basic probability, including coverage of standard distributions, time s...
Statistical methods in nuclear theory
International Nuclear Information System (INIS)
Shubin, Yu.N.
1974-01-01
The paper outlines statistical methods which are widely used for describing the properties of excited states of nuclei and nuclear reactions. It discusses the physical assumptions lying at the basis of the known distributions of spacings between levels (Wigner and Poisson distributions) and of the widths of highly excited states (Porter-Thomas distribution), as well as assumptions used in the statistical theory of nuclear reactions and in fluctuation analysis. The author considers the random matrix method, which consists in replacing the matrix elements of a residual interaction by random variables with a simple statistical distribution. Experimental data are compared with the results of calculations using the statistical model. The superfluid nucleus model is considered with regard to superconducting-type pair correlations.
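For reference, the standard forms of the distributions named above (not derived in the abstract itself) are, with level spacings normalized to unit mean,

$$P_{\text{Poisson}}(s) = e^{-s}, \qquad P_{\text{Wigner}}(s) = \frac{\pi s}{2}\, e^{-\pi s^{2}/4}, \qquad P_{\text{PT}}(x) = \frac{1}{\sqrt{2\pi x}}\, e^{-x/2},$$

where $s$ is the level spacing in units of the mean spacing and, in the Porter-Thomas distribution, $x = \Gamma / \langle \Gamma \rangle$ is the width in units of the mean width.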
Robust statistical methods with R
Jureckova, Jana
2005-01-01
Robust statistical methods were developed to supplement the classical procedures when the data violate classical assumptions. They are ideally suited to applied research across a broad spectrum of study, yet most books on the subject are narrowly focused, overly theoretical, or simply outdated. Robust Statistical Methods with R provides a systematic treatment of robust procedures with an emphasis on practical application. The authors work from underlying mathematical tools to implementation, paying special attention to the computational aspects. They cover the whole range of robust methods, including differentiable statistical functions, distance of measures, influence functions, and asymptotic distributions, in a rigorous yet approachable manner. Highlighting hands-on problem solving, many examples and computational algorithms using the R software supplement the discussion. The book examines the characteristics of robustness, estimators of real parameters, large-sample properties, and goodness-of-fit tests. It...
Statistical Methods for Fuzzy Data
Viertl, Reinhard
2011-01-01
Statistical data are not always precise numbers, or vectors, or categories. Real data are frequently what is called fuzzy. Examples where this fuzziness is obvious are quality of life data, environmental, biological, medical, sociological and economics data. Also the results of measurements can be best described by using fuzzy numbers and fuzzy vectors respectively. Statistical analysis methods have to be adapted for the analysis of fuzzy data. In this book, the foundations of the description of fuzzy data are explained, including methods on how to obtain the characterizing function of fuzzy m
Statistical methods in spatial genetics
DEFF Research Database (Denmark)
Guillot, Gilles; Leblois, Raphael; Coulon, Aurelie
2009-01-01
The joint analysis of spatial and genetic data is rapidly becoming the norm in population genetics. More and more studies explicitly describe and quantify the spatial organization of genetic variation and try to relate it to underlying ecological processes. As it has become increasingly difficult...... to keep abreast of the latest methodological developments, we review the statistical toolbox available to analyse population genetic data in a spatially explicit framework. We mostly focus on statistical concepts but also discuss practical aspects of the analytical methods, highlighting not only......
Nonequilibrium statistical mechanics ensemble method
Eu, Byung Chan
1998-01-01
In this monograph, nonequilibrium statistical mechanics is developed by means of ensemble methods on the basis of the Boltzmann equation, the generic Boltzmann equations for classical and quantum dilute gases, and a generalised Boltzmann equation for dense simple fluids. The theories are developed in forms parallel with the equilibrium Gibbs ensemble theory in a way fully consistent with the laws of thermodynamics. The generalised hydrodynamics equations are an integral part of the theory and describe the evolution of macroscopic processes in accordance with the laws of thermodynamics of systems far removed from equilibrium. Audience: This book will be of interest to researchers in the fields of statistical mechanics, condensed matter physics, gas dynamics, fluid dynamics, rheology, irreversible thermodynamics and nonequilibrium phenomena.
Statistical Fractal Models Based on GND-PCA and Its Application on Classification of Liver Diseases
Directory of Open Access Journals (Sweden)
Huiyan Jiang
2013-01-01
A new method is proposed to establish a statistical fractal model for liver disease classification. First, fractal theory is used to construct a high-order tensor, and Generalized N-Dimensional Principal Component Analysis (GND-PCA) is then used to establish the statistical fractal model and to select features from the liver region, with different features given different weights. Finally, a Support Vector Machine optimized by an Ant Colony algorithm (ACO-SVM) is used to build the classifier for recognizing liver disease. To verify the effectiveness of the proposed method, the PCA eigenface method and a standard SVM are chosen as contrast methods. The experimental results show that the proposed method reconstructs liver volume better and improves the classification accuracy for liver diseases.
Statistical methods for quality assurance
International Nuclear Information System (INIS)
Rinne, H.; Mittag, H.J.
1989-01-01
This is the first German-language textbook on quality assurance and the fundamental statistical methods that is suitable for private study. The material has been developed from a course of the Hagen Open University and is characterized by a particularly careful didactical design, achieved and supported by numerous illustrations and photographs, more than 100 exercises with complete solutions, many fully worked calculation examples, surveys fostering a comprehensive approach, and an annotated bibliography. The textbook is oriented towards practice and applications, and great care has been taken by the authors to avoid abstraction wherever appropriate, to explain the proper conditions of application of the testing methods described, and to give guidance for suitable interpretation of results. The testing methods explained also include the latest developments and research results in order to foster their adoption in practice. (orig.)
Order statistics & inference estimation methods
Balakrishnan, N
1991-01-01
The literature on order statistics and inference is quite extensive and covers a large number of fields, but most of it is dispersed throughout numerous publications. This volume is a consolidation of the most important results and places an emphasis on estimation. Both theoretical and computational procedures are presented to meet the needs of researchers, professionals, and students. The methods of estimation discussed are well illustrated with numerous practical examples from both the physical and life sciences, including sociology, psychology, and electrical and chemical engineering. A co...
Vortex methods and vortex statistics
International Nuclear Information System (INIS)
Chorin, A.J.
1993-05-01
Vortex methods originated from the observation that in incompressible, inviscid, isentropic flow, vorticity (or, more accurately, circulation) is a conserved quantity, as can be readily deduced from the absence of tangential stresses. Thus if the vorticity is known at time t = 0, one can deduce the flow at a later time by simply following it around. In this narrow context, a vortex method is a numerical method that makes use of this observation. More generally, the analysis of vortex methods leads to problems that are closely related to problems in quantum physics and field theory, as well as in harmonic analysis. A broad enough definition of vortex methods ends up encompassing much of science. Even the purely computational aspects of vortex methods encompass a range of ideas for which vorticity may not be the best unifying theme. The author restricts himself in these lectures to a special class of numerical vortex methods: those based on a Lagrangian transport of vorticity in hydrodynamics by smoothed particles ("blobs") and those whose understanding contributes to the understanding of blob methods. Vortex methods for inviscid flow lead to systems of ordinary differential equations that can readily be cast in Hamiltonian form, both in three and two space dimensions, and they can preserve exactly a number of invariants of the Euler equations, including topological invariants. Their viscous versions resemble Langevin equations. As a result, they provide a very useful cartoon of statistical hydrodynamics, i.e., of turbulence, one that can to some extent be analyzed analytically and, more importantly, explored numerically, with important implications also for superfluids, superconductors, and even polymers. In the author's view, vortex "blob" methods provide the most promising path to the understanding of these phenomena.
Bayes linear statistics, theory & methods
Goldstein, Michael
2007-01-01
Bayesian methods combine information available from data with any prior information available from expert knowledge. The Bayes linear approach follows this path, offering a quantitative structure for expressing beliefs, and systematic methods for adjusting these beliefs, given observational data. The methodology differs from the full Bayesian methodology in that it establishes simpler approaches to belief specification and analysis based around expectation judgements. Bayes Linear Statistics presents an authoritative account of this approach, explaining the foundations, theory, methodology, and practicalities of this important field. The text provides a thorough coverage of Bayes linear analysis, from the development of the basic language to the collection of algebraic results needed for efficient implementation, with detailed practical examples. The book covers: the importance of partial prior specifications for complex problems where it is difficult to supply a meaningful full prior probability specification...
ACCUWIND - Methods for classification of cup anemometers
Energy Technology Data Exchange (ETDEWEB)
Dahlberg, J.Aa.; Friis Pedersen, T.; Busche, P.
2006-05-15
Errors associated with the measurement of wind speed are the major sources of uncertainty in power performance testing of wind turbines. Field comparisons of well-calibrated anemometers show significant and unacceptable differences. The European CLASSCUP project set the objectives to quantify the errors associated with the use of cup anemometers and to develop a classification system for quantifying systematic errors of cup anemometers. This classification system has now been implemented in the IEC 61400-12-1 standard on power performance measurements, in annexes I and J. The classification of cup anemometers requires general external climatic operational ranges to be applied for the analysis of systematic errors. A Class A category classification applies to reasonably flat sites, and a Class B category to complex terrain. General classification indices are the result of an assessment of systematic deviations. The present report focuses on methods that can be applied for the assessment of such systematic deviations. A new alternative method for torque coefficient measurements at inclined flow has been developed, which has then been applied and compared to the existing methods developed in the CLASSCUP project and earlier. A number of approaches, including the use of two cup anemometer models, two methods of torque coefficient measurement, two angular response measurements, and inclusion and exclusion of the influence of friction, have been implemented in the classification process in order to assess the robustness of the methods. The results of the analysis are presented as classification indices, which are compared and discussed. (au)
THE GROWTH POINTS OF STATISTICAL METHODS
Orlov A. I.
2014-01-01
On the basis of a new paradigm of applied mathematical statistics, data analysis and economic-mathematical methods are identified, and five topical areas in which modern applied statistics is developing are discussed, i.e. five "growth points": nonparametric statistics, robustness, computer-statistical methods, statistics of interval data, and statistics of non-numeric data.
2013-02-07
Department of Health and Human Services, Centers for Disease Control and Prevention: the National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, announces the... Medical Systems Administrator, Classifications and Public Health Data Standards Staff, NCHS, 3311 Toledo...
Classification of Malaysia aromatic rice using multivariate statistical analysis
International Nuclear Information System (INIS)
Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A.; Omar, O.
2015-01-01
Aromatic rice (Oryza sativa L.) is considered the best quality premium rice. The varieties are preferred by consumers because of criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than that of ordinary rice due to the special growth conditions it needs, for instance a specific climate and soil. Presently, aromatic rice quality is identified using its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or by human sensory panels. However, the use of human sensory panels has significant drawbacks, such as lengthy training time, proneness to fatigue as the number of samples increases, and inconsistency. The GC-MS analysis techniques, on the other hand, require detailed procedures and lengthy analysis and are quite costly. This paper presents the application of an in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties based on the odour of the samples. The instrument utilizes multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN), to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to recognize and classify the unspecified samples. Visual observation of the PCA and LDA plots of the rice shows that the instrument was able to separate the samples into distinct clusters. The results of LDA and KNN, with low misclassification error, support the above findings, and we may conclude that the e-nose has been successfully applied to the classification of aromatic rice varieties.
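A minimal sketch of the validation step described above: leave-one-out evaluation of a k-nearest-neighbour classifier on sensor-array measurements. The simulated feature matrix and the 8-sensor assumption are stand-ins for the e-nose data, not the MARDI rice measurements.

```python
# Sketch: leave-one-out validation of KNN on simulated e-nose features.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(5)
# Rows: rice samples; columns: hypothetical responses of an 8-sensor array.
X = np.vstack([rng.normal(loc=c, size=(20, 8)) for c in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 20)  # three hypothetical rice varieties

acc = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=LeaveOneOut())
print("LOO accuracy:", acc.mean())
```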
New Statistics for Texture Classification Based on Gabor Filters
Directory of Open Access Journals (Sweden)
J. Pavlovicova
2007-09-01
The paper introduces a new method for evaluating the efficiency of texture segmentation. One of the well-known texture segmentation methods is based on Gabor filters because of their orientation and spatial-frequency character. Several statistics are used to extract more information from the results obtained by Gabor filtering. The large number of input parameters produces a wide set of results that needs to be evaluated. The evaluation method is based on assessing the intersection of the Gaussian curves of normal distributions, and it provides a new point of view for selecting a segmentation method.
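A minimal sketch of Gabor-filter feature extraction of the kind the paper builds on: filter an image at several orientations and frequencies, then reduce each response to simple statistics used as features. The use of scikit-image's gabor filter, and the specific orientations, frequencies, and statistics, are assumptions, not the paper's parameter set.

```python
# Sketch: Gabor filter bank responses reduced to mean/variance features.
# Random image stands in for a real texture patch.
import numpy as np
from skimage.filters import gabor

rng = np.random.default_rng(6)
image = rng.random((64, 64))

features = []
for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):   # orientations
    for frequency in (0.1, 0.2, 0.4):                    # spatial frequencies
        real, _ = gabor(image, frequency=frequency, theta=theta)
        features.extend([real.mean(), real.var()])       # per-response statistics
print(len(features), "features per patch")
```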
ACCUWIND - Methods for classification of cup anemometers
DEFF Research Database (Denmark)
Dahlberg, J.-Å.; Friis Pedersen, Troels; Busche, P.
2006-01-01
the errors associated with the use of cup anemometers, and to develop a classification system for quantification of systematic errors of cup anemometers. This classification system has now been implemented in the IEC 61400-12-1 standard on power performance measurements in annex I and J. The classification...... of cup anemometers requires general external climatic operational ranges to be applied for the analysis of systematic errors. A Class A category classification is connected to reasonably flat sites, and another Class B category is connected to complex terrain. General classification indices are the result...... developed in the CLASSCUP project and earlier. A number of approaches including the use of two cup anemometer models, two methods of torque coefficient measurement, two angular response measurements, and inclusion and exclusion of influence of friction have been implemented in the classification process...
Statistical methods in radiation physics
Turner, James E; Bogard, James S
2012-01-01
This statistics textbook, with particular emphasis on radiation protection and dosimetry, deals with statistical solutions to problems inherent in health physics measurements and decision making. The authors begin with a description of our current understanding of the statistical nature of physical processes at the atomic level, including radioactive decay and interactions of radiation with matter. Examples are taken from problems encountered in health physics, and the material is presented such that health physicists and most other nuclear professionals will more readily understand the application of statistical principles in the familiar context of the examples. Problems are presented at the end of each chapter, with solutions to selected problems provided online. In addition, numerous worked examples are included throughout the text.
Statistical inference via fiducial methods
Salomé, Diemer
1998-01-01
In this thesis the attention is restricted to inductive reasoning using a mathematical probability model. A statistical procedure prescribes, for every theoretically possible set of data, the inference about the unknown of interest. ... See: Summary
Application of Cocktail method in vegetation classification
Directory of Open Access Journals (Sweden)
Hamed Asadi
2016-09-01
This study assesses the application of the Cocktail method to the classification of large vegetation databases. For this purpose, the Buxus hyrcana dataset, consisting of 442 relevés with 89 species, was used. To run the Cocktail method, a preliminary classification was first done with modified TWINSPAN, and phi analysis within the resulting groups selected five species with the highest fidelity values. Sociological species groups were then formed by examining the co-occurrence of these 5 species with other species in the database. By assigning 379 relevés to the sociological species groups by means of logical formulas, 21 plant communities belonging to 6 variants, 17 subassociations, 11 associations, 4 alliances, 1 order and 1 class were recognized. The 63 relevés not assigned to any sociological species group by the logical formulas were assigned, using the FPFI index, to the group with the highest index value. Given the 91% agreement between the Cocktail and Braun-Blanquet classifications, we suggest the Cocktail method to vegetation scientists as an efficient alternative to the Braun-Blanquet method for classifying large vegetation databases.
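A minimal sketch of the phi fidelity measure used to pick diagnostic species for the sociological species groups (in the standard form due to Chytrý et al. 2002, which is assumed here); the occurrence counts in the example are invented.

```python
# Sketch: phi fidelity coefficient of a species to a releve group.
import math

def phi(N, N_p, n, n_p):
    """N: all releves; N_p: releves in the target group;
    n: occurrences of the species; n_p: occurrences inside the group."""
    return (N * n_p - n * N_p) / math.sqrt(n * N_p * (N - n) * (N - N_p))

# A species occurring in 30 of 442 releves, 25 of them inside a group of 40.
print(round(phi(442, 40, 30, 25), 3))  # high fidelity -> diagnostic candidate
```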
Global Optimization Ensemble Model for Classification Methods
Anwar, Hina; Qamar, Usman; Muzaffar Qureshi, Abdul Wahab
2014-01-01
Supervised learning is the process of data mining for deducing rules from training datasets. A broad array of supervised learning algorithms exists, each with its own advantages and drawbacks. Some basic issues affect the accuracy of a classifier in a supervised learning problem, such as the bias-variance tradeoff, the dimensionality of the input space, and noise in the input data. These problems are the reason that there is no globally optimal method for classification, and no generalized improvement method can increase the accuracy of an arbitrary classifier while addressing all the problems stated above. This paper proposes a global optimization ensemble model for classification methods (GMC) that can improve the overall accuracy for supervised learning problems. The experimental results on various public datasets showed that the proposed model improved the accuracy of the classification models from 1% to 30%, depending upon the algorithm complexity. PMID:24883382
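For context, here is a minimal sketch of combining heterogeneous classifiers into an ensemble; it uses a plain soft-voting combination as a baseline and does not reproduce the GMC model's actual optimization procedure, which the abstract does not detail.

```python
# Sketch: soft-voting ensemble over heterogeneous base classifiers.
# Synthetic data; the GMC optimization itself is not implemented here.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average predicted class probabilities
)
print("CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```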
Register-based statistics statistical methods for administrative data
Wallgren, Anders
2014-01-01
This book provides a comprehensive and up to date treatment of theory and practical implementation in Register-based statistics. It begins by defining the area, before explaining how to structure such systems, as well as detailing alternative approaches. It explains how to create statistical registers, how to implement quality assurance, and the use of IT systems for register-based statistics. Further to this, clear details are given about the practicalities of implementing such statistical methods, such as protection of privacy and the coordination and coherence of such an undertaking. Thi
Binary Classification Method of Social Network Users
Directory of Open Access Journals (Sweden)
I. A. Poryadin
2017-01-01
Full Text Available The subject of this research is a binary classification method for social network users based on analysis of the data they have posted. The article illustrates the relevance of the task of gaining information about a person by examining the content of his/her pages in social networks. The most common approach to this task is visual browsing; an order issued by a regional authority in our country illustrates that its use in school education is in demand. The article shows the limitations of visual browsing of pupils' pages in social networks as a tool for teachers and school psychologists, and argues that the analysis of social network users' data should be automated. It reviews publications describing methods for acquiring, processing, and analysing such data, and considers their advantages and disadvantages. The article also gives arguments for studying classification methods for social network users. One such method is credit scoring, which is used in banks and credit institutions to assess the solvency of clients; given the method's high efficiency, there is a proposal for significant expansion of its use in other areas of society. The use of logistic regression as the mathematical apparatus of the proposed binary classification method is justified, since this approach can take into account the different types of data extracted from social networks, among them personal user data, information about hobbies, friends, graphic and text information, and behavioural characteristics. The article describes a number of existing data transformation methods that can be applied to the problem. An experiment on binary gender-based classification of social network users is described; the logistic model obtained for this example includes multiple logical variables obtained by transforming the user surnames, and the experiment confirms the feasibility of the proposed method. Further work is to define a system
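Since the abstract justifies logistic regression as the mathematical apparatus for binary user classification, a minimal scikit-learn sketch follows. The binary profile features and synthetic labels are invented for illustration and do not reproduce the authors' experiment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical binary indicators derived from profile data
# (e.g., surname-ending flags, hobby-group memberships)
X = rng.integers(0, 2, size=(500, 12))
y = (X[:, 0] + X[:, 3] + rng.random(500) > 1.5).astype(int)  # synthetic label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```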
Zhang, Y.; Li, F.; Zhang, S.; Hao, W.; Zhu, T.; Yuan, L.; Xiao, F.
2017-09-01
In this paper, a Statistical Distribution based Conditional Random Fields (STA-CRF) algorithm is exploited to improve marginal ice-water classification. Pixel-level ice concentration is presented for the comparison of the CRF-based methods. Furthermore, in order to identify an effective statistical distribution model to integrate into STA-CRF, five statistical distribution models are investigated. The STA-CRF methods are tested on two scenes around Prydz Bay and the Adélie Depression, which contain a variety of ice types during the melt season. Experimental results indicate that the proposed method can resolve the sea ice edge well in the Marginal Ice Zone (MIZ) and shows a robust distinction between ice and water.
Permutation statistical methods an integrated approach
Berry, Kenneth J; Johnston, Janis E
2016-01-01
This research monograph provides a synthesis of a number of statistical tests and measures, which, at first consideration, appear disjoint and unrelated. Numerous comparisons of permutation and classical statistical methods are presented, and the two methods are compared via probability values and, where appropriate, measures of effect size. Permutation statistical methods, compared to classical statistical methods, do not rely on theoretical distributions, avoid the usual assumptions of normality and homogeneity of variance, and depend only on the data at hand. This text takes a unique approach to explaining statistics by integrating a large variety of statistical methods, and establishing the rigor of a topic that to many may seem to be a nascent field in statistics. This topic is new in that it took modern computing power to make permutation methods available to people working in the mainstream of research. This research monograph addresses a statistically-informed audience, and can also easily serve as a ...
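To make the contrast with classical tests concrete, here is a minimal two-sample permutation test in Python: the null distribution is built by reshuffling the observed data rather than by assuming normality. The sample values are hypothetical.

```python
import numpy as np

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([a, b])
    observed = abs(a.mean() - b.mean())
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        count += diff >= observed
    return count / n_perm

a = np.array([4.1, 5.0, 6.2, 5.8, 4.9])
b = np.array([5.5, 6.7, 7.1, 6.9, 7.4])
print("permutation p-value:", permutation_test(a, b))
```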
Generalized t-statistic for two-group classification.
Komori, Osamu; Eguchi, Shinto; Copas, John B
2015-06-01
In the classic discriminant model of two multivariate normal distributions with equal variance matrices, the linear discriminant function is optimal both in terms of the log likelihood ratio and in terms of maximizing the standardized difference (the t-statistic) between the means of the two distributions. In a typical case-control study, normality may be sensible for the control sample but heterogeneity and uncertainty in diagnosis may suggest that a more flexible model is needed for the cases. We generalize the t-statistic approach by finding the linear function which maximizes a standardized difference but with data from one of the groups (the cases) filtered by a possibly nonlinear function U. We study conditions for consistency of the method and find the function U which is optimal in the sense of asymptotic efficiency. Optimality may also extend to other measures of discriminatory efficiency such as the area under the receiver operating characteristic curve. The optimal function U depends on a scalar probability density function which can be estimated non-parametrically using a standard numerical algorithm. A lasso-like version for variable selection is implemented by adding L1-regularization to the generalized t-statistic. Two microarray data sets in the study of asthma and various cancers are used as motivating examples. © 2014, The International Biometric Society.
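For reference, the classical linear discriminant that this paper generalizes can be written in a few lines of NumPy: the direction w = S^{-1}(mu1 - mu0), with S the pooled within-group covariance, maximizes the two-sample t-statistic. The sketch below uses simulated case-control data and does not implement the authors' generalized, U-filtered statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
controls = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=100)
cases = rng.multivariate_normal([1, 0.5], [[1, 0.3], [0.3, 1]], size=100)

# Pooled within-group covariance matrix
S = ((len(controls) - 1) * np.cov(controls.T) +
     (len(cases) - 1) * np.cov(cases.T)) / (len(controls) + len(cases) - 2)

# Classical direction maximizing the standardized mean difference
w = np.linalg.solve(S, cases.mean(axis=0) - controls.mean(axis=0))
print("discriminant direction:", w / np.linalg.norm(w))
```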
Web Page Classification Method Using Neural Networks
Selamat, Ali; Omatu, Sigeru; Yanagimoto, Hidekazu; Fujinaka, Toru; Yoshioka, Michifumi
Automatic categorization is the only viable method to deal with the scaling problem of the World Wide Web (WWW). In this paper, we propose a news web page classification method (WPCM). The WPCM uses a neural network with inputs obtained from both the principal components and class profile-based features (CPBF). Each news web page is represented by a term-weighting scheme. As the number of unique words in the collection set is large, principal component analysis (PCA) has been used to select the most relevant features for the classification. The final output of the PCA is then combined with the feature vectors from the class profile, which contains the most regular words in each class, before being fed to the neural networks. We manually selected the most regular words that exist in each class and weighted them using an entropy weighting scheme. A fixed number of regular words from each class is used as a feature vector together with the reduced principal components from the PCA. These feature vectors are then used as the input to the neural networks for classification. The experimental evaluation demonstrates that the WPCM method provides acceptable classification accuracy on the sports news datasets.
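A hedged sketch of this kind of pipeline, term-weighted document vectors reduced by PCA and fed to a neural network, is shown below with scikit-learn. The tiny corpus is invented, and the class-profile features of the actual WPCM are omitted.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

docs = ["goal scored in the final minute", "stocks fell on weak earnings",
        "team wins championship title", "central bank raises interest rates"]
labels = ["sports", "business", "sports", "business"]

X = TfidfVectorizer().fit_transform(docs).toarray()  # term-weighted vectors
X_pc = PCA(n_components=3).fit_transform(X)          # keep leading components

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X_pc, labels)
print(clf.predict(X_pc))
```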
Statistical methods in physical mapping
International Nuclear Information System (INIS)
Nelson, D.O.
1995-05-01
One of the great success stories of modern molecular genetics has been the ability of biologists to isolate and characterize the genes responsible for serious inherited diseases like fragile X syndrome, cystic fibrosis and myotonic muscular dystrophy. This dissertation concentrates on constructing high-resolution physical maps. It demonstrates how probabilistic modeling and statistical analysis can aid molecular geneticists in the tasks of planning, execution, and evaluation of physical maps of chromosomes and large chromosomal regions. The dissertation is divided into six chapters. Chapter 1 provides an introduction to the field of physical mapping, describing the role of physical mapping in gene isolation and in past efforts at mapping chromosomal regions. The next two chapters review and extend known results on predicting progress in large mapping projects. Such predictions help project planners decide between various approaches and tactics for mapping large regions of the human genome. Chapter 2 shows how probability models have been used in the past to predict progress in mapping projects. Chapter 3 presents new results, based on stationary point process theory, for progress measures for mapping projects based on directed mapping strategies. Chapter 4 describes in detail the construction of an initial high-resolution physical map for human chromosome 19. This chapter introduces the probability and statistical models involved in map construction in the context of a large, ongoing physical mapping project. Chapter 5 concentrates on one such model, the trinomial model. This chapter contains new results on the large-sample behavior of this model, including distributional results, asymptotic moments, and detection error rates. In addition, it contains an optimality result concerning experimental procedures based on the trinomial model. The last chapter explores unsolved problems and describes future work.
Statistical learning methods: Basics, control and performance
Energy Technology Data Exchange (ETDEWEB)
Zimmermann, J. [Max-Planck-Institut fuer Physik, Foehringer Ring 6, 80805 Munich (Germany)]. E-mail: zimmerm@mppmu.mpg.de
2006-04-01
The basics of statistical learning are reviewed with a special emphasis on general principles and problems common to all types of learning methods. Different aspects of controlling these methods in a physically adequate way are discussed. All principles and guidelines are exercised on examples of statistical learning methods in high energy physics and astrophysics. These examples also show that statistical learning methods very often lead to a remarkable performance gain compared to the competing classical algorithms.
Statistics of Monte Carlo methods used in radiation transport calculation
International Nuclear Information System (INIS)
Datta, D.
2009-01-01
Radiation transport calculations can be carried out using either deterministic or statistical methods. Radiation transport calculation based on statistical methods is the basic theme of the Monte Carlo method. The aim of this lecture is to describe the fundamental statistics required to build the foundations of the Monte Carlo technique for radiation transport calculation. The lecture note is organized as follows. Section 1 introduces basic Monte Carlo and its classification within the field. Section 2 describes random sampling methods, a key component of Monte Carlo radiation transport calculation. Section 3 covers the statistical uncertainty of Monte Carlo estimates, and Section 4 briefly describes the importance of variance reduction techniques when sampling particles such as photons or neutrons in the process of radiation transport.
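As a minimal illustration of the statistical uncertainty of a Monte Carlo estimate (the topic of Section 3), the sketch below scores uncollided transmission of particles through a slab and reports the mean together with its one-sigma standard error; the attenuation coefficient and slab thickness are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, thickness, n = 0.5, 2.0, 100000  # attenuation coefficient (1/cm), slab (cm)

# Sample free-path lengths; a particle is transmitted if it crosses the slab
paths = rng.exponential(scale=1.0 / mu, size=n)
transmitted = (paths > thickness).astype(float)

estimate = transmitted.mean()
std_err = transmitted.std(ddof=1) / np.sqrt(n)  # 1-sigma statistical uncertainty
print(f"transmission = {estimate:.4f} +/- {std_err:.4f} "
      f"(exact {np.exp(-mu * thickness):.4f})")
```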
Multivariate statistical methods a primer
Manly, Bryan FJ
2004-01-01
THE MATERIAL OF MULTIVARIATE ANALYSIS: Examples of Multivariate Data; Preview of Multivariate Methods; The Multivariate Normal Distribution; Computer Programs; Graphical Methods; Chapter Summary; References. MATRIX ALGEBRA: The Need for Matrix Algebra; Matrices and Vectors; Operations on Matrices; Matrix Inversion; Quadratic Forms; Eigenvalues and Eigenvectors; Vectors of Means and Covariance Matrices; Further Reading; Chapter Summary; References. DISPLAYING MULTIVARIATE DATA: The Problem of Displaying Many Variables in Two Dimensions; Plotting Index Variables; The Draftsman's Plot; The Representation of Individual Data Points; Profiles o...
Statistical data analysis using SAS intermediate statistical methods
Marasinghe, Mervyn G
2018-01-01
The aim of this textbook (previously titled SAS for Data Analytics) is to teach the use of SAS for statistical analysis of data for advanced undergraduate and graduate students in statistics, data science, and disciplines involving analyzing data. The book begins with an introduction beyond the basics of SAS, illustrated with non-trivial, real-world, worked examples. It proceeds to SAS programming and applications, SAS graphics, statistical analysis of regression models, analysis of variance models, analysis of variance with random and mixed effects models, and then takes the discussion beyond regression and analysis of variance to conclude. Pedagogically, the authors introduce theory and methodological basis topic by topic, present a problem as an application, followed by a SAS analysis of the data provided and a discussion of results. The text focuses on applied statistical problems and methods. Key features include: end of chapter exercises, downloadable SAS code and data sets, and advanced material suitab...
Statistical methods for nuclear material management
Energy Technology Data Exchange (ETDEWEB)
Bowen, W.M.; Bennett, C.A. (eds.)
1988-12-01
This book is intended as a reference manual of statistical methodology for nuclear material management practitioners. It describes statistical methods currently or potentially important in nuclear material management, explains the choice of methods for specific applications, and provides examples of practical applications to nuclear material management problems. Together with the accompanying training manual, which contains fully worked out problems keyed to each chapter, this book can also be used as a textbook for courses in statistical methods for nuclear material management. It should provide increased understanding and guidance to help improve the application of statistical methods to nuclear material management problems.
Hoell, Simon; Omenzetter, Piotr
2017-04-01
The increasing demand for carbon-neutral energy in a challenging economic environment is a driving factor for erecting ever larger wind turbines in harsh environments, using novel wind turbine blade (WTB) designs characterized by high flexibility and lower buckling capacity. To counteract the resulting increase in operation and maintenance costs, efficient structural health monitoring systems can be employed to prevent dramatic failures and to schedule maintenance actions according to the true structural state. This paper presents a novel methodology for classifying structural damage using vibrational responses from a single sensor. The method is based on statistical classification using Bayes' theorem and an advanced statistic, which allows the performance to be controlled by varying the number of samples that represent the current state. This is done for multivariate damage-sensitive features defined as partial autocorrelation coefficients (PACCs) estimated from vibrational responses, and principal component analysis scores computed from the PACCs. Additionally, optimal DSFs are composed not only for damage classification but also for damage detection based on binary statistical hypothesis testing, where feature selections are found with a fast forward procedure. The method is applied to laboratory experiments on a small-scale WTB with wind-like excitation and non-destructive damage scenarios. The obtained results demonstrate the advantages of the proposed procedure and are promising for future applications of vibration-based structural health monitoring of WTBs.
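A hedged sketch of the feature-extraction stage follows: partial autocorrelation coefficients are estimated from synthetic vibration-like responses with statsmodels and compressed to principal component scores. It illustrates the PACC idea only, not the authors' full Bayesian classification scheme.

```python
import numpy as np
from scipy.signal import lfilter
from sklearn.decomposition import PCA
from statsmodels.tsa.stattools import pacf

rng = np.random.default_rng(0)
# Synthetic stand-ins for vibration responses: rows are measurement windows
noise = rng.standard_normal((30, 1024))
signals = lfilter([1.0], [1.0, -0.75], noise, axis=1)  # AR(1)-like responses

# Partial autocorrelation coefficients per window (lag 0 dropped)
paccs = np.array([pacf(s, nlags=20)[1:] for s in signals])
scores = PCA(n_components=5).fit_transform(paccs)      # compressed features
print(paccs.shape, scores.shape)
```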
Statistical classification techniques in high energy physics (SDDT algorithm)
International Nuclear Information System (INIS)
Bouř, Petr; Kůs, Václav; Franc, Jiří
2016-01-01
We present our proposal of the supervised binary divergence decision tree with nested separation method based on the generalized linear models. A key insight we provide is the clustering driven only by a few selected physical variables. The proper selection consists of the variables achieving the maximal divergence measure between two different classes. Further, we apply our method to Monte Carlo simulations of physics processes corresponding to a data sample of top quark-antiquark pair candidate events in the lepton+jets decay channel. The data sample is produced in pp̅ collisions at √s = 1.96 TeV. It corresponds to an integrated luminosity of 9.7 fb⁻¹ recorded with the D0 detector during Run II of the Fermilab Tevatron Collider. The efficiency of our algorithm achieves 90% AUC in separating signal from background. We also briefly deal with the modification of statistical tests applicable to weighted data sets in order to test homogeneity of the Monte Carlo simulations and measured data. The justification of these modified tests is proposed through the divergence tests. (paper)
Empirical evaluation of data normalization methods for molecular classification.
Huang, Huei-Chung; Qin, Li-Xuan
2018-01-01
Data artifacts due to variations in experimental handling are ubiquitous in microarray studies, and they can lead to biased and irreproducible findings. A popular approach to correct for such artifacts is through post hoc data adjustment such as data normalization. Statistical methods for data normalization have been developed and evaluated primarily for the discovery of individual molecular biomarkers. Their performance has rarely been studied for the development of multi-marker molecular classifiers, an increasingly important application of microarrays in the era of personalized medicine. In this study, we set out to evaluate the performance of three commonly used methods for data normalization in the context of molecular classification, using extensive simulations based on re-sampling from a unique pair of microRNA microarray datasets for the same set of samples. The data and code for our simulations are freely available as R packages at GitHub. In the presence of confounding handling effects, all three normalization methods tended to improve the accuracy of the classifier when evaluated in an independent test data set. The level of improvement and the relative performance among the normalization methods depended on the relative level of molecular signal, the distributional pattern of handling effects (e.g., location shift vs. scale change), and the statistical method used for building the classifier. In addition, cross-validation was associated with biased estimation of classification accuracy in the over-optimistic direction for all three normalization methods. Normalization may improve the accuracy of molecular classification for data with confounding handling effects; however, it cannot circumvent the over-optimistic findings associated with cross-validation for assessing classification accuracy.
Statistical Models and Methods for Lifetime Data
Lawless, Jerald F
2011-01-01
Praise for the First Edition: "An indispensable addition to any serious collection on lifetime data analysis and . . . a valuable contribution to the statistical literature. Highly recommended . . ." (Choice). "This is an important book, which will appeal to statisticians working on survival analysis problems." (Biometrics). "A thorough, unified treatment of statistical models and methods used in the analysis of lifetime data . . . this is a highly competent and agreeable statistical textbook." (Statistics in Medicine). The statistical analysis of lifetime or response time data is a key tool in engineering,
Multivariate statistical methods a first course
Marcoulides, George A
2014-01-01
Multivariate statistics refer to an assortment of statistical methods that have been developed to handle situations in which multiple variables or measures are involved. Any analysis of more than two variables or measures can loosely be considered a multivariate statistical analysis. An introductory text for students learning multivariate statistical methods for the first time, this book keeps mathematical details to a minimum while conveying the basic principles. One of the principal strategies used throughout the book--in addition to the presentation of actual data analyses--is poin
A classification scheme for risk assessment methods.
Energy Technology Data Exchange (ETDEWEB)
Stamp, Jason Edwin; Campbell, Philip LaRoche
2004-08-01
This report presents a classification scheme for risk assessment methods. This scheme, like all classification schemes, provides meaning by imposing a structure that identifies relationships. Our scheme is based on two orthogonal aspects: level of detail and approach. The resulting structure is shown in Table 1 and is explained in the body of the report. We present a two-dimensional structure in the form of a matrix, using three abstraction levels for the rows and three approaches for the columns. For each of the nine cells in the matrix we identify the method type by name and example. The matrix helps the user understand: (1) what to expect from a given method, (2) how it relates to other methods, and (3) how best to use it. Each cell in the matrix represents a different arrangement of strengths and weaknesses; those arrangements shift gradually as one moves through the table, each cell optimal for a particular situation. The intention of this report is to enable informed use of the methods, so that the method chosen is optimal for the situation given. The matrix, with type names in the cells, is introduced in Table 2 on page 13 below. Unless otherwise stated, we use the word 'method' in this report to refer to a 'risk assessment method', though often we use the full phrase. The terms 'risk assessment' and 'risk management' are close enough that we do not attempt to distinguish them in this report. The remainder of this report is organized as follows. In
Sreejith, Sreevarsha; Pereverzyev, Sergiy, Jr.; Kelvin, Lee S.; Marleau, Francine R.; Haltmeier, Markus; Ebner, Judith; Bland-Hawthorn, Joss; Driver, Simon P.; Graham, Alister W.; Holwerda, Benne W.; Hopkins, Andrew M.; Liske, Jochen; Loveday, Jon; Moffett, Amanda J.; Pimbblet, Kevin A.; Taylor, Edward N.; Wang, Lingyu; Wright, Angus H.
2018-03-01
We apply four statistical learning methods to a sample of 7941 galaxies to test the feasibility of using automated algorithms to classify galaxies. Using 10 features measured for each galaxy (sizes, colours, shape parameters, and stellar mass), we apply the techniques of Support Vector Machines, Classification Trees, Classification Trees with Random Forest (CTRF) and Neural Networks, returning True Prediction Ratios (TPRs) of 75.8 per cent, 69.0 per cent, 76.2 per cent, and 76.0 per cent, respectively. Those occasions whereby all four algorithms agree with each other yet disagree with the visual classification (`unanimous disagreement') serve as a potential indicator of human error in classification, occurring in ~9 per cent of ellipticals, ~9 per cent of little blue spheroids, ~14 per cent of early-type spirals, ~21 per cent of intermediate-type spirals, and ~4 per cent of late-type spirals and irregulars. We observe that the choice of parameters, rather than that of algorithms, is more crucial in determining classification accuracy. Due to its simplicity in formulation and implementation, we recommend the CTRF algorithm for classifying future galaxy data sets. Adopting the CTRF algorithm, the TPRs of the five galaxy types are: E, 70.1 per cent; LBS, 75.6 per cent; S0-Sa, 63.6 per cent; Sab-Scd, 56.4 per cent; and Sd-Irr, 88.9 per cent. Further, we train a binary classifier using this CTRF algorithm that divides galaxies into spheroid-dominated (E, LBS, and S0-Sa) and disc-dominated (Sab-Scd and Sd-Irr), achieving an overall accuracy of 89.8 per cent. This translates into an accuracy of 84.9 per cent for spheroid-dominated systems and 92.5 per cent for disc-dominated systems.
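A minimal stand-in for the tree-ensemble approach recommended above can be sketched with scikit-learn's RandomForestClassifier on ten features and five classes, echoing the paper's setup; the data are synthetic, not the galaxy sample used in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in for 10 measured features (sizes, colours, shape, stellar mass)
# and 5 classes standing in for the five galaxy types
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           n_classes=5, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```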
Advanced statistical methods in data science
Chen, Jiahua; Lu, Xuewen; Yi, Grace; Yu, Hao
2016-01-01
This book gathers invited presentations from the 2nd Symposium of the ICSA-Canada Chapter, held at the University of Calgary from August 4-6, 2015. The aim of this Symposium was to promote advanced statistical methods in big-data sciences and to allow researchers to exchange ideas on statistics and data science, embracing the challenges and opportunities of statistics and data science in the modern world. It addresses diverse themes in advanced statistical analysis in big-data sciences, including methods for administrative data analysis, survival data analysis, missing data analysis, high-dimensional and genetic data analysis, longitudinal and functional data analysis, the design and analysis of studies with response-dependent and multi-phase designs, time series and robust statistics, and statistical inference based on likelihood, empirical likelihood and estimating functions. The editorial group selected 14 high-quality presentations from this successful symposium and invited the presenters to prepare a fu...
2010-07-08
... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the... Prevention, Classifications and Public Health Data Standards, 3311 Toledo Road, Room 2337, Hyattsville, MD...
2013-08-28
... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the... Administrator, Classifications and Public Health Data Standards Staff, NCHS, 3311 Toledo Road, Room 2337...
Statistical Methods for Environmental Pollution Monitoring
Energy Technology Data Exchange (ETDEWEB)
Gilbert, Richard O. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
1987-01-01
The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Some statistical techniques given here are not commonly seen in statistics books. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.
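The nonparametric trend testing mentioned for Appendix B is in the Mann-Kendall family; a minimal sketch of that test (without tie or autocorrelation corrections, and not the book's exact code) is given below.

```python
import numpy as np
from scipy.stats import norm

def mann_kendall(x):
    """Mann-Kendall trend test (no tie correction); returns S, Z, two-sided p."""
    x = np.asarray(x)
    n = len(x)
    s = sum(np.sign(x[j] - x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected normal approximation
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    return s, z, 2 * norm.sf(abs(z))

series = [2.1, 2.3, 2.2, 2.6, 2.8, 2.7, 3.0, 3.2]  # hypothetical monitoring data
print(mann_kendall(series))
```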
Statistical methods and challenges in connectome genetics
Pluta, Dustin; Yu, Zhaoxia; Shen, Tong; Chen, Chuansheng; Xue, Gui; Ombao, Hernando
2018-01-01
The study of genetic influences on brain connectivity, known as connectome genetics, is an exciting new direction of research in imaging genetics. We here review recent results and current statistical methods in this area, and discuss some of the persistent challenges and possible directions for future work.
Methods and statistics for combining motif match scores.
Bailey, T L; Gribskov, M
1998-01-01
Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores and evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
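Method 3, the product of p-values, is closely related to Fisher's combined probability test for independent p-values, where the statistic -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom. A minimal sketch is below; the per-motif p-values are hypothetical, and this is not necessarily the exact computation used by MAST.

```python
import numpy as np
from scipy.stats import chi2

def fisher_combined(pvalues):
    """Combine independent p-values: -2*sum(ln p) ~ chi-square with 2k df."""
    p = np.asarray(pvalues, dtype=float)
    stat = -2.0 * np.log(p).sum()
    return stat, chi2.sf(stat, df=2 * len(p))

# Hypothetical per-motif match p-values for one sequence
print(fisher_combined([0.01, 0.20, 0.03]))
```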
Statistical methods in personality assessment research.
Schinka, J A; LaLone, L; Broeckel, J A
1997-06-01
Emerging models of personality structure and advances in the measurement of personality and psychopathology suggest that research in personality and personality assessment has entered a stage of advanced development. In this article we examine whether researchers in these areas have taken advantage of new and evolving statistical procedures. We conducted a review of articles published in the Journal of Personality Assessment during the past 5 years. Of the 449 articles that included some form of data analysis, 12.7% used only descriptive statistics, most employed only univariate statistics, and fewer than 10% used multivariate methods of data analysis. We discuss the cost of using limited statistical methods, the possible reasons for the apparent reluctance to employ advanced statistical procedures, and potential solutions to this technical shortcoming.
Hohn, M. Ed; Nuhfer, E.B.; Vinopal, R.J.; Klanderman, D.S.
1980-01-01
Classifying very fine-grained rocks through fabric elements provides information about depositional environments, but is subject to the biases of visual taxonomy. To evaluate the statistical significance of an empirical classification of very fine-grained rocks, samples from Devonian shales in four cored wells in West Virginia and Virginia were measured for 15 variables: quartz, illite, pyrite and expandable clays determined by X-ray diffraction; total sulfur, organic content, inorganic carbon, matrix density, bulk density, porosity, silt, as well as density, sonic travel time, resistivity, and γ-ray response measured from well logs. The four lithologic types comprised: (1) sharply banded shale, (2) thinly laminated shale, (3) lenticularly laminated shale, and (4) nonbanded shale. Univariate and multivariate analyses of variance showed that the lithologic classification reflects significant differences in the variables measured, differences that can be detected independently of stratigraphic effects. Little-known statistical methods found useful in this work included: the multivariate analysis of variance with more than one effect, simultaneous plotting of samples and variables on canonical variates, and the use of parametric ANOVA and MANOVA on ranked data. © 1980 Plenum Publishing Corporation.
A novel statistical method for classifying habitat generalists and specialists
DEFF Research Database (Denmark)
Chazdon, Robin L; Chao, Anne; Colwell, Robert K
2011-01-01
in second-growth (SG) and old-growth (OG) rain forests in the Caribbean lowlands of northeastern Costa Rica. We evaluate the multinomial model in detail for the tree data set. Our results for birds were highly concordant with a previous nonstatistical classification, but our method classified a higher......: (1) generalist; (2) habitat A specialist; (3) habitat B specialist; and (4) too rare to classify with confidence. We illustrate our multinomial classification method using two contrasting data sets: (1) bird abundance in woodland and heath habitats in southeastern Australia and (2) tree abundance...... fraction (57.7%) of bird species with statistical confidence. Based on a conservative specialization threshold and adjustment for multiple comparisons, 64.4% of tree species in the full sample were too rare to classify with confidence. Among the species classified, OG specialists constituted the largest...
Spatial analysis statistics, visualization, and computational methods
Oyana, Tonny J
2015-01-01
An introductory text for the next generation of geospatial analysts and data scientists, Spatial Analysis: Statistics, Visualization, and Computational Methods focuses on the fundamentals of spatial analysis using traditional, contemporary, and computational methods. Outlining both non-spatial and spatial statistical concepts, the authors present practical applications of geospatial data tools, techniques, and strategies in geographic studies. They offer a problem-based learning (PBL) approach to spatial analysis-containing hands-on problem-sets that can be worked out in MS Excel or ArcGIS-as well as detailed illustrations and numerous case studies. The book enables readers to: Identify types and characterize non-spatial and spatial data Demonstrate their competence to explore, visualize, summarize, analyze, optimize, and clearly present statistical data and results Construct testable hypotheses that require inferential statistical analysis Process spatial data, extract explanatory variables, conduct statisti...
Workshop on Analytical Methods in Statistics
Jurečková, Jana; Maciak, Matúš; Pešta, Michal
2017-01-01
This volume collects authoritative contributions on analytical methods and mathematical statistics. The methods presented include resampling techniques; the minimization of divergence; estimation theory and regression, possibly under shape or other constraints or long memory; and iterative approximations when the optimal solution is difficult to achieve. It also investigates probability distributions with respect to their stability, heavy-tailedness, Fisher information and other aspects, both asymptotically and non-asymptotically. The book not only presents the latest mathematical and statistical methods and their extensions, but also offers solutions to real-world problems including option pricing. The selected, peer-reviewed contributions were originally presented at the workshop on Analytical Methods in Statistics, AMISTAT 2015, held in Prague, Czech Republic, November 10-13, 2015.
Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa
2018-03-01
In the last decade, the analysis of remotely sensed imagery has become one of the most common and widely used procedures in environmental studies, with supervised image classification techniques playing a central role. Hence, using a high-resolution WorldView-3 image over a mixed urbanized landscape in Iran, three less commonly applied image classification methods, Bagged CART, the stochastic gradient boosting model and a neural network with feature extraction, were tested and compared with two prevalent methods: random forest and a support vector machine with a linear kernel. To do so, each method was run ten times, and three validation techniques were used to estimate the accuracy statistics: cross-validation, independent validation, and validation with the total training data. Moreover, using ANOVA and the Tukey test, the statistical significance of differences between the classification methods was assessed. In general, the results showed that random forest, by a marginal difference over Bagged CART and the stochastic gradient boosting model, is the best performing method, while based on independent validation there was no significant difference between the performances of the classification methods. Finally, it should be noted that the neural network with feature extraction and the linear support vector machine had better processing speed than the others.
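A hedged sketch of this evaluation design, repeated cross-validation runs per classifier followed by one-way ANOVA and Tukey's HSD on the accuracy scores, is shown below with scikit-learn and SciPy; the data, the two stand-in classifiers, and the run count are illustrative only.

```python
from scipy.stats import f_oneway, tukey_hsd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=8, random_state=0)
models = {"RF": RandomForestClassifier(random_state=0),
          "SVM": SVC(kernel="linear")}

# Ten repeated 5-fold cross-validation runs per method, shuffled differently
runs = {name: [cross_val_score(m, X, y,
                               cv=KFold(5, shuffle=True, random_state=r)).mean()
               for r in range(10)]
        for name, m in models.items()}

print(f_oneway(*runs.values()))   # one-way ANOVA across methods
print(tukey_hsd(*runs.values()))  # pairwise Tukey HSD comparison
```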
Goyal, Shrigopal; Balhara, Yatan Pal Singh; Khandelwal, S K
2012-07-01
Two of the most commonly used nosological systems, the International Statistical Classification of Diseases and Related Health Problems (ICD-10) and the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV), are under revision. This process has generated many interesting debates regarding the future of the current diagnostic categories; indeed, the status of the categorical approach in the upcoming versions of the ICD and DSM is itself being debated. The current article focuses on the debate regarding eating disorders. The existing classification of eating disorders has been criticized for its limitations, and a host of new diagnostic categories have been recommended for inclusion in the upcoming revisions. The structure of the existing categories has also been put under scrutiny.
Toward optimal feature selection using ranking methods and classification algorithms
Directory of Open Access Journals (Sweden)
Novaković Jasmina
2011-01-01
Full Text Available We present a comparison of several feature ranking methods used on two real datasets. We considered six ranking methods that can be divided into two broad categories: statistical and entropy-based. Four supervised learning algorithms were adopted to build models, namely IB1, Naive Bayes, the C4.5 decision tree and the RBF network. We show that the selection of ranking method can be important for classification accuracy: in our experiments, ranking methods combined with different supervised learning algorithms give quite different results for balanced accuracy. Our cases confirm that, in order to be sure that a subset of features giving the highest accuracy has been selected, the use of many different indices is recommended.
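The sketch below illustrates the two ranking families named above, a statistical score (the ANOVA F-test) and an entropy-based score (mutual information), each feeding a Naive Bayes classifier; the dataset and the choice of k are stand-ins, not the paper's exact setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

for name, score in [("statistical (F-test)", f_classif),
                    ("entropy-based (mutual information)", mutual_info_classif)]:
    X_top = SelectKBest(score, k=10).fit_transform(X, y)  # keep top-10 features
    acc = cross_val_score(GaussianNB(), X_top, y, cv=5).mean()
    print(f"{name}: accuracy with top-10 features = {acc:.3f}")
```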
Discriminant forest classification method and system
Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.
2012-11-06
A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or Andersen-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.
CCM: A Text Classification Method by Clustering
DEFF Research Database (Denmark)
Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock
2011-01-01
In this paper, a new Cluster-based Classification Model (CCM) for suspicious email detection and other text classification tasks is presented. Comparative experiments of the proposed model against traditional classification models and the boosting algorithm are also discussed. Experimental results show that the CCM outperforms traditional classification models as well as the boosting algorithm for the task of suspicious email detection on a terrorism-domain email dataset and for topic categorization on the Reuters-21578 and 20 Newsgroups datasets. The overall finding is that applying a cluster-based
The ability of current statistical classifications to separate services and manufacturing
DEFF Research Database (Denmark)
Christensen, Jesper Lindgaard
2013-01-01
This paper explores the performance of current statistical classification systems in classifying firms and, in particular, their ability to distinguish between firms that provide services and firms that provide manufacturing. We find that a large share of firms, almost 20%, are not classified...... as expected based on a comparison of their statements of activities with the assigned industry codes. This result is robust to analyses on different levels of aggregation and is validated in an additional survey. It is well known from earlier literature that industry classification systems are not perfect....... This paper provides a quantification of the flaws in classifications of firms. Moreover, it is explained why the classifications of firms are imprecise. The increasing complexity of production, inertia in changes to statistical systems and the increasing integration of manufacturing products and services...
Statistical Methods for Stochastic Differential Equations
Kessler, Mathieu; Sorensen, Michael
2012-01-01
The seventh volume in the SemStat series, Statistical Methods for Stochastic Differential Equations presents current research trends and recent developments in statistical methods for stochastic differential equations. Written to be accessible to both new students and seasoned researchers, each self-contained chapter starts with introductions to the topic at hand and builds gradually towards discussing recent research. The book covers Wiener-driven equations as well as stochastic differential equations with jumps, including continuous-time ARMA processes and COGARCH processes. It presents a sp
Statistical methods for spatio-temporal systems
Finkenstadt, Barbel
2006-01-01
Statistical Methods for Spatio-Temporal Systems presents current statistical research issues on spatio-temporal data modeling and will promote advances in research and a greater understanding between the mechanistic and the statistical modeling communities.Contributed by leading researchers in the field, each self-contained chapter starts with an introduction of the topic and progresses to recent research results. Presenting specific examples of epidemic data of bovine tuberculosis, gastroenteric disease, and the U.K. foot-and-mouth outbreak, the first chapter uses stochastic models, such as point process models, to provide the probabilistic backbone that facilitates statistical inference from data. The next chapter discusses the critical issue of modeling random growth objects in diverse biological systems, such as bacteria colonies, tumors, and plant populations. The subsequent chapter examines data transformation tools using examples from ecology and air quality data, followed by a chapter on space-time co...
Statistical control chart and neural network classification for improving human fall detection
Harrou, Fouzi; Zerrouki, Nabil; Sun, Ying; Houacine, Amrane
2017-01-01
This paper proposes a statistical approach to detect and classify human falls based on both visual data from a camera and accelerometric data captured by an accelerometer. Specifically, we first use a Shewhart control chart to detect the presence of potential falls from the accelerometric data. Unfortunately, this chart cannot distinguish real falls from fall-like actions, such as lying down. To bypass this difficulty, a neural network classifier is then applied, only on the detected cases, using the visual data. To assess the performance of the proposed method, experiments are conducted on a publicly available fall detection database: the University of Rzeszow's fall detection (URFD) dataset. Results demonstrate that the detection phase plays a key role in reducing the number of sequences used as input to the neural network classifier, significantly reducing the computational burden and achieving better accuracy.
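A minimal sketch of the first stage, a Shewhart chart on an accelerometric signal, is given below: samples outside the mean ± 3 sigma limits estimated from a calibration segment are flagged as candidate falls. The signal is synthetic and the thresholding is the textbook chart, not necessarily the authors' exact configuration.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic acceleration magnitude: quiet activity plus a fall-like spike
signal = rng.normal(1.0, 0.05, 1000)   # roughly 1 g at rest
signal[600:605] += 2.5                 # candidate fall event

baseline = signal[:200]                # calibration segment (assumed fall-free)
ucl = baseline.mean() + 3 * baseline.std()
lcl = baseline.mean() - 3 * baseline.std()

alarms = np.where((signal > ucl) | (signal < lcl))[0]
print("samples outside control limits:", alarms)
```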
Classification of lung sounds using higher-order statistics: A divide-and-conquer approach.
Naves, Raphael; Barbosa, Bruno H G; Ferreira, Danton D
2016-06-01
Lung sound auscultation is one of the most commonly used methods to evaluate respiratory diseases. However, the effectiveness of this method depends on the physician's training: without proper training, the physician will be unable to distinguish between normal and abnormal sounds generated by the human body. Thus, the aim of this study was to implement a pattern recognition system to classify lung sounds. We used a dataset composed of five types of lung sounds: normal, coarse crackle, fine crackle, monophonic and polyphonic wheezes. We used higher-order statistics (HOS) to extract features (second-, third- and fourth-order cumulants), Genetic Algorithms (GA) and Fisher's Discriminant Ratio (FDR) to reduce dimensionality, and k-Nearest Neighbors and Naive Bayes classifiers to recognize the lung sound events in a tree-based system. We used the cross-validation procedure to analyze the classifiers' performance and Tukey's Honestly Significant Difference criterion to compare the results. Our results showed that the Genetic Algorithms outperformed Fisher's Discriminant Ratio for feature selection. Moreover, each lung class had a different signature pattern according to its cumulants, showing that HOS is a promising feature extraction tool for lung sounds. In addition, the proposed divide-and-conquer approach can accurately classify different types of lung sounds. The classification accuracy obtained by the best tree-based classifier was 98.1% on training data and 94.6% on validation data. The proposed approach achieved good results even using only one feature extraction tool (higher-order statistics), and the implementation of the proposed classifier in an embedded system is feasible. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
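For a zero-mean signal the second, third and fourth cumulants reduce to the variance, skewness·σ³ and excess kurtosis·σ⁴, so the HOS features can be sketched briefly. The two synthetic signals below merely stand in for lung-sound recordings and are not the study's data.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def cumulant_features(x):
    """Second-, third- and fourth-order cumulants of a zero-mean signal."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    sigma2 = x.var()
    k2 = sigma2
    k3 = skew(x) * sigma2 ** 1.5
    k4 = kurtosis(x) * sigma2 ** 2      # scipy returns excess kurtosis
    return np.array([k2, k3, k4])

rng = np.random.default_rng(0)
crackle_like = rng.laplace(size=4000)   # heavier-tailed stand-in for crackles
normal_like = rng.normal(size=4000)
print(cumulant_features(crackle_like), cumulant_features(normal_like))
```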
Application of Turchin's method of statistical regularization
Zelenyi, Mikhail; Poliakova, Mariia; Nozik, Alexander; Khudyakov, Alexey
2018-04-01
During analysis of experimental data, one usually needs to restore a signal after it has been convoluted with some kind of apparatus function. According to Hadamard's definition this problem is ill-posed and requires regularization to provide sensible results. In this article we describe an implementation of Turchin's method of statistical regularization based on the Bayesian approach to the regularization strategy.
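As a hedged illustration of regularized unfolding, the sketch below restores a signal convolved with a Gaussian apparatus function using plain Tikhonov regularization with a hand-picked parameter; Turchin's method differs in choosing that parameter statistically via the Bayesian approach rather than by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 80
t = np.linspace(0, 1, n)
true = np.exp(-((t - 0.4) / 0.05) ** 2) + 0.6 * np.exp(-((t - 0.7) / 0.08) ** 2)

# Apparatus function: Gaussian smearing expressed as a convolution matrix A
A = np.exp(-((t[:, None] - t[None, :]) / 0.04) ** 2)
A /= A.sum(axis=1, keepdims=True)
measured = A @ true + rng.normal(0, 0.01, n)

lam = 1e-3  # regularization strength; Turchin's method infers this statistically
restored = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ measured)
print("relative restoration error:",
      np.linalg.norm(restored - true) / np.linalg.norm(true))
```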
The research on business rules classification and specification methods
Baltrušaitis, Egidijus
2005-01-01
The work is based on research into business rules classification and specification methods. The basics of the business rules approach are discussed. The most common business rules classification and modeling methods are analyzed, and business rules modeling techniques and tools for supporting them in information systems are presented. Based on the results of the analysis, a business rules classification method is proposed, with templates presented for every business rule type. Business rules structuring ...
Statistical Methods for Unusual Count Data
DEFF Research Database (Denmark)
Guthrie, Katherine A.; Gammill, Hilary S.; Kamper-Jørgensen, Mads
2016-01-01
microchimerism data present challenges for statistical analysis, including a skewed distribution, excess zero values, and occasional large values. Methods for comparing microchimerism levels across groups while controlling for covariates are not well established. We compared statistical models for quantitative...... microchimerism values, applied to simulated data sets and 2 observed data sets, to make recommendations for analytic practice. Modeling the level of quantitative microchimerism as a rate via Poisson or negative binomial model with the rate of detection defined as a count of microchimerism genome equivalents per...
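A minimal sketch of the negative binomial approach described above, with the number of cells tested entering as an exposure offset so the model describes genome equivalents per cell, is given below using statsmodels; the data are simulated and the parameterization is illustrative only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, n)                     # e.g., case vs control
cells = rng.integers(10000, 50000, n)             # cells tested per subject
mu = np.exp(-9.0 + 0.8 * group) * cells           # per-cell rate times exposure
y = rng.negative_binomial(1.5, 1.5 / (1.5 + mu))  # skewed counts, many zeros

X = sm.add_constant(group.astype(float))
model = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=1.0),
               offset=np.log(cells))
print(model.fit().summary().tables[1])
```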
Mercier, Lény; Darnaude, Audrey M; Bruguier, Olivier; Vasconcelos, Rita P; Cabral, Henrique N; Costa, Maria J; Lara, Monica; Jones, David L; Mouillot, David
2011-06-01
Reliable assessment of fish origin is of critical importance for exploited species, since nursery areas must be identified and protected to maintain recruitment to the adult stock. During the last two decades, otolith chemical signatures (or "fingerprints") have been increasingly used as tools to discriminate between coastal habitats. However, correct assessment of fish origin from otolith fingerprints depends on various environmental and methodological parameters, including the choice of the statistical method used to assign fish of unknown origin. Among the available classification methods, Linear Discriminant Analysis (LDA) is the most frequently used, although it assumes data are multivariate normal with homogeneous within-group dispersions, conditions that are not always met by otolith chemical data, even after transformation. Other less constrained classification methods are available, but there is a current lack of comparative analysis in applications to otolith microchemistry. Here, we assessed stock identification accuracy for four classification methods (LDA, Quadratic Discriminant Analysis [QDA], Random Forests [RF], and Artificial Neural Networks [ANN]) through the use of three distinct data sets. In each case, all possible combinations of chemical elements were examined to identify the elements to be used for optimal accuracy in assigning fish to their actual origin. Our study shows that accuracy differs according to the model and the number of elements considered. The best combinations did not include all the elements measured, and it was not possible to define an ad hoc multielement combination for accurate site discrimination. Among all the models tested, RF and ANN performed best, especially for complex data sets (e.g., with numerous fish species and/or chemical elements involved). However, for these data, RF was less time-consuming and more interpretable than ANN, and far more efficient and less demanding in terms of assumptions than LDA or QDA.
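The exhaustive search over element combinations can be sketched compactly: below, every subset of five hypothetical elements is scored by cross-validated LDA accuracy and the best subset is reported. The data are synthetic, the element names are invented, and the paper's RF/ANN comparisons are omitted for brevity.

```python
import itertools
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

elements = ["Sr", "Ba", "Mn", "Mg", "Li"]          # hypothetical otolith elements
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=1)

# Score every non-empty element combination by cross-validated accuracy
best = max(
    ((combo, cross_val_score(LinearDiscriminantAnalysis(),
                             X[:, list(combo)], y, cv=5).mean())
     for r in range(1, 6)
     for combo in itertools.combinations(range(5), r)),
    key=lambda t: t[1])

print("best elements:", [elements[i] for i in best[0]],
      "accuracy:", round(best[1], 3))
```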
BASIC METHODS OF CLASSIFICATION AND CHARACTERISTICS OF METHODS OF PRICING IN UKRAINE
Directory of Open Access Journals (Sweden)
A. Boguslavskiy
2014-12-01
Full Text Available The article provides definitions and shows the need to use different pricing methods in enterprises. The reasons for the absence of a universal classification of pricing methods are exposed. The approaches of different authors to classifying groups of pricing methods are reviewed: 1) cost-based methods; 2) methods oriented toward competition; 3) pricing methods based on demand; 4) pricing oriented toward maximum profit; 5) parametric methods; 6) pricing under risk and uncertainty, etc. An improved classification of pricing methods is proposed, with the following groups: 1) cost-based pricing methods; 2) methods based on demand; 3) methods based on competition; 4) microeconomic methods; 5) methods based on product life cycles; 6) methods depending on economic conditions; 7) econometric and statistical techniques; 8) transfer pricing methods; 9) methods in accordance with the terms of agreements; 10) assortment pricing methods; 11) combined pricing methods, and so on. The basic directions for the use of combined pricing methods, and an analysis of their possible use in Ukraine, are shown.
Image Classification Workflow Using Machine Learning Methods
Christoffersen, M. S.; Roser, M.; Valadez-Vergara, R.; Fernández-Vega, J. A.; Pierce, S. A.; Arora, R.
2016-12-01
Recent increases in the availability and quality of remote sensing datasets have fueled an increasing number of scientifically significant discoveries based on land use classification and land use change analysis. However, much of the software made to work with remote sensing data products, specifically multispectral images, is commercial and often prohibitively expensive. The free-to-use solutions that are currently available come bundled as small parts of much larger programs that are very susceptible to bugs and difficult to install and configure. What is needed is a compact, easy-to-use set of tools to perform land use analysis on multispectral images. To address this need, we have developed software using the Python programming language with the sole function of land use classification and land use change analysis. We chose Python to develop our software because it is relatively readable, has a large body of relevant third-party libraries such as GDAL and Spectral Python, and is free to install and use on Windows, Linux, and Macintosh operating systems. In order to test our classification software, we performed a K-means unsupervised classification, a Gaussian Maximum Likelihood supervised classification, and a Mahalanobis Distance based supervised classification. The images used for testing were three Landsat rasters of Austin, Texas, with a spatial resolution of 60 meters for the years 1984 and 1999, and 30 meters for the year 2015. The testing dataset was easily downloaded using the Earth Explorer application produced by the USGS. The software should be able to perform classification based on any set of multispectral rasters with little to no modification. Our software makes the ease of land use classification offered by commercial software available without an expensive license.
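A hedged sketch of the unsupervised branch of such a workflow, K-means classification of a multispectral raster using GDAL and scikit-learn, is shown below; the filename is hypothetical, a multiband image is assumed, and this is not the authors' actual tool.

```python
import numpy as np
from osgeo import gdal
from sklearn.cluster import KMeans

ds = gdal.Open("landsat_austin.tif")              # hypothetical multiband raster
bands = ds.ReadAsArray()                          # shape: (n_bands, rows, cols)
n_bands, rows, cols = bands.shape

pixels = bands.reshape(n_bands, -1).T             # one row per pixel
labels = KMeans(n_clusters=6, random_state=0).fit_predict(pixels)
classified = labels.reshape(rows, cols)           # unsupervised land-use map

print(np.unique(classified, return_counts=True))
```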
Methods for size classification of wood chips
Energy Technology Data Exchange (ETDEWEB)
Hartmann, Hans; Boehm, Thorsten [Technologie- und Foerderzentrum im Kompetenzzentrum fuer Nachwachsende Rohstoffe (TFZ), Schulgasse 18, D-94315 Straubing (Germany); Daugbjerg Jensen, Peter [Forest and Landscape FLD, The Royal Veterinary and Agricultural University, Rolighedsvej 23, DK-1958 Frederiksberg C (Denmark); Temmerman, Michaël; Rabier, Fabienne [Centre wallon de Recherches agronomiques CRA-W, Departement Genie rural, 146, Chaussee de Namur, B-5030 Gembloux (Belgium); Golser, Michael [Holzforschung Austria HFA, Franz-Grill-Strasse 7, A-1031 Wien (Austria)]
2006-11-15
Methods for size classification of wood chips were analysed in an international round robin using 13 conventional wood chip samples and two specially prepared standard samples, one from wood chips and one from hog fuel. The true size distribution of these two samples (according to length, width and height) had been determined stereometrically (reference method) using a digital calliper gauge and by weighing each of the about 7000 wood particles per sample. Five different horizontal and three rotary screening devices were tested using five different screen hole diameters (3.15, 8, 16, 45 and 63 mm, round holes). These systems were compared to commercially available, continuously measuring image analysis equipment. The results show that among devices sharing a measuring principle (horizontal or rotary screening) the results are quite comparable, while there is a severe incompatibility when distributions are determined by different measuring principles. The highest conformity with the reference values is given for measurements with an image analysis system, whereas for all machines with horizontal screens the median value of the size distribution only reached between one-third and half of the reference median value for the particle length distribution. These deviations can be attributed to a higher rate of particle misplacement, which is particularly found in the larger fractions. Such differences decrease when the particles' shape is more roundish (i.e. sphericity closer to one). The median values of length distributions from screenings with a rotary classifier lie between the measurements from image analysis and those from horizontal screening devices. (author)
Classification Methods for High-Dimensional Genetic Data
Czech Academy of Sciences Publication Activity Database
Kalina, Jan
2014-01-01
Roč. 34, č. 1 (2014), s. 10-18 ISSN 0208-5216 Institutional support: RVO:67985807 Keywords : multivariate statistics * classification analysis * shrinkage estimation * dimension reduction * data mining Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.646, year: 2014
Directory of Open Access Journals (Sweden)
S Anantha Sivaprakasam
2017-02-01
Full Text Available Cervical cancer, a disease in which malignant (cancer) cells form in the tissues of the cervix, is the fourth leading cause of cancer death among women worldwide. Cervical cancer can be prevented and/or cured if it is diagnosed at the pre-cancerous lesion stage or earlier. A common physical examination technique widely used in screening is the Papanicolaou test (Pap test), which detects cell abnormality. Due to the intricacy of the cell nature, automating this procedure is still a herculean task for the pathologist. This paper addresses these challenges with a simple and novel method to segment and classify cervical cells automatically. The primary step of the procedure is pre-processing, in which de-noising, de-correlation and segregation of colour components are carried out. Then, two new techniques, called Morphological and Statistical Edge-Based Segmentation and Morphological and Statistical Region-Based Segmentation, put forward in this paper, are applied to each colour component of the image to segment the nuclei from the cervical image. Finally, all segmented colour components are combined to produce the final segmentation result. After extracting the nuclei, morphological features are extracted from them. The two techniques mentioned above outperformed standard segmentation techniques, and the edge-based variant outperformed the region-based one. Finally, the nuclei are classified based on the morphological values. The segmentation accuracy is echoed in the classification accuracy. The overall segmentation accuracy is 97%.
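For readers who want a starting point, the following rough sketch mimics the per-channel edge-based idea with scikit-image under our own assumptions (Sobel edges, Otsu thresholding and morphological clean-up), not the paper's exact operators:

    import numpy as np
    from skimage import filters, morphology

    def segment_channel(channel):
        edges = filters.sobel(channel)                   # edge response
        mask = edges > filters.threshold_otsu(edges)     # keep strong edges
        mask = morphology.closing(mask, morphology.disk(3))
        return morphology.remove_small_objects(mask, min_size=64)

    def segment_nuclei(rgb):
        # segment each colour component, then combine the channel masks
        masks = [segment_channel(rgb[..., c].astype(float)) for c in range(3)]
        return np.logical_or.reduce(masks)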
Extension classification method for low-carbon product cases
Directory of Open Access Journals (Sweden)
Yanwei Zhao
2016-05-01
Full Text Available In product low-carbon design, intelligent decision systems integrated with certain classification algorithms recommend existing design cases to designers. However, these systems mostly depend on prior experience, and product designers not only expect to get a satisfactory case from an intelligent system but also hope for assistance in modifying unsatisfactory cases. In this article, we propose a new categorization method composed of static and dynamic classification based on extension theory. This classification method can be integrated into a case-based reasoning system to obtain accurate classification results and to inform designers of detailed information about unsatisfactory cases. First, we establish the static classification model for cases by means of the dependent function in a hierarchical structure. Then, for dynamic classification, we transform cases based on the case model, attributes, attribute values, and dependent function, so that cases can undergo qualitative changes. Finally, the applicability of the proposed method is demonstrated through a case study of screw air compressor cases.
Statistical classification of road pavements using near field vehicle rolling noise measurements.
Paulo, Joel Preto; Coelho, J L Bento; Figueiredo, Mário A T
2010-10-01
Low noise surfaces have been increasingly considered as a viable and cost-effective alternative to acoustical barriers. However, road planners and administrators frequently lack information on the correlation between the type of road surface and the resulting noise emission profile. To address this problem, a method to identify and classify different types of road pavements was developed, whereby near field road noise is analyzed using statistical learning methods. The vehicle rolling sound signal near the tires and close to the road surface was acquired by two microphones in a special arrangement which implements the Close-Proximity method. A set of features, characterizing the properties of the road pavement, was extracted from the corresponding sound profiles. A feature selection method was used to automatically select those that are most relevant in predicting the type of pavement, while reducing the computational cost. A set of different types of road pavement segments were tested and the performance of the classifier was evaluated. Results of pavement classification performed during a road journey are presented on a map, together with geographical data. This procedure leads to a considerable improvement in the quality of road pavement noise data, thereby increasing the accuracy of road traffic noise prediction models.
2010-09-16
... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the... Public Health Data Standards Staff, NCHS, 3311 Toledo Road, Room 2337, Hyattsville, Maryland 20782, e...
The ability of current statistical classifications to separate services and manufacturing
DEFF Research Database (Denmark)
Christensen, Jesper Lindgaard
The paper explores how well our statistical classification systems perform in classifying firms and in particular how they distinguish firms doing services and/or manufacturing. It is found that a large share, almost 20%, of firms can be said to be misclassified based on their statements on activ...
The statistical process control methods - SPC
Directory of Open Access Journals (Sweden)
Floreková Ľubica
1998-03-01
Full Text Available Methods of statistical quality evaluation (SPC; item 20 of the documentation system of quality control of the ISO 9000 series) for various processes, products and services belong among the basic quality methods that enable us to analyse and compare data pertaining to various quantitative parameters. They also enable us, based on the latter, to propose suitable interventions with the aim of improving these processes, products and services. The theoretical basis and applicability of the following principles are presented in the contribution: diagnostics of cause and effect; Pareto analysis and the Lorenz curve; number distributions and frequency curves of random variable distributions; and Shewhart control charts.
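Of the tools listed, the Shewhart chart is the easiest to make concrete. A minimal sketch for an individuals chart, with sigma estimated from the moving range (our illustration, not from the paper):

    import numpy as np

    def shewhart_limits(x):
        mr = np.abs(np.diff(x))          # moving ranges of successive points
        sigma = mr.mean() / 1.128        # d2 constant for subgroups of size 2
        return x.mean() - 3 * sigma, x.mean(), x.mean() + 3 * sigma

    x = np.array([10.1, 9.8, 10.3, 10.0, 9.7, 10.4, 10.2])
    lcl, cl, ucl = shewhart_limits(x)
    print(f"LCL = {lcl:.2f}, CL = {cl:.2f}, UCL = {ucl:.2f}")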
Statistical methods towards more efficient infiltration measurements.
Franz, T; Krebs, P
2006-01-01
A comprehensive knowledge of the infiltration situation in a catchment is required for operation and maintenance. Due to the high expenditure involved, an optimisation of the necessary measurement campaigns is essential. Methods based on multivariate statistics were developed to improve the information yield of measurements by identifying appropriate gauge locations. The methods are highly flexible with respect to data requirements. They were successfully tested on real and artificial data. For suitable catchments, it is estimated that the optimisation potential amounts to up to a 30% accuracy improvement compared with non-optimised gauge distributions. Besides this, a correlation between independent reach parameters and dependent infiltration rates could be identified which is not dominated by the groundwater head.
Statistical trend analysis methods for temporal phenomena
International Nuclear Information System (INIS)
Lehtinen, E.; Pulkkinen, U.; Poern, K.
1997-04-01
We consider point events occurring in a random way in time. In many applications the pattern of occurrence is of intrinsic interest as indicating a trend or some other systematic feature in the rate of occurrence. The purpose of this report is to survey briefly different statistical trend analysis methods and to illustrate their applicability to temporal phenomena in particular. The trend testing of point events is usually seen as testing hypotheses concerning the intensity of the occurrence of events. When the intensity function is parametrized, the testing of trend is a typical parametric testing problem. In industrial applications the operational experience generally does not suggest any specified model or method in advance. Therefore, and particularly if the Poisson process assumption is very questionable, it is desirable to apply tests that are valid for a wide variety of possible processes. The alternative approach for trend testing is to use some non-parametric procedure. In this report we present four non-parametric tests: the Cox-Stuart test, the Wilcoxon signed ranks test, the Mann test, and the exponential ordered scores test. In addition to the classical parametric and non-parametric approaches we also consider Bayesian trend analysis. First we discuss a Bayesian model based on a power law intensity model. The Bayesian statistical inferences are based on the analysis of the posterior distribution of the trend parameters, and the probability of trend is immediately seen from these distributions. We apply some of the methods discussed in an example case. It should be noted that this report is a feasibility study rather than a scientific evaluation of statistical methods, and the examples can only be seen as demonstrations of the methods.
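The Mann test named above, in its Mann-Kendall form, is compact enough to write out directly; the sketch below (ours, not the report's) applies it to a short series of yearly event counts:

    import numpy as np
    from scipy.stats import norm

    def mann_kendall(x):
        n = len(x)
        s = sum(np.sign(x[j] - x[i])
                for i in range(n - 1) for j in range(i + 1, n))
        var_s = n * (n - 1) * (2 * n + 5) / 18        # variance assuming no ties
        z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
        return s, 2 * norm.sf(abs(z))                 # S statistic and p-value

    counts = np.array([3, 4, 2, 5, 6, 5, 7, 8])       # yearly event counts
    s, p = mann_kendall(counts)
    print(f"S = {s}, p = {p:.3f}")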
Wen, Hongwei; Liu, Yue; Wang, Jieqiong; Zhang, Jishui; Peng, Yun; He, Huiguang
2016-03-01
Tourette syndrome (TS) is a developmental neuropsychiatric disorder with the cardinal symptoms of motor and vocal tics, which emerges in early childhood and fluctuates in severity in later years. To date, the neural basis of TS is not fully understood, and TS has a long-term prognosis that is difficult to estimate accurately. Few studies have looked at the potential of using diffusion tensor imaging (DTI) in conjunction with machine learning algorithms to automate the classification of healthy children and TS children. Here we apply the Tract-Based Spatial Statistics (TBSS) method to 44 TS children and 48 age- and gender-matched healthy children in order to extract the diffusion values from each voxel in the white matter (WM) skeleton, and a feature selection algorithm (ReliefF) was used to select the most salient voxels for subsequent classification with a support vector machine (SVM). We use nested cross-validation to yield an unbiased assessment of the classification method and prevent overestimation. Accuracy of 88.04%, sensitivity of 88.64% and specificity of 87.50% were achieved; the peak performance of the SVM classifier was obtained using the axial diffusivity (AD) metric, demonstrating the potential of a joint TBSS and SVM pipeline for fast, objective classification of healthy and TS children. These results support that our methods may be useful for the early identification of subjects with TS, and hold promise for predicting prognosis and treatment outcome for individuals with TS.
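The nested cross-validation protocol can be sketched with scikit-learn as below; since ReliefF is not part of scikit-learn, a univariate filter (SelectKBest) stands in for it, and the data are synthetic placeholders for the voxel features:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    # synthetic stand-in for 92 subjects x many voxel-wise diffusion features
    X, y = make_classification(n_samples=92, n_features=500, random_state=0)

    pipe = Pipeline([("select", SelectKBest(f_classif)),  # ReliefF stand-in
                     ("svm", SVC())])
    inner = GridSearchCV(pipe, {"select__k": [50, 100, 200],
                                "svm__C": [0.1, 1, 10]}, cv=5)
    outer = cross_val_score(inner, X, y, cv=5)            # unbiased outer loop
    print(f"nested-CV accuracy: {outer.mean():.3f}")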
Mathematical methods in quantum and statistical mechanics
International Nuclear Information System (INIS)
Fishman, L.
1977-01-01
The mathematical structure and closed-form solutions pertaining to several physical problems in quantum and statistical mechanics are examined in some detail. The J-matrix method, introduced previously for s-wave scattering and based upon well-established Hilbert space theory and related generalized integral transformation techniques, is extended to treat the l-th partial wave kinetic energy and Coulomb Hamiltonians within the context of square-integrable (L²), Laguerre (Slater), and oscillator (Gaussian) basis sets. The theory of relaxation in statistical mechanics is examined within the context of the theory of linear integro-differential equations of the Master Equation type and their corresponding Markov processes. Several topics of a mathematical nature concerning various computational aspects of the L² approach to quantum scattering theory are discussed.
Wang, Xue; Bi, Dao-wei; Ding, Liang; Wang, Sheng
2007-01-01
The recent availability of low cost and miniaturized hardware has allowed wireless sensor networks (WSNs) to retrieve audio and video data in real world applications, which has fostered the development of wireless multimedia sensor networks (WMSNs). Resource constraints and challenging multimedia data volume make development of efficient algorithms to perform in-network processing of multimedia contents imperative. This paper proposes solving problems in the domain of WMSNs from the perspective of multi-agent systems. The multi-agent framework enables flexible network configuration and efficient collaborative in-network processing. The focus is placed on target classification in WMSNs where audio information is retrieved by microphones. To deal with the uncertainties related to audio information retrieval, the statistical approaches of power spectral density estimates, principal component analysis and Gaussian process classification are employed. A multi-agent negotiation mechanism is specially developed to efficiently utilize limited resources and simultaneously enhance classification accuracy and reliability. The negotiation is composed of two phases, where an auction based approach is first exploited to allocate the classification task among the agents and then individual agent decisions are combined by the committee decision mechanism. Simulation experiments with real world data are conducted and the results show that the proposed statistical approaches and negotiation mechanism not only reduce memory and computation requirements in WMSNs but also significantly enhance classification accuracy and reliability. PMID:28903223
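The statistical chain named here (power spectral density estimates, PCA, Gaussian process classification) can be prototyped in a few lines; the audio frames and class labels below are synthetic stand-ins:

    import numpy as np
    from scipy.signal import welch
    from sklearn.decomposition import PCA
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    frames = rng.standard_normal((60, 1024))    # 60 audio frames (synthetic)
    labels = rng.integers(0, 2, 60)             # two target classes (synthetic)

    _, psd = welch(frames, fs=8000, nperseg=256)   # PSD estimate per frame
    clf = make_pipeline(PCA(n_components=10), GaussianProcessClassifier())
    clf.fit(psd, labels)
    print(clf.predict(psd[:5]))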
A computer method for spectral classification
International Nuclear Information System (INIS)
Appenzeller, I.; Zekl, H.
1978-01-01
The authors describe the start of an attempt to improve the accuracy of spectroscopic parallaxes by evaluating spectroscopic temperature and luminosity criteria, such as those of the MK classification. The spectrograms were analyzed automatically by means of a suitable computer program. (Auth.)
Statistical methods of evaluating and comparing imaging techniques
International Nuclear Information System (INIS)
Freedman, L.S.
1987-01-01
Over the past 20 years several new methods of generating images of internal organs and the anatomy of the body have been developed and used to enhance the accuracy of diagnosis and treatment. These include ultrasonic scanning, radioisotope scanning, computerised X-ray tomography (CT) and magnetic resonance imaging (MRI). The new techniques have made a considerable impact on radiological practice in hospital departments, not least on the investigational process for patients suspected or known to have malignant disease. As a consequence of the increased range of imaging techniques now available, there has developed a need to evaluate and compare their usefulness. Over the past 10 years formal studies of the application of imaging technology have been conducted and many reports have appeared in the literature. These studies cover a range of clinical situations. Likewise, the methodologies employed for evaluating and comparing the techniques in question have differed widely. While not attempting an exhaustive review of the clinical studies which have been reported, this paper aims to examine the statistical designs and analyses which have been used. First a brief review of the different types of study is given. Examples of each type are then chosen to illustrate statistical issues related to their design and analysis. In the final sections it is argued that a form of classification for these different types of study might be helpful in clarifying relationships between them and bringing a perspective to the field. A classification based upon a limited analogy with clinical trials is suggested
Statistical methods for assessment of blend homogeneity
DEFF Research Database (Denmark)
Madsen, Camilla
2002-01-01
In this thesis the use of various statistical methods to address some of the problems related to assessment of the homogeneity of powder blends in tablet production is discussed. It is not straight forward to assess the homogeneity of a powder blend. The reason is partly that in bulk materials......, it is shown how to set up parametric acceptance criteria for the batch that gives a high confidence that future samples with a probability larger than a specified value will pass the USP threeclass criteria. Properties and robustness of proposed changes to the USP test for content uniformity are investigated...
A method for classification of network traffic based on C5.0 Machine Learning Algorithm
DEFF Research Database (Denmark)
Bujlow, Tomasz; Riaz, M. Tahir; Pedersen, Jens Myrup
2012-01-01
current network traffic. To overcome the drawbacks of existing methods for traffic classification, usage of C5.0 Machine Learning Algorithm (MLA) was proposed. On the basis of statistical traffic information received from volunteers and C5.0 algorithm we constructed a boosted classifier, which was shown...... and classification, an algorithm for recognizing flow direction and the C5.0 itself. Classified applications include Skype, FTP, torrent, web browser traffic, web radio, interactive gaming and SSH. We performed subsequent tries using different sets of parameters and both training and classification options...
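C5.0 itself is commercial, so as an illustrative stand-in the sketch below boosts scikit-learn's CART tree with AdaBoost over hypothetical per-flow statistics; the substitution is ours, not the authors':

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # stand-ins for per-flow statistics (packet sizes, inter-arrival times, ...)
    X, y = make_classification(n_samples=2000, n_features=12, n_informative=8,
                               n_classes=7, random_state=0)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

    clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=5), n_estimators=50)
    clf.fit(Xtr, ytr)
    print(f"test accuracy: {clf.score(Xte, yte):.3f}")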
A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem
Directory of Open Access Journals (Sweden)
Zekić-Sušac Marijana
2014-09-01
Full Text Available Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour, on the same dataset in order to compare their efficiency in the sense of classification accuracy. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure, and the sensitivity and specificity of each model were computed. Results: The artificial neural network model based on a multilayer perceptron yielded a higher classification rate than the models produced by the other methods. The pairwise t-test showed a statistical significance between the artificial neural network and the k-nearest neighbour model, while the differences among the other methods were not statistically significant. Conclusions: The tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be assured by testing a few additional methodological refinements in machine learning methods.
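The evaluation protocol described (the same ten folds for every model, then a pairwise t-test on the per-fold accuracies) can be sketched as follows, with synthetic data and two of the four methods shown:

    from scipy.stats import ttest_rel
    from sklearn.datasets import make_classification
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=400, n_features=30, random_state=0)
    cv = KFold(n_splits=10, shuffle=True, random_state=0)   # same folds for both

    ann = cross_val_score(MLPClassifier(max_iter=2000, random_state=0), X, y, cv=cv)
    knn = cross_val_score(KNeighborsClassifier(), X, y, cv=cv)
    t, p = ttest_rel(ann, knn)              # pairwise t-test on per-fold scores
    print(f"ANN {ann.mean():.3f} vs kNN {knn.mean():.3f}, p = {p:.3f}")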
Identifying User Profiles from Statistical Grouping Methods
Directory of Open Access Journals (Sweden)
Francisco Kelsen de Oliveira
2018-02-01
Full Text Available This research aimed to group users into subgroups according to their levels of knowledge about technology. Statistical hierarchical and non-hierarchical clustering methods were studied, compared and used to create the subgroups from the similarities in the users' skill levels with technology. The research sample consisted of teachers who answered online questionnaires about their skills with the use of software and hardware with an educational bias. The statistical grouping methods were performed and showed the possible groupings of the users. The analyses of these groups made it possible to identify the common characteristics among the individuals of each subgroup. It was therefore possible to define two subgroups of users, one skilled with technology and another with little skill, and the partial results of the research showed that the two main grouping algorithms agreed with 92% similarity in forming the skilled and less skilled groups, confirming the accuracy of the techniques for discriminating between individuals.
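A minimal sketch of the two families of grouping methods compared above, applied to hypothetical questionnaire scores (rows are respondents):

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from sklearn.cluster import KMeans

    scores = np.random.default_rng(0).random((50, 8))   # questionnaire answers

    # hierarchical (Ward) grouping, cut into two subgroups
    hier = fcluster(linkage(scores, method="ward"), t=2, criterion="maxclust")
    # non-hierarchical grouping with k-means, also two subgroups
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)

    # cluster labels are arbitrary, so agreement is the better of both matchings
    agreement = max(np.mean(hier - 1 == km), np.mean(hier - 1 != km))
    print(f"label agreement between the two methods: {agreement:.0%}")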
Statistical sampling method for releasing decontaminated vehicles
International Nuclear Information System (INIS)
Lively, J.W.; Ware, J.A.
1996-01-01
Earth moving vehicles (e.g., dump trucks, belly dumps) commonly haul radiologically contaminated materials from a site being remediated to a disposal site. Traditionally, each vehicle must be surveyed before being released. The logistical difficulties of implementing the traditional approach on a large scale demand that an alternative be devised. A statistical method (MIL-STD-105E, "Sampling Procedures and Tables for Inspection by Attributes") for assessing product quality from a continuous process was adapted to the vehicle decontamination process. This method produced a sampling scheme that automatically compensates for and accommodates fluctuating batch sizes and changing conditions without the need to modify or rectify the sampling scheme in the field. Vehicles are randomly selected (sampled) upon completion of the decontamination process to be surveyed for residual radioactive surface contamination. The frequency of sampling is based on the expected number of vehicles passing through the decontamination process in a given period and the confidence level desired. This process has been successfully used for one year at the former uranium mill site in Monticello, Utah (a CERCLA-regulated clean-up site). The method forces improvement in the quality of the decontamination process and results in a lower likelihood that vehicles exceeding the surface contamination standards are offered for survey. Implementation of this statistical sampling method on the Monticello Project has resulted in more efficient processing of vehicles through decontamination and radiological release, saved hundreds of hours of processing time, provided a high level of confidence that release limits are met, and improved the radiological cleanliness of vehicles leaving the controlled site.
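The heart of such an attribute sampling plan is a single probability statement. As a back-of-envelope illustration, with assumed plan parameters rather than the Monticello values:

    from scipy.stats import binom

    n, p = 13, 0.05          # assumed sample size and contaminated fraction
    # probability that a sample of n vehicles contains no contaminated one
    print(f"P(batch accepted) = {binom.pmf(0, n, p):.3f}")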
AN OBJECT-BASED METHOD FOR CHINESE LANDFORM TYPES CLASSIFICATION
Directory of Open Access Journals (Sweden)
H. Ding
2016-06-01
Full Text Available Landform classification is a necessary task for various fields of landscape and regional planning, for example landscape evaluation, erosion studies and hazard prediction. This study proposes an improved object-based classification for Chinese landform types using the factor-importance analysis of random forests and the gray-level co-occurrence matrix (GLCM). Based on a 1 km DEM of China, the combination of terrain factors extracted from the DEM is selected by correlation analysis and Sheffield's entropy method. A random forest classifier is applied to evaluate the importance of the terrain factors, which are used as multi-scale segmentation thresholds. The GLCM is then computed to build the knowledge base for classification. The classification result was checked using the 1:4,000,000 Chinese Geomorphological Map as reference; the overall classification accuracy of the proposed method is 5.7% higher than that of ISODATA unsupervised classification and 15.7% higher than that of the traditional object-based classification method.
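The factor-importance step can be illustrated directly with a random forest; the terrain-factor names and landform labels below are placeholders, not the study's data:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    factors = pd.DataFrame(rng.random((500, 4)),
                           columns=["relief", "slope", "roughness", "elevation"])
    landform = rng.integers(0, 5, 500)       # assumed landform-type labels

    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(factors, landform)
    for name, imp in sorted(zip(factors.columns, rf.feature_importances_),
                            key=lambda t: -t[1]):
        print(f"{name}: {imp:.3f}")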
Statistical Software for State Space Methods
Directory of Open Access Journals (Sweden)
Jacques J. F. Commandeur
2011-05-01
Full Text Available In this paper we review the state space approach to time series analysis and establish the notation that is adopted in this special volume of the Journal of Statistical Software. We first provide some background on the history of state space methods for the analysis of time series. This is followed by a concise overview of linear Gaussian state space analysis including the modelling framework and appropriate estimation methods. We discuss the important class of unobserved component models which incorporate a trend, a seasonal, a cycle, and fixed explanatory and intervention variables for the univariate and multivariate analysis of time series. We continue the discussion by presenting methods for the computation of different estimates for the unobserved state vector: filtering, prediction, and smoothing. Estimation approaches for the other parameters in the model are also considered. Next, we discuss how the estimation procedures can be used for constructing confidence intervals, detecting outlier observations and structural breaks, and testing model assumptions of residual independence, homoscedasticity, and normality. We then show how ARIMA and ARIMA components models fit in the state space framework to time series analysis. We also provide a basic introduction for non-Gaussian state space models. Finally, we present an overview of the software tools currently available for the analysis of time series with state space methods as they are discussed in the other contributions to this special volume.
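As one concrete member of the model class reviewed, the local level model can be fitted by Kalman filtering and smoothing with statsmodels; the series below is synthetic:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(0, 0.5, 200)) + rng.normal(0, 1.0, 200)

    # local level model: a random-walk level observed with noise
    model = sm.tsa.UnobservedComponents(y, level="local level")
    result = model.fit(disp=False)
    smoothed_level = result.smoothed_state[0]   # E[level_t | all observations]
    print(result.summary().tables[1])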
Directory of Open Access Journals (Sweden)
R. Jegadeeshwaran
2015-03-01
Full Text Available In an automobile, the brake system is an essential part responsible for control of the vehicle. Any failure in the brake system impairs the vehicle's motion and can have catastrophic effects on vehicle and passenger safety. Thus the brake system plays a vital role in an automobile, and condition monitoring of the brake system is essential. Vibration-based condition monitoring using machine learning techniques is gaining momentum. This study is one such attempt to perform condition monitoring of a hydraulic brake system through vibration analysis. In this research, the performance of a Clonal Selection Classification Algorithm (CSCA) for brake fault diagnosis is reported. A hydraulic brake system test rig was fabricated. Under good and faulty conditions of the brake system, vibration signals were acquired using a piezoelectric transducer. Statistical parameters were extracted from the vibration signal. The best feature set was identified for classification using an attribute evaluator. The selected features were then classified using the CSCA. The classification accuracy of this artificial intelligence technique has been compared with other machine learning approaches and discussed. The Clonal Selection Classification Algorithm performs better and gives the maximum classification accuracy (96%) for the fault diagnosis of a hydraulic brake system.
Application of pedagogy reflective in statistical methods course and practicum statistical methods
Julie, Hongki
2017-08-01
The subjects Elementary Statistics, Statistical Methods and Statistical Methods Practicum are aimed at equipping students of Mathematics Education with descriptive and inferential statistics. The students' understanding of descriptive and inferential statistics is important for students in the Mathematics Education Department, especially for those whose final project involves quantitative research. In quantitative research, students are required to present and describe quantitative data in an appropriate manner, to draw conclusions from their quantitative data, and to establish the relationships between the independent and dependent variables defined in their research. In fact, when students carried out their final projects involving quantitative research, it was not rare to find students making mistakes in the steps of drawing conclusions and errors in choosing the hypothesis-testing procedure. As a result, they reached incorrect conclusions. This is a very serious mistake for those doing quantitative research. Several things were gained from the implementation of reflective pedagogy in the teaching and learning process of the Statistical Methods and Statistical Methods Practicum courses, namely: 1. Twenty-two students passed the course and one student did not. 2. The highest grade achieved was an A, attained by 18 students. 3. According to all students, their critical stance could be developed through the learning process in this course. 4. All students agreed that, through the learning process they underwent in the course, they could build care for each other.
PROGRESSIVE DENSIFICATION AND REGION GROWING METHODS FOR LIDAR DATA CLASSIFICATION
Directory of Open Access Journals (Sweden)
J. L. Pérez-García
2012-07-01
Full Text Available At present, airborne laser scanner systems are one of the most frequently used methods to obtain digital terrain elevation models. While having the advantage of direct measurement on the object, the point cloud obtained needs classification of its points according to whether they belong to the ground. This need for classification of the raw data has led to the appearance of multiple filters focused on LiDAR data classification. Following this approach, this paper presents a classification method that combines LiDAR data segmentation techniques and progressive densification to locate the points belonging to the ground. The proposed methodology is tested on several datasets with different terrain characteristics and data availability. In all cases, we analyze the advantages and disadvantages obtained compared with applying the individual techniques and, especially, the benefits derived from the integration of both classification techniques. In order to provide a more comprehensive quality control of the classification process, the obtained results have been compared with those derived from a manual procedure, which is used as the reference classification. The results are also compared with other automatic classification methodologies included in some commercial software packages that are highly trusted by users for LiDAR data treatment.
The Monte Carlo method the method of statistical trials
Shreider, YuA
1966-01-01
The Monte Carlo Method: The Method of Statistical Trials is a systematic account of the fundamental concepts and techniques of the Monte Carlo method, together with its range of applications. Some of these applications include the computation of definite integrals, neutron physics, and the investigation of servicing processes. This volume comprises seven chapters and begins with an overview of the basic features of the Monte Carlo method and typical examples of its application to simple problems in computational mathematics. The next chapter examines the computation of multi-dimensional integrals.
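The first application mentioned, computation of definite integrals, makes a one-screen illustration of the method:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    x = rng.uniform(0.0, 1.0, n)
    fx = np.exp(-x**2)                      # integrand exp(-x^2) on [0, 1]
    estimate = fx.mean()                    # Monte Carlo estimate of the integral
    stderr = fx.std(ddof=1) / np.sqrt(n)    # statistical error of the estimate
    print(f"{estimate:.4f} +/- {stderr:.4f}   (true value ~ 0.7468)")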
On two methods of statistical image analysis
Missimer, J; Knorr, U; Maguire, RP; Herzog, H; Seitz, RJ; Tellman, L; Leenders, K.L.
1999-01-01
The computerized brain atlas (CBA) and statistical parametric mapping (SPM) are two procedures for voxel-based statistical evaluation of PET activation studies. Each includes spatial standardization of image volumes, computation of a statistic, and evaluation of its significance. In addition,
A New Method for Solving Supervised Data Classification Problems
Directory of Open Access Journals (Sweden)
Parvaneh Shabanzadeh
2014-01-01
Full Text Available Supervised data classification is one of the techniques used to extract nontrivial information from data. Classification is a widely used technique in various fields, including data mining, industry, medicine, science, and law. This paper considers a new algorithm for supervised data classification problems associated with cluster analysis. The mathematical formulation of this algorithm is based on nonsmooth, nonconvex optimization. A new algorithm for solving this optimization problem is utilized; it uses a derivative-free technique and offers robustness and efficiency. To improve classification performance and efficiency in generating the classification model, a new feature selection algorithm based on convex programming techniques is suggested. The proposed methods are tested on real-world datasets. Results of numerical experiments are presented which demonstrate the effectiveness of the proposed algorithms.
Statistical methods for astronomical data analysis
Chattopadhyay, Asis Kumar
2014-01-01
This book introduces “Astrostatistics” as a subject in its own right with rewarding examples, including work by the authors with galaxy and Gamma Ray Burst data to engage the reader. This includes a comprehensive blending of Astrophysics and Statistics. The first chapter’s coverage of preliminary concepts and terminologies for astronomical phenomenon will appeal to both Statistics and Astrophysics readers as helpful context. Statistics concepts covered in the book provide a methodological framework. A unique feature is the inclusion of different possible sources of astronomical data, as well as software packages for converting the raw data into appropriate forms for data analysis. Readers can then use the appropriate statistical packages for their particular data analysis needs. The ideas of statistical inference discussed in the book help readers determine how to apply statistical tests. The authors cover different applications of statistical techniques already developed or specifically introduced for ...
Seasonal UK Drought Forecasting using Statistical Methods
Richardson, Doug; Fowler, Hayley; Kilsby, Chris; Serinaldi, Francesco
2016-04-01
In the UK drought is a recurrent feature of climate with potentially large impacts on public water supply. Water companies' ability to mitigate the impacts of drought by managing diminishing availability depends on forward planning and it would be extremely valuable to improve forecasts of drought on monthly to seasonal time scales. By focusing on statistical forecasting methods, this research aims to provide techniques that are simpler, faster and computationally cheaper than physically based models. In general, statistical forecasting is done by relating the variable of interest (some hydro-meteorological variable such as rainfall or streamflow, or a drought index) to one or more predictors via some formal dependence. These predictors are generally antecedent values of the response variable or external factors such as teleconnections. A candidate model is Generalised Additive Models for Location, Scale and Shape parameters (GAMLSS). GAMLSS is a very flexible class allowing for more general distribution functions (e.g. highly skewed and/or kurtotic distributions) and the modelling of not just the location parameter but also the scale and shape parameters. Additionally GAMLSS permits the forecasting of an entire distribution, allowing the output to be assessed in probabilistic terms rather than simply the mean and confidence intervals. Exploratory analysis of the relationship between long-memory processes (e.g. large-scale atmospheric circulation patterns, sea surface temperatures and soil moisture content) and drought should result in the identification of suitable predictors to be included in the forecasting model, and further our understanding of the drivers of UK drought.
Tweet-based Target Market Classification Using Ensemble Method
Directory of Open Access Journals (Sweden)
Muhammad Adi Khairul Anshary
2016-09-01
Full Text Available Target market classification is aimed at focusing marketing activities on the right targets. Classification of target markets can be done through data mining and by utilizing data from social media, e.g. Twitter. The end result of data mining is a set of learning models that can classify new data. Ensemble methods can improve the accuracy of such models and therefore provide better results. In this study, classification of target markets was conducted on a dataset of 3000 tweets from which features were extracted. Classification models were constructed by manipulating the training data using two ensemble methods (bagging and boosting). To investigate the effectiveness of the ensemble methods, this study used the CART (classification and regression tree) algorithm for comparison. Three categories of consumer goods (computers, mobile phones and cameras) and three categories of sentiment (positive, negative and neutral) were classified towards three target-market categories. Machine learning was performed using Weka 3.6.9. The results on the test data showed that the bagging method improved the accuracy of CART by 1.9% (to 85.20%). On the other hand, for sentiment classification, the ensemble methods were not successful in increasing the accuracy of CART. The results of this study may be taken into consideration by companies who approach their customers through social media, especially Twitter.
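The bagging-over-CART comparison can be reproduced in outline with scikit-learn in place of Weka; the features below are stand-ins for the extracted tweet features:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=3000, n_features=40, n_informative=10,
                               n_classes=3, random_state=0)

    cart = DecisionTreeClassifier()
    bagged = BaggingClassifier(cart, n_estimators=50, random_state=0)
    print("CART  :", cross_val_score(cart, X, y, cv=5).mean())
    print("bagged:", cross_val_score(bagged, X, y, cv=5).mean())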
Muthu Rama Krishnan, M; Shah, Pratik; Chakraborty, Chandan; Ray, Ajoy K
2012-04-01
The objective of this paper is to provide an improved technique which can assist oncopathologists in the correct screening of oral precancerous conditions, especially oral submucous fibrosis (OSF), with significant accuracy on the basis of collagen fibres in the sub-epithelial connective tissue. The proposed scheme is composed of collagen fibre segmentation, textural feature extraction and selection, screening performance enhancement under Gaussian transformation, and finally classification. In this study, collagen fibres are segmented on the R, G and B color channels using a back-propagation neural network from 60 normal and 59 OSF histological images, followed by histogram specification to reduce the stain intensity variation. Thereafter, textural features of the collagen area are extracted using fractal approaches, viz. differential box counting and the Brownian motion curve. Feature selection is done using the Kullback-Leibler (KL) divergence criterion, and the screening performance is evaluated based on various statistical tests to confirm Gaussian nature. Here, the screening performance is enhanced under Gaussian transformation of the non-Gaussian features using a hybrid distribution. Moreover, the routine screening is designed based on two statistical classifiers, viz. Bayesian classification and support vector machines (SVM), to classify normal and OSF. It is observed that SVM with a linear kernel function provides better classification accuracy (91.64%) than the Bayesian classifier. The addition of fractal features of collagen under Gaussian transformation improves the Bayesian classifier's performance from 80.69% to 90.75%. The results are studied and discussed here.
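Of the two fractal features mentioned, box counting is simple to sketch. The classical (not the differential) box count below estimates a fractal dimension from a binary collagen mask; it is our illustration, not the paper's code:

    import numpy as np

    def box_count_dimension(mask):
        sizes = [2, 4, 8, 16, 32]
        counts = []
        for s in sizes:
            h, w = (mask.shape[0] // s) * s, (mask.shape[1] // s) * s
            blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
            counts.append(blocks.any(axis=(1, 3)).sum())   # occupied boxes
        slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
        return -slope

    mask = np.random.default_rng(0).random((256, 256)) > 0.7
    print(f"estimated dimension: {box_count_dimension(mask):.2f}")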
Classification of astrocytomas and meningiomas using statistical discriminant analysis on MRI data
International Nuclear Information System (INIS)
Siromoney, Anna; Prasad, G.N.S.; Raghuram, Lakshminarayan; Korah, Ipeson; Siromoney, Arul; Chandrasekaran, R.
2001-01-01
The objective of this study was to investigate the usefulness of Multivariate Discriminant Analysis for classifying two groups of primary brain tumours, astrocytomas and meningiomas, from Magnetic Resonance Images. Discriminant analysis is a multivariate technique concerned with separating distinct sets of objects and with allocating new objects to previously defined groups. Allocation or classification rules are usually developed from learning examples in a supervised learning environment. Data from signal intensity measurements in the multiple scans performed on each patient in routine clinical scanning were analysed using Fisher's Classification, which is one method of discriminant analysis.
Directory of Open Access Journals (Sweden)
J. Sunil Rao
2007-01-01
Full Text Available In gene selection for cancer classification using microarray data, we define an eigenvalue-ratio statistic to measure a gene's contribution to the joint discriminability when this gene is included in a set of genes. Based on this eigenvalue-ratio statistic, we define a novel hypothesis test for gene statistical redundancy and propose two gene selection methods. Simulation studies illustrate the agreement between the statistical redundancy test and the gene selection methods. Real data examples show that the proposed gene selection methods can select a compact gene subset which can not only be used to build high-quality cancer classifiers but also shows biological relevance.
Leydesdorff, Loet; Kogler, Dieter Franz; Yan, Bowen
2017-01-01
The Cooperative Patent Classifications (CPC) recently developed cooperatively by the European and US Patent Offices provide a new basis for mapping patents and portfolio analysis. CPC replaces the International Patent Classifications (IPC) of the World Intellectual Property Organization. In this study, we update our routines previously based on IPC for CPC and use the occasion to rethink various parameter choices. The new maps are significantly different from the previous ones, although this may not always be obvious on visual inspection. We provide nested maps online and a routine for generating portfolio overlays on the maps; a new tool is provided for "difference maps" between the patent portfolios of organizations or firms. This is illustrated by comparing the portfolios of patents granted to two competing firms, Novartis and MSD, in 2016. Furthermore, the data is organized for the purpose of statistical analysis.
Highly Robust Statistical Methods in Medical Image Analysis
Czech Academy of Sciences Publication Activity Database
Kalina, Jan
2012-01-01
Roč. 32, č. 2 (2012), s. 3-16 ISSN 0208-5216 R&D Projects: GA MŠk(CZ) 1M06014 Institutional research plan: CEZ:AV0Z10300504 Keywords : robust statistics * classification * faces * robust image analysis * forensic science Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.208, year: 2012 http://www.ibib.waw.pl/bbe/bbefulltext/BBE_32_2_003_FT.pdf
Stanislawski, Jerzy; Kotulska, Malgorzata; Unold, Olgierd
2013-01-17
Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, such as Alzheimer's disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of amino acids, which transform the structure when exposed. A few hundred such peptides have been experimentally found. Experimental testing of all possible amino acid combinations is currently not feasible. Instead, they can be predicted by computational methods. 3D profile is a physicochemical-based method that has generated the most numerous dataset - ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. We generated a new dataset of hexapeptides, using a more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was applied for training machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained an area under the ROC curve of 0.96, accuracy 91%, true positive rate ca. 78%, and true negative rate 95%. A few other machine learning methods also achieved good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning). We showed that the simplified profile generation method does not introduce an error with regard to the original method, while increasing the computational efficiency. Our new dataset
Advanced Steel Microstructural Classification by Deep Learning Methods.
Azimi, Seyed Majid; Britz, Dominik; Engstler, Michael; Fritz, Mario; Mücklich, Frank
2018-02-01
The inner structure of a material is called its microstructure. It stores the genesis of a material and determines all its physical and chemical properties. While microstructural characterization is widespread and well known, microstructural classification is mostly done manually by human experts, which gives rise to uncertainties due to subjectivity. Since the microstructure can be a combination of different phases or constituents with complex substructures, its automatic classification is very challenging, and only a few prior studies exist. Prior works focused on features designed and engineered by experts and classified microstructures separately from the feature extraction step. Recently, Deep Learning methods have shown strong performance in vision applications by learning the features from data together with the classification step. In this work, we propose a Deep Learning method for microstructural classification in the examples of certain microstructural constituents of low carbon steel. This novel method employs pixel-wise segmentation via a Fully Convolutional Neural Network (FCNN) accompanied by a max-voting scheme. Our system achieves 93.94% classification accuracy, drastically outperforming the state-of-the-art method's 48.89% accuracy. Beyond the strong performance of our method, this line of research offers a more robust and, above all, objective way to approach the difficult task of steel quality appreciation.
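The max-voting step that follows the pixel-wise segmentation is easy to illustrate: per-pixel class predictions are aggregated into a single label per micrograph by majority vote (a toy sketch, not the paper's implementation):

    import numpy as np

    def max_vote(pixel_labels, n_classes):
        # count how many pixels the network assigned to each class
        votes = np.bincount(pixel_labels.ravel(), minlength=n_classes)
        return int(votes.argmax())

    pred = np.random.default_rng(0).integers(0, 4, size=(64, 64))  # FCNN output
    print(f"object class by max-voting: {max_vote(pred, n_classes=4)}")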
International Nuclear Information System (INIS)
Vaganov, P.A.; Kol'tsov, A.A.; Kulikov, V.D.; Mejer, V.A.
1983-01-01
The multi-element instrumental neutron activation analysis of samples of mountain rocks (sandstones, aleurolites and shales of one of the gold deposits) is performed. The spectra of the irradiated samples are measured with a Ge(Li) detector with a volume of 35 mm³. The content of 22 chemical elements is determined in each sample. The results of the analysis serve as a reliable basis for multi-dimensional statistical information processing; they constitute the basis for the generalized characteristics of the rocks, which enables the solution of the classification problem for rocks of different deposits.
Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Sacks, David B; Yu, Yi-Kuo
2018-06-05
Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
Oromucosal film preparations: classification and characterization methods.
Preis, Maren; Woertz, Christina; Kleinebudde, Peter; Breitkreutz, Jörg
2013-09-01
Recently, the regulatory authorities have enlarged the variety of 'oromucosal preparations' by buccal films and orodispersible films. Various film preparations have entered the market and pharmacopoeias. Due to the novelty of the official monographs, no standardized characterization methods and quality specifications are included. This review reports the methods of choice to characterize oromucosal film preparations with respect to biorelevant characterization and quality control. Commonly used dissolution tests for other dosage forms are not transferable for films in all cases. Alternatives and guidance on decision, which methods are favorable for film preparations are discussed. Furthermore, issues about requirements for film dosage forms are reflected. Oromucosal film preparations offer a wide spectrum of opportunities. There are a lot of suggestions in the literature on how to control the quality of these innovative products, but no standardized tests are available. Regulatory authorities need to define the standards and quality requirements more precisely.
Fast linear method of illumination classification
Cooper, Ted J.; Baqai, Farhan A.
2003-01-01
We present a simple method for estimating the scene illuminant for images obtained by a Digital Still Camera (DSC). The proposed method utilizes basis vectors obtained from known memory color reflectance to identify the memory color objects in the image. Once the memory color pixels are identified, we use the ratios of the red/green and blue/green to determine the most likely illuminant in the image. The critical part of the method is to estimate the smallest set of basis vectors that closely represent the memory color reflectances. Basis vectors obtained from both Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are used. We will show that only two ICA basis vectors are needed to get an acceptable estimate.
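A rough sketch of the ratio-matching step described above follows. The reference illuminant table, the specific ratio values, and the assumption that memory-color pixels have already been identified via the basis-vector fit are all illustrative placeholders, not the authors' actual data:

```python
import numpy as np

# Hypothetical reference table: mean (R/G, B/G) chromaticity ratios under
# three candidate illuminants (values are illustrative placeholders).
ILLUMINANTS = {
    "daylight_D65":   (0.95, 1.05),
    "tungsten_A":     (1.35, 0.60),
    "fluorescent_F2": (1.10, 0.75),
}

def estimate_illuminant(memory_pixels):
    """memory_pixels: (N, 3) RGB values of pixels already identified as
    memory-color objects (e.g. skin, foliage, sky) via the basis-vector fit."""
    r_g = np.mean(memory_pixels[:, 0] / memory_pixels[:, 1])
    b_g = np.mean(memory_pixels[:, 2] / memory_pixels[:, 1])
    # Pick the reference illuminant whose ratios are closest in Euclidean sense.
    return min(ILLUMINANTS,
               key=lambda k: (ILLUMINANTS[k][0] - r_g) ** 2 +
                             (ILLUMINANTS[k][1] - b_g) ** 2)

pixels = np.array([[180.0, 140.0, 120.0], [175.0, 138.0, 115.0]])
print(estimate_illuminant(pixels))
```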
An Efficient Ensemble Learning Method for Gene Microarray Classification
Directory of Open Access Journals (Sweden)
Alireza Osareh
2013-01-01
Full Text Available Gene microarray analysis and classification have demonstrated an effective way to diagnose diseases and cancers. However, it has also been revealed that basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method is a combination of the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other non-ensemble/ensemble techniques including decision trees, support vector machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.
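As a rough illustration of the RotBoost idea, the sketch below pairs a Rotation-Forest-style PCA rotation (the paper's best variant used ICA instead) with scikit-learn's AdaBoost. Subset counts, tree depth and ensemble sizes are arbitrary choices, and the PCA blocks assume more samples than features per subset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def rotation_matrix(X, n_subsets=4):
    """Rotation-Forest-style block-diagonal rotation: PCA fitted on random
    disjoint feature subsets, assembled into one d x d matrix.
    Assumes n_samples >= features per subset so each PCA block is square."""
    d = X.shape[1]
    R = np.zeros((d, d))
    idx = rng.permutation(d)
    for block in np.array_split(idx, n_subsets):
        pca = PCA().fit(X[:, block])
        R[np.ix_(block, block)] = pca.components_.T
    return R

def fit_rotboost(X, y, n_rotations=5, n_trees=50):
    """Each ensemble member = one rotation followed by a full AdaBoost run."""
    ensemble = []
    for _ in range(n_rotations):
        R = rotation_matrix(X)
        ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                 n_estimators=n_trees).fit(X @ R, y)
        ensemble.append((R, ada))
    return ensemble

def predict_rotboost(ensemble, X):
    # Average class probabilities; assumes labels are 0..K-1.
    proba = np.mean([ada.predict_proba(X @ R) for R, ada in ensemble], axis=0)
    return proba.argmax(axis=1)

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
model = fit_rotboost(X, y)
print((predict_rotboost(model, X) == y).mean())
```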
A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data
Abusamra, Heba
2013-05-01
Microarray technology has enriched the study of gene expression in such a way that scientists are now able to measure the expression levels of thousands of genes in a single experiment. Microarray gene expression data gained great importance in recent years due to its role in disease diagnosis and prognosis, which helps choose the appropriate treatment plan for patients. This technology has ushered in a new era in molecular classification. Interpreting gene expression data remains a difficult problem and an active research area due to its "high dimensional, low sample size" nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed in this case to aid in correctly classifying different tumor types, and consequently lead to a better understanding of genetic signatures as well as improved treatment strategies. This thesis presents a comparative study of state-of-the-art feature selection methods, classification methods, and combinations of them, based on gene expression data. We compared the efficiency of three different classification methods: support vector machines, k-nearest neighbor and random forest, and eight different feature selection methods: information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimension support vector machine. Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used for this study. Different experiments have been applied to compare the performance of the classification methods with and without performing feature selection. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted by using a small number of genes. The relationship of the features selected by different feature selection methods is investigated, and the most frequent features selected in each fold among all methods for both datasets are evaluated.
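A minimal sketch of the evaluation protocol described here (five-fold cross-validation over several classifiers with univariate feature selection inside the pipeline, to avoid selection bias) might look as follows. The ANOVA F-test stands in for the eight selection methods compared in the thesis, and the synthetic data merely mimics the "high dimensional, low sample size" setting:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for a glioma microarray matrix: 80 samples x 2000 genes.
X, y = make_classification(n_samples=80, n_features=2000, n_informative=30,
                           random_state=0)

for name, clf in [("SVM", SVC()), ("kNN", KNeighborsClassifier()),
                  ("RF", RandomForestClassifier(random_state=0))]:
    pipe = make_pipeline(StandardScaler(),
                         SelectKBest(f_classif, k=50),  # keep 50 "genes"
                         clf)
    scores = cross_val_score(pipe, X, y, cv=5)  # five-fold CV as in the thesis
    print(f"{name}: {scores.mean():.3f}")
```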
Statistical learning methods in high-energy and astrophysics analysis
Energy Technology Data Exchange (ETDEWEB)
Zimmermann, J. [Forschungszentrum Juelich GmbH, Zentrallabor fuer Elektronik, 52425 Juelich (Germany) and Max-Planck-Institut fuer Physik, Foehringer Ring 6, 80805 Munich (Germany)]. E-mail: zimmerm@mppmu.mpg.de; Kiesling, C. [Max-Planck-Institut fuer Physik, Foehringer Ring 6, 80805 Munich (Germany)
2004-11-21
We discuss several popular statistical learning methods used in high-energy- and astro-physics analysis. After a short motivation for statistical learning we present the most popular algorithms and discuss several examples from current research in particle- and astro-physics. The statistical learning methods are compared with each other and with standard methods for the respective application.
Development of a Research Methods and Statistics Concept Inventory
Veilleux, Jennifer C.; Chapman, Kate M.
2017-01-01
Research methods and statistics are core courses in the undergraduate psychology major. To assess learning outcomes, it would be useful to have a measure that assesses research methods and statistical literacy beyond course grades. In two studies, we developed and provided initial validation results for a research methods and statistical knowledge…
Statistical learning methods in high-energy and astrophysics analysis
International Nuclear Information System (INIS)
Zimmermann, J.; Kiesling, C.
2004-01-01
We discuss several popular statistical learning methods used in high-energy- and astro-physics analysis. After a short motivation for statistical learning we present the most popular algorithms and discuss several examples from current research in particle- and astro-physics. The statistical learning methods are compared with each other and with standard methods for the respective application.
Classification of Patients Treated for Infertility Using the IVF Method
Directory of Open Access Journals (Sweden)
Malinowski Paweł
2015-12-01
Full Text Available One of the most effective methods of infertility treatment is in vitro fertilization (IVF). The effectiveness of the treatment, as well as the classification of the data obtained from it, is still an ongoing issue. The classifiers obtained so far are powerful, but even the best ones do not predict all possible treatment outcomes equally well: usually, lack of pregnancy is predicted far too often. This creates a constant need for further exploration of this issue. Careful use of different classification methods can, however, help to achieve that goal.
Machine-learning methods in the classification of water bodies
Directory of Open Access Journals (Sweden)
Sołtysiak Marek
2016-06-01
Full Text Available Amphibian species have been considered as useful ecological indicators. They are used as indicators of environmental contamination, ecosystem health and habitat quality. Amphibian species are sensitive to changes in the aquatic environment and therefore may form the basis for the classification of water bodies. Water bodies in which there are a large number of amphibian species are especially valuable, even if they are located in urban areas. The automation of the classification process allows for a faster evaluation of the presence of amphibian species in the water bodies. Three machine-learning methods (artificial neural networks, decision trees and the k-nearest neighbours algorithm) have been used to classify water bodies in Chorzów – one of 19 cities in the Upper Silesia Agglomeration. In this case, classification is a supervised data mining method consisting of several stages such as building the model, the testing phase and the prediction. Seven natural and anthropogenic features of water bodies (e.g. the type of water body, aquatic plants, the purpose of the water body (destination), position of the water body in relation to any possible buildings, condition of the water body, the degree of littering, the shore type and fishing activities) have been taken into account in the classification. The data set used in this study involved information about 71 different water bodies and 9 amphibian species living in them. The results showed that the best average classification accuracy was obtained with the multilayer perceptron neural network.
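A hedged sketch of this kind of supervised classification on categorical habitat features, using a multilayer perceptron as in the study, could look like the following; the toy feature table and network size are invented for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy stand-in for the 71-water-body survey: categorical habitat features.
df = pd.DataFrame({
    "type":      ["pond", "pit", "pond", "ditch"] * 6,
    "plants":    ["dense", "none", "sparse", "dense"] * 6,
    "littering": ["low", "high", "low", "medium"] * 6,
    "valuable":  [1, 0, 1, 0] * 6,  # target: many amphibian species present
})
X = pd.get_dummies(df.drop(columns="valuable"))  # one-hot encode categories
y = df["valuable"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```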
Brenčič, Mihael
2016-01-01
Northern hemisphere elementary circulation mechanisms, defined with the Dzerdzeevski classification and published on a daily basis from 1899-2012, are analysed with statistical methods as continuous categorical time series. The classification consists of 41 elementary circulation mechanisms (ECM), which are assigned to calendar days. Empirical marginal probabilities of each ECM were determined. Seasonality and the periodicity effect were investigated with moving dispersion filters and a randomisation procedure on the ECM categories, as well as with time analyses of the ECM mode. The time series were determined to be non-stationary, with strong time-dependent trends. During the investigated period, periodicity interchanges with periods when no seasonality is present. In the time series structure, the strongest division is visible at the milestone of 1986, showing that the atmospheric circulation pattern reflected in the ECM has significantly changed. This change is a result of a change in the frequency of ECM categories: before 1986 the ECMs that appeared were more diverse, and afterwards fewer ECMs appear. The statistical approach applied to categorical climatic time series opens up new potential insight into climate variability and change studies that have to be performed in the future.
Statistical models and methods for reliability and survival analysis
Couallier, Vincent; Huber-Carol, Catherine; Mesbah, Mounir; Huber -Carol, Catherine; Limnios, Nikolaos; Gerville-Reache, Leo
2013-01-01
Statistical Models and Methods for Reliability and Survival Analysis brings together contributions by specialists in statistical theory as they discuss their applications providing up-to-date developments in methods used in survival analysis, statistical goodness of fit, stochastic processes for system reliability, amongst others. Many of these are related to the work of Professor M. Nikulin in statistics over the past 30 years. The authors gather together various contributions with a broad array of techniques and results, divided into three parts - Statistical Models and Methods, Statistical
Identification of AE Bursts by Classification of Physical and Statistical Parameters
International Nuclear Information System (INIS)
Mieza, J.I.; Oliveto, M.E.; Lopez Pumarega, M.I.; Armeite, M.; Ruzzante, J.E.; Piotrkowski, R.
2005-01-01
Physical and statistical parameters obtained with the Principal Components method, extracted from Acoustic Emission bursts coming from triaxial deformation tests, were analyzed. The samples came from seamless steel tubes used in the petroleum industry, and some of them were provided with a protective coating. The purpose of our work was to distinguish bursts originating in the breakage of the coating from those originating in damage mechanisms in the bulk steel matrix. The analysis was performed with statistical distributions, fractal analysis and clustering methods.
Assessment of statistical methods used in library-based approaches to microbial source tracking.
Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D
2003-12-01
Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.
Statistical methods of estimating mining costs
Long, K.R.
2011-01-01
Until it was defunded in 1995, the U.S. Bureau of Mines maintained a Cost Estimating System (CES) for prefeasibility-type economic evaluations of mineral deposits and estimating costs at producing and non-producing mines. This system had a significant role in mineral resource assessments to estimate costs of developing and operating known mineral deposits and predicted undiscovered deposits. For legal reasons, the U.S. Geological Survey cannot update and maintain CES. Instead, statistical tools are under development to estimate mining costs from basic properties of mineral deposits such as tonnage, grade, mineralogy, depth, strip ratio, distance from infrastructure, rock strength, and work index. The first step was to reestimate "Taylor's Rule" which relates operating rate to available ore tonnage. The second step was to estimate statistical models of capital and operating costs for open pit porphyry copper mines with flotation concentrators. For a sample of 27 proposed porphyry copper projects, capital costs can be estimated from three variables: mineral processing rate, strip ratio, and distance from nearest railroad before mine construction began. Of all the variables tested, operating costs were found to be significantly correlated only with strip ratio.
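The "Taylor's Rule" re-estimation mentioned here is essentially a log-log regression of operating rate on ore tonnage. A minimal sketch, with invented data points standing in for the project sample:

```python
import numpy as np

# Illustrative (tonnes of ore, tonnes/day capacity) pairs for a few mines;
# the real study fit a sample of proposed porphyry copper projects.
tonnage = np.array([5e6, 2e7, 8e7, 3e8, 1e9])
capacity = np.array([1.4e3, 4.0e3, 1.2e4, 3.2e4, 8.5e4])

# Fit log10(capacity) = a + b*log10(tonnage); Taylor's original rule
# corresponds to an exponent b close to 0.75.
b, a = np.polyfit(np.log10(tonnage), np.log10(capacity), 1)
print(f"capacity ~ {10**a:.4f} * tonnage^{b:.2f}")
```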
Measuring methods and classification in musculoskeletal radiology
International Nuclear Information System (INIS)
Waldt, Simone; Eiber, Matthias; Woertler, Klaus
2011-01-01
The book on measuring methods and classification in musculoskeletal radiology covers the following topics: legs; hip joint; knee joint; foot; shoulder joint; elbow joint; wrist joint; spinal column; craniocervical transition region and cervical spine; musculoskeletal carcinomas; osteoporosis; arthrosis; articular cartilage; hemophilia; rheumatic arthritis; muscular injuries; skeleton age.
A Hyperspectral Image Classification Method Using ISOMAP and RVM
Chang, H.; Wang, T.; Fang, H.; Su, Y.
2018-04-01
Classification is one of the most significant applications of hyperspectral image processing and even remote sensing. Though various algorithms have been proposed to implement and improve this application, there are still drawbacks in traditional classification methods. Thus further investigations on some aspects, such as dimension reduction, data mining, and rational use of spatial information, should be developed. In this paper, we used a widely utilized global manifold learning approach, isometric feature mapping (ISOMAP), to address the intrinsic nonlinearities of hyperspectral images for dimension reduction. Considering the impropriety of Euclidean distance in spectral measurement, we applied the spectral angle (SA) as a substitute when constructing the neighbourhood graph. Then, the relevance vector machine (RVM) was introduced to implement classification instead of the support vector machine (SVM), for simplicity, generalization and sparsity. Therefore, a probability result could be obtained rather than a less convincing binary result. Moreover, taking into account the spatial information of the hyperspectral image, we employ a spatial vector formed by the ratios of different classes around the pixel. Finally, we combined the probability results and spatial factors with a criterion to decide the final classification result. To verify the proposed method, we implemented multiple experiments on standard hyperspectral images and compared with some other methods. The results and different evaluation indexes illustrate the effectiveness of our method.
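A compact sketch of the dimensionality-reduction step follows: scikit-learn's Isomap (version 0.22 or later) accepts a callable metric, so the spectral angle can replace Euclidean distance when building the neighbourhood graph. A probabilistic SVC stands in for the RVM, which scikit-learn does not ship, and the random spectra are placeholders:

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.svm import SVC

def spectral_angle(u, v):
    """Spectral angle between two pixel spectra (radians)."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))

# X: (n_pixels, n_bands) spectra, y: class labels (placeholders here).
rng = np.random.default_rng(0)
X = rng.random((300, 100))
y = rng.integers(0, 3, 300)

# ISOMAP with SA instead of Euclidean distance for the neighbourhood graph.
emb = Isomap(n_neighbors=10, n_components=10,
             metric=spectral_angle).fit_transform(X)

# Probabilistic SVC as a stand-in for the RVM used in the paper.
clf = SVC(probability=True).fit(emb, y)
print(clf.predict_proba(emb[:2]))
```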
Kerr, Laura T.; Adams, Aine; O'Dea, Shirley; Domijan, Katarina; Cullen, Ivor; Hennelly, Bryan M.
2014-05-01
Raman microspectroscopy can be applied to the urinary bladder for highly accurate classification and diagnosis of bladder cancer. This technique can be applied in vitro to bladder epithelial cells obtained from urine cytology, or in vivo as an "optical biopsy" to provide results in real time with higher sensitivity and specificity than current clinical methods. However, there exists a high degree of variability across experimental parameters which needs to be standardised before this technique can be utilized in an everyday clinical environment. In this study, we investigate different laser wavelengths (473 nm and 532 nm), sample substrates (glass, fused silica and calcium fluoride) and multivariate statistical methods in order to gain insight into how these various experimental parameters impact the sensitivity and specificity of Raman cytology.
Innovative statistical methods for public health data
Wilson, Jeffrey
2015-01-01
The book brings together experts working in public health and multi-disciplinary areas to present recent issues in statistical methodological development and their applications. This timely book will impact model development and data analysis in public health research across a wide spectrum of analyses. Data and software used in the studies are available for the reader to replicate the models and outcomes. The fifteen chapters range in focus from techniques for dealing with missing data using Bayesian estimation, health surveillance and population definition and implications in applied latent class analysis, to multiple comparison and meta-analysis in public health data. Researchers in biomedical and public health research will find this book to be a useful reference, and it can be used in graduate-level classes.
Methods of contemporary mathematical statistical physics
2009-01-01
This volume presents a collection of courses introducing the reader to the recent progress with attention being paid to laying solid grounds and developing various basic tools. An introductory chapter on lattice spin models is useful as a background for other lectures of the collection. The topics include new results on phase transitions for gradient lattice models (with introduction to the techniques of the reflection positivity), stochastic geometry reformulation of classical and quantum Ising models, the localization/delocalization transition for directed polymers. A general rigorous framework for theory of metastability is presented and particular applications in the context of Glauber and Kawasaki dynamics of lattice models are discussed. A pedagogical account of several recently discussed topics in nonequilibrium statistical mechanics with an emphasis on general principles is followed by a discussion of kinetically constrained spin models that are reflecting important peculiar features of glassy dynamic...
MSD Recombination Method in Statistical Machine Translation
Gros, Jerneja Žganec
2008-11-01
Freely available tools and language resources were used to build the VoiceTRAN statistical machine translation (SMT) system. Various configuration variations of the system are presented and evaluated. The VoiceTRAN SMT system outperformed the baseline conventional rule-based MT system in all English-Slovenian in-domain test setups. To further increase the generalization capability of the translation model for lower-coverage out-of-domain test sentences, an "MSD-recombination" approach was proposed. This approach not only allows a better exploitation of conventional translation models, but also performs well in the more demanding translation direction; that is, into a highly inflectional language. Using this approach in the out-of-domain setup of the English-Slovenian JRC-ACQUIS task, we have achieved significant improvements in translation quality.
METHODOLOGICAL PRINCIPLES AND METHODS OF TERMS OF TRADE STATISTICAL EVALUATION
Directory of Open Access Journals (Sweden)
N. Kovtun
2014-09-01
Full Text Available The paper studies the methodological principles and guidance for the statistical evaluation of terms of trade under the United Nations classification model, the Harmonized Commodity Description and Coding System (HS). The practical implementation of the proposed three-stage model of index analysis and estimation of terms of trade is realized for Ukraine's traded commodity groups for the period 2011-2012.
NIM: A Node Influence Based Method for Cancer Classification
Directory of Open Access Journals (Sweden)
Yiwen Wang
2014-01-01
Full Text Available The classification of different cancer types is of great significance in the medical field. However, the great majority of existing cancer classification methods are clinical-based and have relatively weak diagnostic ability. With the rapid development of gene expression technology, it has become possible to classify different kinds of cancers using DNA microarrays. Our main idea is to confront the problem of cancer classification using gene expression data from a graph-based view. Based on a new node influence model we propose, this paper presents a novel high-accuracy method for cancer classification, which is composed of four parts: the first is to calculate the similarity matrix of all samples, the second is to compute the node influence of training samples, the third is to obtain the similarity between every test sample and each class using a weighted sum of node influence and the similarity matrix, and the last is to classify each test sample based on its similarity to every class. The data sets used in our experiments are breast cancer, central nervous system, colon tumor, prostate cancer, acute lymphoblastic leukemia, and lung cancer. Experimental results showed that our node influence based method (NIM) is more efficient and robust than support vector machines, K-nearest neighbor, C4.5, naive Bayes, and CART.
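The four steps enumerated in the abstract map naturally onto a few lines of NumPy. The node-influence model itself is not specified here, so the sketch below assumes a simple within-class similarity sum as a stand-in:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def nim_predict(X_train, y_train, X_test):
    """Sketch of the four NIM steps; the influence definition is an assumption."""
    S_train = cosine_similarity(X_train)                 # step 1: similarity matrix
    classes = np.unique(y_train)
    influence = np.array([S_train[i, y_train == y_train[i]].sum()
                          for i in range(len(y_train))]) # step 2: node influence
    S_test = cosine_similarity(X_test, X_train)
    preds = []
    for s in S_test:                                     # step 3: per-class score
        scores = [np.sum(s[y_train == c] * influence[y_train == c])
                  for c in classes]
        preds.append(classes[int(np.argmax(scores))])    # step 4: classify
    return np.array(preds)

rng = np.random.default_rng(0)
X_tr, y_tr = rng.random((60, 200)), rng.integers(0, 2, 60)
print(nim_predict(X_tr, y_tr, rng.random((3, 200))))
```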
Coding and classification in drug statistics – From national to global application
Directory of Open Access Journals (Sweden)
Marit Rønning
2009-11-01
Full Text Available The Anatomical Therapeutic Chemical (ATC) classification system and the defined daily dose (DDD) were developed in Norway in the early seventies. The creation of the ATC/DDD methodology was an important basis for presenting drug utilisation statistics in a sensible way. Norway was in 1977 also the first country to publish national drug utilisation statistics from wholesalers on an annual basis. The combination of these activities in Norway in the seventies made us a pioneer country in the area of drug utilisation research. Over the years, the use of the ATC/DDD methodology has gradually increased in countries outside Norway. Since 1996, the methodology has been recommended by WHO for use in international drug utilisation studies. The WHO Collaborating Centre for Drug Statistics Methodology in Oslo handles the maintenance and development of the ATC/DDD system. The Centre is now responsible for the global co-ordination. After nearly 30 years of experience with ATC/DDD, the methodology has demonstrated its suitability in drug use research. The main challenge in the coming years is to educate the users worldwide in how to use the methodology properly.
Statistical methods for categorical data analysis
Powers, Daniel
2008-01-01
This book provides a comprehensive introduction to methods and models for categorical data analysis and their applications in social science research. Companion website also available, at https://webspace.utexas.edu/dpowers/www/
A DATA FIELD METHOD FOR URBAN REMOTELY SENSED IMAGERY CLASSIFICATION CONSIDERING SPATIAL CORRELATION
Directory of Open Access Journals (Sweden)
Y. Zhang
2016-06-01
Full Text Available Spatial correlation between pixels is important information for remotely sensed imagery classification. The data field method and spatial autocorrelation statistics have been utilized to describe and model the spatial information of local pixels. The original data field method can represent the spatial interactions of neighbourhood pixels effectively. However, its focus on measuring the grey level change between the central pixel and the neighbourhood pixels results in exaggerating the contribution of the central pixel to the whole local window. Besides, Geary's C has also been proven to characterise and quantify well the spatial correlation between each pixel and its neighbourhood pixels. But the extracted object is badly delineated, with the distracting salt-and-pepper effect of isolated misclassified pixels. To correct this defect, we introduce the data field method for filtering and noise limitation. Moreover, the original data field method is enhanced by considering each pixel in the window as the central pixel to compute statistical characteristics between it and its neighbourhood pixels. The last step employs a support vector machine (SVM) for the classification of multiple features (e.g. the spectral feature and the spatial correlation feature). In order to validate the effectiveness of the developed method, experiments are conducted on different remotely sensed images containing multiple complex object classes. The results show that the developed method outperforms the traditional method in terms of classification accuracy.
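The local spatial-correlation statistic at the heart of this approach can be sketched as a windowed, Geary-style filter. The 3x3 window, the normalisation and the random test band below are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np
from scipy.ndimage import generic_filter

def local_geary(window):
    """Local Geary-style statistic for a 3x3 window: mean squared difference
    between the centre pixel and its eight neighbours."""
    centre = window[4]            # flattened 3x3 window; index 4 is the centre
    neighbours = np.delete(window, 4)
    return np.mean((centre - neighbours) ** 2)

band = np.random.default_rng(0).random((64, 64))  # one image band
c = generic_filter(band, local_geary, size=3, mode="reflect")
c /= band.var() + 1e-12  # global variance normalisation
# c can then be stacked with the spectral bands as input features for an SVM.
```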
Abusamra, Heba
2013-11-01
Microarray gene expression data gained great importance in recent years due to its role in disease diagnosis and prognosis, which helps choose the appropriate treatment plan for patients. This technology has ushered in a new era in molecular classification. Interpreting gene expression data remains a difficult problem and an active research area due to its "high dimensional, low sample size" nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed in this case to aid in correctly classifying different tumor types, and consequently lead to a better understanding of genetic signatures as well as improved treatment strategies. This paper presents a comparative study of state-of-the-art feature selection methods, classification methods, and combinations of them, based on gene expression data. We compared the efficiency of three different classification methods: support vector machines, k-nearest neighbor and random forest, and eight different feature selection methods: information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimension support vector machine. Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used in the experiments. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted by using a small number of genes. The relationship of the features selected by different feature selection methods is investigated, and the most frequent features selected in each fold among all methods for both datasets are evaluated.
Fruit Detachment and Classification Method for Strawberry Harvesting Robot
Directory of Open Access Journals (Sweden)
Guo Feng
2008-11-01
Full Text Available Fruit detachment and on-line classification is important for the development of a harvesting robot. With the specific requirements of a robot used for harvesting strawberries growing on the ground, a fruit detachment and classification method is introduced in this paper. An OHTA color-space based image segmentation algorithm is utilized to extract the strawberry from the background; the principal inertia axis of the binary strawberry blob is calculated to give the pose information of the fruit. The strawberry is picked selectively according to its ripeness and classified according to its shape feature. A histogram matching based method for fruit shape judgment is also introduced. Experimental results show that this method can achieve 93% accuracy in strawberry stem detection and above 90% accuracy in ripeness and shape quality judgment on a black and white background. With the improvement of the harvesting mechanism design, this method has application potential in field operation.
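The Ohta color space used for segmentation has a fixed, well-known definition (I1 = (R+G+B)/3, I2 = (R−B)/2, I3 = (2G−R−B)/4); the threshold in the sketch below is a hypothetical value, not the paper's:

```python
import numpy as np

def ohta_channels(rgb):
    """Ohta's approximately uncorrelated colour features I1, I2, I3."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i1 = (r + g + b) / 3.0        # intensity
    i2 = (r - b) / 2.0            # red-blue opponent: large for ripe strawberries
    i3 = (2.0 * g - r - b) / 4.0  # green opponent
    return i1, i2, i3

img = np.random.default_rng(0).random((120, 160, 3))  # placeholder RGB image
_, i2, _ = ohta_channels(img)
mask = i2 > 0.15   # hypothetical threshold separating fruit from background
print(mask.mean())
```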
Statistical-mechanics analysis of Gaussian labeled-unlabeled classification problems
International Nuclear Information System (INIS)
Tanaka, Toshiyuki
2013-01-01
The labeled-unlabeled classification problem in semi-supervised learning is studied via a statistical-mechanics approach. We analytically investigate the performance of a learner with an equal-weight mixture of two symmetrically-located Gaussians, performing posterior mean estimation of the parameter vector on the basis of a dataset consisting of labeled and unlabeled data generated from the same probability model as that assumed by the learner. Under the assumption of replica symmetry, we have analytically obtained a set of saddle-point equations, which allows us to numerically evaluate the performance of the learner. On the basis of the analytical result we have observed interesting phenomena, in particular the coexistence of good and bad solutions, which may happen when the number of unlabeled data is relatively large compared with that of labeled data.
Statistical methods and computing for big data
Wang, Chun; Chen, Ming-Hui; Schifano, Elizabeth; Wu, Jing
2016-01-01
Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and their performances are assessed in a simulation study with stream data. Software packages are summarized with focuses on the open source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay. PMID:27695593
Statistical methods and computing for big data.
Wang, Chun; Chen, Ming-Hui; Schifano, Elizabeth; Wu, Jing; Yan, Jun
2016-01-01
Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and their performances are assessed in a simulation study with stream data. Software packages are summarized with focuses on the open source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay.
Simple statistical methods for software engineering data and patterns
Pandian, C Ravindranath
2015-01-01
Although there are countless books on statistics, few are dedicated to the application of statistical methods to software engineering. Simple Statistical Methods for Software Engineering: Data and Patterns fills that void. Instead of delving into overly complex statistics, the book details simpler solutions that are just as effective and connect with the intuition of problem solvers.Sharing valuable insights into software engineering problems and solutions, the book not only explains the required statistical methods, but also provides many examples, review questions, and case studies that prov
Cratering statistics on asteroids: Methods and perspectives
Chapman, C.
2014-07-01
Crater size-frequency distributions (SFDs) on the surfaces of solid-surfaced bodies in the solar system have provided valuable insights about planetary surface processes and about impactor populations since the first spacecraft images were obtained in the 1960s. They can be used to determine relative age differences between surficial units, to obtain absolute model ages if the impactor flux and scaling laws are understood, to assess various endogenic planetary or asteroidal processes that degrade craters or resurface units, as well as assess changes in impactor populations across the solar system and/or with time. The first asteroid SFDs were measured from Galileo images of Gaspra and Ida (cf. Chapman 2002). Despite the superficial simplicity of these studies, they are fraught with many difficulties, including confusion by secondary and/or endogenic cratering and poorly understood aspects of varying target properties (including regoliths, ejecta blankets, and nearly-zero-g rubble piles), widely varying attributes of impactors, and a host of methodological problems including recognizability of degraded craters, which is affected by illumination angle and by the "personal equations" of analysts. Indeed, controlled studies (Robbins et al. 2014) demonstrate crater-density differences of a factor of two or more between experienced crater counters. These inherent difficulties have been especially apparent in divergent results for Vesta from different members of the Dawn Science Team (cf. Russell et al. 2013). Indeed, they have been exacerbated by misuse of a widely available tool (Craterstats: hrscview.fu-berlin.de/craterstats.html), which incorrectly computes error bars for proper interpretation of cumulative SFDs, resulting in derived model ages specified to three significant figures and interpretations of statistically insignificant kinks. They are further exacerbated, and for other small-body crater SFDs analyzed by the Berlin group, by stubbornly adopting
Ringard, Justine; Seyler, Frederique; Linguet, Laurent
2017-06-16
Satellite precipitation products (SPPs) provide alternative precipitation data for regions with sparse rain gauge measurements. However, SPPs are subject to different types of error that need correction. Most SPP bias correction methods use the statistical properties of the rain gauge data to adjust the corresponding SPP data. The statistical adjustment does not make it possible to correct the pixels of SPP data for which there is no rain gauge data. The solution proposed in this article is to correct the daily SPP data for the Guiana Shield using a novel two-step approach, without taking into account the daily gauge data of the pixel to be corrected, but rather the daily gauge data from surrounding pixels. In this case, a spatial analysis must be involved. The first step defines hydroclimatic areas using a spatial classification that considers precipitation data with the same temporal distributions. The second step uses the Quantile Mapping bias correction method to correct the daily SPP data contained within each hydroclimatic area. We validate the results by comparing the corrected SPP data and daily rain gauge measurements using relative RMSE and relative bias statistical errors. The results show that analysis scale variation reduces rBIAS and rRMSE significantly. The spatial classification avoids mixing rainfall data with different temporal characteristics in each hydroclimatic area, and the defined bias correction parameters are more realistic and appropriate. This study demonstrates that hydroclimatic classification is relevant for implementing bias correction methods at the local scale.
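The Quantile Mapping step can be sketched with empirical CDFs; the gamma-distributed series below are placeholders for pooled daily satellite and gauge rainfall within one hydroclimatic area:

```python
import numpy as np

def quantile_map(sat, gauge, x):
    """Empirical quantile mapping: map value x through the satellite CDF and
    back through the inverse gauge CDF. 'sat' and 'gauge' are daily series
    pooled over one hydroclimatic area."""
    q = np.linspace(0.01, 0.99, 99)
    sat_q = np.quantile(sat, q)
    gauge_q = np.quantile(gauge, q)
    # F_sat(x) via interpolation, then F_gauge^-1 of that probability.
    p = np.interp(x, sat_q, q)
    return np.interp(p, q, gauge_q)

rng = np.random.default_rng(0)
sat = rng.gamma(2.0, 5.0, 2000)     # biased satellite rainfall (mm/day)
gauge = rng.gamma(2.0, 4.0, 2000)   # reference gauge rainfall
print(quantile_map(sat, gauge, 12.0))
```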
Statistical methods for handling incomplete data
Kim, Jae Kwang
2013-01-01
""… this book nicely blends the theoretical material and its application through examples, and will be of interest to students and researchers as a textbook or a reference book. Extensive coverage of recent advances in handling missing data provides resources and guidelines for researchers and practitioners in implementing the methods in new settings. … I plan to use this as a textbook for my teaching and highly recommend it.""-Biometrics, September 2014
Effects of Feature Extraction and Classification Methods on Cyberbully Detection
Directory of Open Access Journals (Sweden)
Esra SARAÇ
2016-12-01
Full Text Available Cyberbullying is defined as an aggressive, intentional action against a defenseless person by using the Internet or other electronic content. Researchers have found that many bullying cases have tragically ended in suicide; hence automatic detection of cyberbullying has become important. In this study we show the effects of the feature extraction, feature selection, and classification methods used on the performance of automatic cyberbullying detection. To perform the experiments, the FormSpring.me dataset is used, and the effects of preprocessing methods; several classifiers like C4.5, Naïve Bayes, kNN, and SVM; and information gain and chi-square feature selection methods are investigated. Experimental results indicate that the best classification results are obtained when alphabetic tokenization, no stemming, and no stopword removal are applied. Using feature selection also improves cyberbully detection performance. When the classifiers are compared, C4.5 performs best for the dataset used.
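A minimal sketch of the chi-square selection plus classification pipeline studied here, with CART standing in for C4.5 (scikit-learn does not ship C4.5) and four invented posts as data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

posts = ["you are stupid and ugly", "nice game last night",
         "nobody likes you loser", "see you at practice tomorrow"]
labels = [1, 0, 1, 0]  # 1 = bullying

pipe = make_pipeline(
    CountVectorizer(lowercase=True),        # alphabetic tokens, no stemming
    SelectKBest(chi2, k=5),                 # chi-square feature selection
    DecisionTreeClassifier(random_state=0)  # CART as a stand-in for C4.5
)
pipe.fit(posts, labels)
print(pipe.predict(["you are such a loser"]))
```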
DEFF Research Database (Denmark)
Wang, Ting; Guan, Sheng-Uei; Puthusserypady, Sadasivan
2014-01-01
Feature ordering is a significant data preprocessing method in Incremental Attribute Learning (IAL), a novel machine learning approach which gradually trains features according to a given order. Previous research has shown that, similar to feature selection, feature ordering is also important based...... estimation. Moreover, a criterion that summarizes all the produced values of AD is employed with a GA (Genetic Algorithm)-based approach to obtain the optimum feature ordering for classification problems based on neural networks by means of IAL. Compared with the feature ordering obtained by other approaches...
A hierarchical classification method for finger knuckle print recognition
Kong, Tao; Yang, Gongping; Yang, Lu
2014-12-01
Finger knuckle print has recently been seen as an effective biometric technique. In this paper, we propose a hierarchical classification method for finger knuckle print recognition, which is rooted in traditional score-level fusion methods. In the proposed method, we first take the Gabor feature as the basic feature for finger knuckle print recognition, and then a new decision rule is defined based on a predefined threshold. Finally, the minor feature, the speeded-up robust feature (SURF), is applied for those users who cannot be recognized by the basic feature. Extensive experiments are performed to evaluate the proposed method, and experimental results show that it can achieve a promising performance.
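The hierarchical decision rule reduces to a threshold test with a fallback matcher. A sketch, where the two scoring functions are assumed callables rather than actual Gabor/SURF implementations:

```python
def recognize(probe, gallery, gabor_score, surf_score, threshold=0.8):
    """Hierarchical matching sketch: accept the Gabor-based result when its
    score clears a predefined threshold, otherwise fall back to the SURF-based
    matcher. gabor_score/surf_score are assumed callables that return
    (best_user_id, match_score) for a probe against a gallery."""
    best_id, score = gabor_score(probe, gallery)
    if score >= threshold:                   # confident: basic feature suffices
        return best_id
    best_id, _ = surf_score(probe, gallery)  # minor feature for hard cases
    return best_id
```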
Estimation of Lithological Classification in Taipei Basin: A Bayesian Maximum Entropy Method
Wu, Meng-Ting; Lin, Yuan-Chien; Yu, Hwa-Lung
2015-04-01
In environmental and other scientific applications, we must have a certain understanding of the geological lithological composition. Because of restrictions of real conditions, only a limited amount of data can be acquired. To find out the lithological distribution in the study area, many spatial statistical methods are used to estimate the lithological composition at unsampled points or grids. This study applied the Bayesian Maximum Entropy (BME) method, which is an emerging method in the field of geological spatiotemporal statistics. The BME method can identify the spatiotemporal correlation of the data and combine not only hard data but also soft data to improve estimation. Lithological classification data are discrete categorical data. Therefore, this research applied categorical BME to establish a complete three-dimensional lithological estimation model, applying the limited hard data from the cores and the soft data generated from geological dating data and virtual wells to estimate the three-dimensional lithological classification in the Taipei Basin. Keywords: Categorical Bayesian Maximum Entropy method, Lithological Classification, Hydrogeological Setting
Beyond Zar: the use and abuse of classification statistics for otolith chemistry.
Jones, C M; Palmer, M; Schaffler, J J
2017-02-01
Classification method performance was evaluated using otolith chemistry of juvenile Atlantic menhaden Brevoortia tyrannus when assumptions of data normality were met and were violated. Four methods were tested [linear discriminant function analysis (LDFA), quadratic discriminant function analysis (QDFA), random forest (RF) and artificial neural networks (ANN)] using computer simulation to determine their performance when variable-group means ranged from small to large and their performance under conditions of typical skewness to double the amount of skewness typically observed. Using the kappa index, the parametric methods performed best after applying appropriate data transformation, gaining 2% better performance with LDFA performing slightly better than QDFA. RF performed as well as QDFA and showed no difference in performance between raw and transformed data while the performance of ANN was the poorest and worse with raw data. All methods performed well when group differences were large, but parametric methods outperformed machine-learning methods. When data were skewed the performance of all methods declined and worsened with greater skewness, but RF performed consistently as well or better than the other methods in the presence of skewness. The parametric methods were found to be more powerful when assumptions of normality can be met and can be used confidently when skewness and kurtosis are minimized. When these assumptions cannot be minimized, then machine-algorithm methods should also be tried. © 2016 The Fisheries Society of the British Isles.
Analysis and application of classification methods of complex carbonate reservoirs
Li, Xiongyan; Qin, Ruibao; Ping, Haitao; Wei, Dan; Liu, Xiaomei
2018-06-01
There are abundant carbonate reservoirs from the Cenozoic to Mesozoic era in the Middle East. Due to variation in the sedimentary environment and diagenetic processes of carbonate reservoirs, several porosity types coexist in carbonate reservoirs. As a result, because of the complex lithologies and pore types as well as the impact of microfractures, the pore structure is very complicated, and it is therefore difficult to accurately calculate the reservoir parameters. In order to accurately evaluate carbonate reservoirs, building on the pore structure evaluation of carbonate reservoirs, classification methods for carbonate reservoirs are analyzed based on capillary pressure curves and flow units. Although the carbonate reservoirs can be classified based on the capillary pressure curves, the relationship between porosity and permeability after classification is not ideal. On the basis of flow units, a high-precision functional relationship between porosity and permeability after classification can be established, so the carbonate reservoirs can be quantitatively evaluated based on the classification of flow units. In the dolomite reservoirs, the average absolute error of calculated permeability decreases from 15.13 to 7.44 mD. Similarly, the average absolute error of calculated permeability of the limestone reservoirs is reduced from 20.33 to 7.37 mD. Only by accurately characterizing pore structures and classifying reservoir types can reservoir parameters be calculated accurately. Therefore, characterizing pore structures and classifying reservoir types are very important for the accurate evaluation of complex carbonate reservoirs in the Middle East.
Classification of Children Intelligence with Fuzzy Logic Method
Syahminan; ika Hidayati, Permata
2018-04-01
A child's type of intelligence is an important thing for parents to know early on. Typing can be done by grouping the dominant characteristics of each type of intelligence. To make it easier for parents to determine their child's intelligence type and how to respond to it, a classification system was created that groups children's intelligence using the fuzzy logic method to determine the degree of each intelligence type. From the analysis we concluded that a classification system for grouping children's intelligence with the fuzzy logic method can determine the type of a child's intelligence more easily and with more accurate conclusions than manual tests.
Effects of Feature Extraction and Classification Methods on Cyberbully Detection
ÖZEL, Selma Ayşe; SARAÇ, Esra
2016-01-01
Cyberbullying is defined as an aggressive, intentional action against a defenseless person by using the Internet, or other electronic contents. Researchers have found that many of the bullying cases have tragically ended in suicides; hence automatic detection of cyberbullying has become important. In this study we show the effects of feature extraction, feature selection, and classification methods that are used, on the performance of automatic detection of cyberbullying. To perform the exper...
Classification Method in Integrated Information Network Using Vector Image Comparison
Directory of Open Access Journals (Sweden)
Zhou Yuan
2014-05-01
Full Text Available A Wireless Integrated Information Network (WMN) consists of nodes that gather integrated information, such as images and voice, from their surroundings. Transmitting this information requires large resources, which decreases the service time of the network. In this paper we present a Classification Approach based on Vector Image Comparison (VIC) for WMN that improves the service time of the network. Methods for sub-region selection and conversion are also proposed.
Classification of analysis methods for characterization of magnetic nanoparticle properties
DEFF Research Database (Denmark)
Posth, O.; Hansen, Mikkel Fougt; Steinhoff, U.
2015-01-01
The aim of this paper is to provide a roadmap for the standardization of magnetic nanoparticle (MNP) characterization. We have assessed common MNP analysis techniques under various criteria in order to define the methods that can be used as either standard techniques for magnetic particle...... characterization or those that can be used to obtain a comprehensive picture of a MNP system. This classification is the first step on the way to develop standards for nanoparticle characterization....
Audio Classification in Speech and Music: A Comparison between a Statistical and a Neural Approach
Directory of Open Access Journals (Sweden)
Alessandro Bugatti
2002-04-01
Full Text Available We focus attention on the problem of audio classification into speech and music for multimedia applications. In particular, we present a comparison between two different techniques for speech/music discrimination. The first method is based on the zero-crossing rate and Bayesian classification. It is very simple from a computational point of view and gives good results in the case of pure music or speech. The simulation results show that some performance degradation arises when the music segment also contains speech superimposed on music, or strong rhythmic components. To overcome these problems, we propose a second method that uses more features and is based on neural networks (specifically a multi-layer perceptron). In this case we obtain better performance, at the expense of a limited increase in computational complexity. In practice, the proposed neural network is simple to implement if a suitable polynomial is used as the activation function, and a real-time implementation is possible even if low-cost embedded systems are used.
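The zero-crossing rate that drives the first classifier is straightforward to compute per frame; the frame length, test signal and the final thresholding rule below are illustrative stand-ins for the paper's Bayesian classifier:

```python
import numpy as np

def zero_crossing_rate(signal, frame_len=1024):
    """Per-frame ZCR: fraction of adjacent samples whose signs differ.
    Speech tends to alternate high-ZCR (unvoiced) and low-ZCR (voiced)
    frames, while the ZCR of music is typically steadier."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    signs = np.sign(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)      # placeholder for 1 s of audio
zcr = zero_crossing_rate(audio)
is_speech = zcr.std() > 0.05            # hypothetical stand-in decision rule
print(zcr.mean(), is_speech)
```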
Leydesdorff, L.; Kogler, D.F.; Yan, B.
The Cooperative Patent Classifications (CPC) recently developed cooperatively by the European and US Patent Offices provide a new basis for mapping patents and portfolio analysis. CPC replaces International Patent Classifications (IPC) of the World Intellectual Property Organization. In this study,
Statistical methods and their applications in constructional engineering
International Nuclear Information System (INIS)
1977-01-01
An introduction into the basic terms of statistics is followed by a discussion of elements of the probability theory, customary discrete and continuous distributions, simulation methods, statistical supporting framework dynamics, and a cost-benefit analysis of the methods introduced.
Online Statistics Labs in MSW Research Methods Courses: Reducing Reluctance toward Statistics
Elliott, William; Choi, Eunhee; Friedline, Terri
2013-01-01
This article presents results from an evaluation of an online statistics lab as part of a foundations research methods course for master's-level social work students. The article discusses factors that contribute to an environment in social work that fosters attitudes of reluctance toward learning and teaching statistics in research methods…
Application of texture analysis method for mammogram density classification
Nithya, R.; Santhi, B.
2017-07-01
Mammographic density is considered a major risk factor for developing breast cancer. This paper proposes an automated approach to classify breast tissue types in digital mammogram. The main objective of the proposed Computer-Aided Diagnosis (CAD) system is to investigate various feature extraction methods and classifiers to improve the diagnostic accuracy in mammogram density classification. Texture analysis methods are used to extract the features from the mammogram. Texture features are extracted by using histogram, Gray Level Co-Occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Difference Matrix (GLDM), Local Binary Pattern (LBP), Entropy, Discrete Wavelet Transform (DWT), Wavelet Packet Transform (WPT), Gabor transform and trace transform. These extracted features are selected using Analysis of Variance (ANOVA). The features selected by ANOVA are fed into the classifiers to characterize the mammogram into two-class (fatty/dense) and three-class (fatty/glandular/dense) breast density classification. This work has been carried out by using the mini-Mammographic Image Analysis Society (MIAS) database. Five classifiers are employed namely, Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). Experimental results show that ANN provides better performance than LDA, NB, KNN and SVM classifiers. The proposed methodology has achieved 97.5% accuracy for three-class and 99.37% for two-class density classification.
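One of the texture pipelines evaluated here (GLCM features, ANOVA selection, then a classifier) can be sketched with scikit-image and scikit-learn; the random images, label assignments and parameter choices are placeholders, not MIAS data:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # skimage >= 0.19 names
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

def glcm_features(img):
    """Contrast/homogeneity/energy/correlation from a grey-level co-occurrence
    matrix, one of the texture descriptors listed in the paper."""
    g = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                     levels=256, symmetric=True, normed=True)
    return np.hstack([graycoprops(g, p).ravel()
                      for p in ("contrast", "homogeneity",
                                "energy", "correlation")])

rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, (40, 64, 64), dtype=np.uint8)  # toy "mammograms"
y = rng.integers(0, 2, 40)                                 # fatty vs dense
X = np.array([glcm_features(im) for im in imgs])
X_sel = SelectKBest(f_classif, k=4).fit_transform(X, y)    # ANOVA selection
print(SVC().fit(X_sel, y).score(X_sel, y))
```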
A hierarchical inferential method for indoor scene classification
Directory of Open Access Journals (Sweden)
Jiang Jingzhe
2017-12-01
Full Text Available Indoor scene classification forms a basis for scene interaction for service robots. The task is challenging because the layout and decoration of a scene vary considerably. Previous studies on knowledge-based methods commonly ignore the importance of visual attributes when constructing the knowledge base. These shortcomings restrict the performance of classification. The structure of a semantic hierarchy was proposed to describe similarities of different parts of scenes in a fine-grained way. Besides the commonly used semantic features, visual attributes were also introduced to construct the knowledge base. Inspired by the processes of human cognition and the characteristics of indoor scenes, we proposed an inferential framework based on the Markov logic network. The framework is evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.
Application of nonparametric statistic method for DNBR limit calculation
International Nuclear Information System (INIS)
Dong Bo; Kuang Bo; Zhu Xuenong
2013-01-01
Background: Nonparametric statistical methods are statistical inference methods that do not depend on an assumed distribution; they calculate tolerance limits at a given probability level and confidence through sampling. The DNBR margin is an important parameter in NPP (Nuclear Power Plant) design, as it represents the safety level of the plant. Purpose and Methods: This paper uses a nonparametric statistical method based on the Wilks formula, together with the VIPER-01 subchannel analysis code, to calculate the DNBR design limit (DL) of a 300 MW NPP during a complete loss-of-flow accident, and compares the result with the DNBR DL obtained by the ITDP method to quantify the DNBR margin. Results: The results indicate that this method gains 2.96% more DNBR margin than the ITDP methodology. Conclusions: Because conservatism in the analysis process is reduced, the nonparametric statistical method provides a greater DNBR margin, and the increased margin benefits the upgrading of the core refuelling scheme. (authors)
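For readers unfamiliar with the Wilks approach, the sketch below computes the first-order, one-sided Wilks sample size, i.e. the minimum number of code runs needed so that the most extreme observed DNBR bounds the chosen quantile at the chosen confidence. The 95/95 levels are the conventional choice and are assumed here, not taken from the paper.

```python
# First-order one-sided Wilks criterion: with N code runs, the most extreme
# observed value bounds the p-quantile with confidence c once 1 - p**N >= c.
def wilks_sample_size(p=0.95, c=0.95):
    n = 1
    while 1 - p ** n < c:
        n += 1
    return n

print(wilks_sample_size())            # 95/95 -> 59 runs
print(wilks_sample_size(0.95, 0.99))  # higher confidence needs more runs (90)
```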
A study of several CAD methods for classification of clustered microcalcifications
Wei, Liyang; Yang, Yongyi; Nishikawa, Robert M.; Jiang, Yulei
2005-04-01
In this paper we investigate several state-of-the-art machine-learning methods for automated classification of clustered microcalcifications (MCs), aimed at assisting radiologists in more accurate diagnosis of breast cancer in a computer-aided diagnosis (CADx) scheme. The methods we consider include: support vector machine (SVM), kernel Fisher discriminant (KFD), and committee machines (ensemble averaging and AdaBoost), most of which have been developed recently in statistical learning theory. We formulate differentiation of malignant from benign MCs as a supervised learning problem, and apply these learning methods to develop the classification algorithms. As input, these methods use image features automatically extracted from clustered MCs. We test these methods using a database of 697 clinical mammograms from 386 cases, which includes a wide spectrum of difficult-to-classify cases. We use receiver operating characteristic (ROC) analysis to evaluate and compare the classification performance of the different methods. In addition, we also investigate how to combine information from multiple-view mammograms of the same case so that the best decision can be made by a classifier. In our experiments, the kernel-based methods (i.e., SVM, KFD) yield the best performance, significantly outperforming a well-established CADx approach based on neural network learning.
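A minimal sketch of the evaluation protocol described here (train a kernel classifier, score it by the area under the ROC curve), with synthetic two-class data standing in for the extracted MC image features:

```python
# SVM scored by ROC AUC; synthetic two-class data replaces the MC features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf", probability=True, random_state=0).fit(X_tr, y_tr)
scores = svm.predict_proba(X_te)[:, 1]       # "malignancy" scores
print("AUC:", roc_auc_score(y_te, scores))   # area under the ROC curve
```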
Clary, Renee; Wandersee, James
2013-01-01
In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnaeus (1707-1778), who is…
[Galaxy/quasar classification based on nearest neighbor method].
Li, Xiang-Ru; Lu, Yu; Zhou, Jian-Ming; Wang, Yong-Jun
2011-09-01
With the wide application of high-quality CCDs in celestial spectrum imaging and the implementation of many large sky survey programs (e.g., the Sloan Digital Sky Survey (SDSS), the Two-degree-Field Galaxy Redshift Survey (2dF), the Spectroscopic Survey Telescope (SST), the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) program and the Large Synoptic Survey Telescope (LSST) program, etc.), celestial observational data are pouring in like torrential rain. To utilize them effectively and fully, research on automated processing methods for celestial data is imperative. In the present work, we investigated how to recognize galaxies and quasars from spectra based on the nearest neighbor method. Galaxies and quasars are extragalactic objects far away from Earth, and their spectra are usually contaminated by various kinds of noise; recognizing these two types of spectra is therefore a typical problem in automatic spectral classification. Furthermore, the nearest neighbor method is one of the most typical, classic and mature algorithms in pattern recognition and data mining, and is often used as a benchmark when developing novel algorithms. Regarding applicability in practice, it is shown that the recognition ratio of the nearest neighbor method (NN) is comparable to the best results reported in the literature based on more complicated methods, and the superiority of NN is that the method does not need to be trained, which is useful for incremental learning and parallel computation in the processing of massive spectral data. In conclusion, the results of this work are helpful for the classification of galaxy and quasar spectra.
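The classifier itself reduces to a lookup over stored examples; a minimal sketch with scikit-learn, using random vectors as stand-ins for flux-calibrated spectra:

```python
# 1-nearest-neighbour spectral classification: "training" is just storing
# the labelled spectra (random vectors stand in for real flux values).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
flux = rng.normal(size=(200, 500))    # 200 labelled spectra, 500 flux bins
labels = rng.integers(0, 2, 200)      # 0 = galaxy, 1 = quasar

nn = KNeighborsClassifier(n_neighbors=1).fit(flux, labels)
print(nn.predict(rng.normal(size=(1, 500))))  # classify a new spectrum
```

Because "training" is just storing labelled spectra, new data can be appended at any time, which is the incremental-learning property the authors highlight.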
Berti, Matteo; Corsini, Alessandro; Franceschini, Silvia; Iannacone, Jean Pascal
2013-04-01
The application of spaceborne synthetic aperture radar interferometry has progressed, over the last two decades, from the pioneering use of single interferograms for analyzing changes on the Earth's surface to the development of advanced multi-interferogram techniques for analyzing almost any natural phenomenon that involves movement of the ground. The success of multi-interferogram techniques in the analysis of natural hazards such as landslides and subsidence is widely documented in the scientific literature and demonstrated by the consensus among end-users. Despite the great potential of this technique, radar interpretation of slope movements is generally based solely on the analysis of average displacement velocities, while the information embedded in multi-interferogram time series is often overlooked if not completely neglected. The underuse of PS time series is probably due to the detrimental effect of residual atmospheric errors, which leave the PS time series with erratic, irregular fluctuations that are often difficult to interpret, and to the difficulty of performing a visual, supervised analysis of the time series for a large dataset. In this work we present a procedure for automatic classification of PS time series based on a series of statistical characterization tests. The procedure classifies the time series into six distinctive target trends (0 = uncorrelated; 1 = linear; 2 = quadratic; 3 = bilinear; 4 = discontinuous without constant velocity; 5 = discontinuous with change in velocity) and retrieves for each trend a series of descriptive parameters which can be efficiently used to characterize the temporal changes of ground motion. The classification algorithms were developed and tested using an ENVISAT dataset available in the frame of the EPRS-E project (Extraordinary Plan of Environmental Remote Sensing) of the Italian Ministry of Environment (track "Modena", Northern Apennines). This dataset was generated using standard processing, then the…
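The paper's test battery is not spelled out in the abstract; as one hedged approximation of the trend-typing idea, the sketch below fits nested polynomial models to a displacement series and keeps the one with the lowest BIC. Bilinear and discontinuous trends are omitted.

```python
# Toy trend typing: fit constant / linear / quadratic models to a PS
# displacement series and keep the one with the lowest BIC.  The paper's
# procedure uses a battery of statistical tests and also covers bilinear
# and discontinuous trends, which are not reproduced here.
import numpy as np

def classify_trend(t, d):
    n = len(t)
    best = None
    for deg, name in [(0, "uncorrelated"), (1, "linear"), (2, "quadratic")]:
        rss = np.sum((d - np.polyval(np.polyfit(t, d, deg), t)) ** 2)
        bic = n * np.log(rss / n) + (deg + 1) * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, name)
    return best[1]

t = np.linspace(0, 5, 60)                                   # years
d = -3.0 * t + np.random.default_rng(2).normal(0, 1.5, 60)  # mm, noisy linear
print(classify_trend(t, d))                                 # usually "linear"
```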
A NEW CLASSIFICATION METHOD FOR GAMMA-RAY BURSTS
International Nuclear Information System (INIS)
Lue Houjun; Liang Enwei; Zhang Binbin; Zhang Bing
2010-01-01
Recent Swift observations suggest that the traditional long versus short gamma-ray burst (GRB) classification scheme does not always associate GRBs with the two physically motivated model types, i.e., Type II (massive star origin) versus Type I (compact star origin). We propose a new phenomenological classification method for GRBs by introducing a new parameter ε = E_{γ,iso,52}/E_{p,z,2}^{5/3}, where E_{γ,iso} is the isotropic gamma-ray energy (in units of 10^52 erg) and E_{p,z} is the cosmic rest-frame spectral peak energy (in units of 100 keV). For those short GRBs with 'extended emission', both quantities are defined for the short/hard spike only. With the current complete sample of GRBs with redshift and E_p measurements, the ε parameter shows a clear bimodal distribution with a separation at ε ∼ 0.03. The high-ε region encloses the typical long GRBs with high luminosity, some high-z 'rest-frame-short' GRBs (such as GRB 090423 and GRB 080913), as well as some high-z short GRBs (such as GRB 090426). All these GRBs have been claimed to be of Type II origin based on other observational properties in the literature. All the GRBs that are argued to be of Type I origin are found to cluster in the low-ε region. They can be separated from some nearby low-luminosity long GRBs (at 3σ) by an additional T_90 criterion, i.e., T_{90,z} ≲ 5 s in the Swift/BAT band. We suggest that this new classification scheme can better match the physically motivated Type II/I classification scheme.
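The ε parameter is simple to compute from the two measured quantities; a direct transcription, with the ε ∼ 0.03 separation used as the decision cut (the example burst values below are made up for illustration):

```python
# epsilon = E_iso(10^52 erg) / [E_p,z(100 keV)]^(5/3); the paper finds a
# bimodal distribution separating at epsilon ~ 0.03.
def grb_epsilon(E_iso_erg, E_pz_keV):
    return (E_iso_erg / 1e52) / (E_pz_keV / 100.0) ** (5.0 / 3.0)

def classify(E_iso_erg, E_pz_keV, cut=0.03):
    eps = grb_epsilon(E_iso_erg, E_pz_keV)
    return "high-epsilon (Type II-like)" if eps > cut else "low-epsilon (Type I-like)"

print(classify(1e53, 300.0))  # a luminous long-burst-like case -> high-epsilon
print(classify(5e49, 600.0))  # a short/hard-like case -> low-epsilon
```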
DEFF Research Database (Denmark)
Hjørland, Birger
2017-01-01
This article presents and discusses definitions of the term "classification" and the related concepts "concept/conceptualization," "categorization," "ordering," "taxonomy" and "typology." It further presents and discusses theories of classification, including the influences of Aristotle and Wittgenstein. It presents different views on forming classes, including logical division, numerical taxonomy, historical classification, and hermeneutical and pragmatic/critical views. Finally, issues related to artificial versus natural classification and taxonomic monism versus taxonomic pluralism are briefly…
Evaluation of normalization methods for cDNA microarray data by k-NN classification
Energy Technology Data Exchange (ETDEWEB)
Wu, Wei; Xing, Eric P; Myers, Connie; Mian, Saira; Bissell, Mina J
2004-12-17
…-bias-removal normalization strategies, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, outperform other strategies for removing spatial effect, intensity effect and scale differences from cDNA microarray data. The apparent sensitivity of k-NN LOOCV classification error to dye biases suggests that this criterion provides an informative measure for evaluating normalization methods. All the computational tools used in this study were implemented using the R language for statistical computing and graphics.
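The evaluation criterion itself is easy to reproduce; a sketch of leave-one-out k-NN error as a yardstick for comparing normalization strategies (the row-centering "normalization" below is a placeholder, not one of the strategies named in the abstract):

```python
# k-NN leave-one-out error as a yardstick: the normalization giving the
# lowest LOOCV error on samples of known class is preferred.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def loocv_error(X, y, k=3):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y,
                          cv=LeaveOneOut())
    return 1.0 - acc.mean()

rng = np.random.default_rng(3)
X_raw = rng.normal(size=(40, 100))                  # 40 arrays, 100 genes
y = rng.integers(0, 2, 40)                          # known sample classes
X_norm = X_raw - X_raw.mean(axis=1, keepdims=True)  # placeholder normalization
print(loocv_error(X_raw, y), loocv_error(X_norm, y))
```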
The estimation of measurement results using statistical methods
International Nuclear Information System (INIS)
Velychko, O. (State Enterprise Ukrmetrteststandard, 4, Metrologichna Str., 03680, Kyiv, Ukraine); Gordiyenko, T. (State Scientific Institution UkrNDIspirtbioprod, 3, Babushkina Lane, 03190, Kyiv, Ukraine)
2015-01-01
A number of international standards and guides describe various statistical methods that are applied for the management, control and improvement of processes, with the purpose of analysing technical measurement results. The analysis of international standards and guides on statistical methods for the estimation of measurement results, and recommendations for their application in laboratories, is described. For the analysis of these standards and guides, cause-and-effect Ishikawa diagrams concerning the application of statistical methods for the estimation of measurement results were constructed.
Fault classification method for the driving safety of electrified vehicles
Wanner, Daniel; Drugge, Lars; Stensson Trigell, Annika
2014-05-01
A fault classification method is proposed which has been applied to an electric vehicle. Potential faults in the different subsystems that can affect the vehicle directional stability were collected in a failure mode and effect analysis. Similar driveline faults were grouped together if they resembled each other with respect to their influence on the vehicle dynamic behaviour. The faults were physically modelled in a simulation environment before they were induced in a detailed vehicle model under normal driving conditions. A special focus was placed on faults in the driveline of electric vehicles employing in-wheel motors of the permanent magnet type. Several failures caused by mechanical and other faults were analysed as well. The fault classification method consists of a controllability ranking developed according to the functional safety standard ISO 26262. The controllability of a fault was determined with three parameters covering the influence of the longitudinal, lateral and yaw motion of the vehicle. The simulation results were analysed and the faults were classified according to their controllability using the proposed method. It was shown that the controllability decreased specifically with increasing lateral acceleration and increasing speed. The results for the electric driveline faults show that this trend cannot be generalised for all the faults, as the controllability deteriorated for some faults during manoeuvres with low lateral acceleration and low speed. The proposed method is generic and can be applied to various other types of road vehicles and faults.
Statistical methods for accurately determining criticality code bias
International Nuclear Information System (INIS)
Trumble, E.F.; Kimball, K.D.
1997-01-01
A system of statistically treating validation calculations for the purpose of determining computer code bias is provided in this paper. The following statistical treatments are described: weighted regression analysis, lower tolerance limit, lower tolerance band, and lower confidence band. These methods meet the criticality code validation requirements of ANS 8.1. 8 refs., 5 figs., 4 tabs
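As context for the lower tolerance limit treatment, the sketch below shows the standard one-sided normal tolerance limit construction via the noncentral t distribution. The mock k-eff values and 95/95 levels are assumptions, and the paper's weighted-regression variants are not reproduced.

```python
# One-sided lower tolerance limit for normal data: L = mean - k*s, with k
# from the noncentral t distribution (a standard construction; the paper's
# weighted-regression variants build on this idea).
import numpy as np
from scipy.stats import nct, norm

def lower_tolerance_limit(x, coverage=0.95, confidence=0.95):
    n = len(x)
    delta = norm.ppf(coverage) * np.sqrt(n)          # noncentrality parameter
    k = nct.ppf(confidence, df=n - 1, nc=delta) / np.sqrt(n)
    return x.mean() - k * x.std(ddof=1)

keff = np.random.default_rng(4).normal(0.995, 0.003, 30)  # mock validation runs
print(lower_tolerance_limit(keff))
```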
BASIC METHODS OF CLASSIFICATION AND CHARACTERISTICS OF METHODS OF PRICING IN UKRAINE
A. Boguslavskiy
2014-01-01
The article provides definitions and shows the need for enterprises to use different pricing methods. The reasons for the absence of a universal classification of pricing methods are exposed. The approaches of different authors to classifying groups of pricing methods are considered: 1) cost-based methods; 2) methods with a focus on competition; 3) demand-based pricing methods; 4) pricing with a focus on maximum profit; 5) parametric methods; 6) pricing under risk and uncertainty, etc. An improved classificat...
A method to incorporate uncertainty in the classification of remote sensing images
Gonçalves, Luísa M. S.; Fonte, Cidália C.; Júlio, Eduardo N. B. S.; Caetano, Mario
2009-01-01
The aim of this paper is to investigate whether incorporating the uncertainty associated with the classification of surface elements into the classification of landscape units (LUs) increases the accuracy of the results. To this end, a hybrid classification method is developed, including uncertainty information in the classification of very high spatial resolution multi-spectral satellite images, to obtain a map of LUs. The developed classification methodology includes the following...
Stumpe, B; Engel, T; Steinweg, B; Marschner, B
2012-04-03
In the past, different slag materials were often used for landscaping and construction purposes or simply dumped. Nowadays German environmental laws strictly control the use of slags, but a remaining share of 35% is still dumped in landfills in an uncontrolled way. Since some slags have high heavy metal contents, and since different slag types have characteristic chemical and physical properties that influence the risk potential and other characteristics of the deposits, an identification of the slag types is needed. We developed an FT-IR-based statistical method to identify different slag classes. Slag samples were collected at different sites throughout various cities within the industrial Ruhr area. Spectra of 35 samples from four slag classes, ladle furnace (LF), blast furnace (BF), oxygen furnace steel (OF), and zinc furnace (ZF) slags, were determined in the mid-infrared region (4000-400 cm(-1)). The spectral data sets were subjected to statistical classification methods to separate the spectra of the different slag classes. Principal component analysis (PCA) models for each slag class were developed and further used for soft independent modeling of class analogy (SIMCA). Precise classification of slag samples into the four slag classes was achieved by using two different SIMCA models stepwise. First, SIMCA 1 was used for the classification of ZF as well as OF slags over the total spectral range. If no correct classification was found, the spectrum was analyzed with SIMCA 2 at reduced wavenumbers for the classification of LF and BF spectra. As a result, we provide a time- and cost-efficient method based on FT-IR spectroscopy for processing and identifying large numbers of environmental slag samples.
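The two-stage procedure rests on per-class PCA models; below is a minimal sketch of the SIMCA idea (assign a spectrum to the class whose PCA model reconstructs it best), leaving out the per-class critical limits that full SIMCA adds:

```python
# Minimal SIMCA-style classifier: one PCA model per class, assign a spectrum
# to the class whose model reconstructs it with the smallest residual.
# Full SIMCA also sets per-class critical limits, which are omitted here.
import numpy as np
from sklearn.decomposition import PCA

class SimpleSIMCA:
    def __init__(self, n_components=3):
        self.n_components = n_components
        self.models = {}

    def fit(self, X, y):
        for c in np.unique(y):
            self.models[c] = PCA(self.n_components).fit(X[y == c])
        return self

    def predict(self, X):
        classes = list(self.models)
        resid = np.stack([np.linalg.norm(
            X - self.models[c].inverse_transform(self.models[c].transform(X)),
            axis=1) for c in classes])
        return np.array(classes)[np.argmin(resid, axis=0)]

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 600))       # mock mid-IR spectra
y = rng.integers(0, 4, 80)           # four slag classes (LF, BF, OF, ZF)
print(SimpleSIMCA().fit(X, y).predict(X[:5]))
```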
Longitudinal data analysis a handbook of modern statistical methods
Fitzmaurice, Garrett; Verbeke, Geert; Molenberghs, Geert
2008-01-01
Although many books currently available describe statistical models and methods for analyzing longitudinal data, they do not highlight connections between various research threads in the statistical literature. Responding to this void, Longitudinal Data Analysis provides a clear, comprehensive, and unified overview of state-of-the-art theory and applications. It also focuses on the assorted challenges that arise in analyzing longitudinal data. After discussing historical aspects, leading researchers explore four broad themes: parametric modeling, nonparametric and semiparametric methods, joint
Statistical methods for evaluating the attainment of cleanup standards
Energy Technology Data Exchange (ETDEWEB)
Gilbert, R.O.; Simpson, J.C.
1992-12-01
This document is the third volume in a series of volumes sponsored by the US Environmental Protection Agency (EPA), Statistical Policy Branch, that provide statistical methods for evaluating the attainment of cleanup standards at Superfund sites. Volume 1 (USEPA 1989a) provides sampling designs and tests for evaluating attainment of risk-based standards for soils and solid media. Volume 2 (USEPA 1992) provides designs and tests for evaluating attainment of risk-based standards for groundwater. The purpose of this third volume is to provide statistical procedures for designing sampling programs and conducting statistical tests to determine whether pollution parameters in remediated soils and solid media at Superfund sites attain site-specific reference-based standards. This document is written for individuals who may not have extensive training or experience with statistical methods. The intended audience includes EPA regional remedial project managers, Superfund-site potentially responsible parties, state environmental protection agencies, and contractors for these groups.
Comparisons of likelihood and machine learning methods of individual classification
Guinand, B.; Topchy, A.; Page, K.S.; Burnham-Curtis, M. K.; Punch, W.F.; Scribner, K.T.
2002-01-01
Classification methods used in machine learning (e.g., artificial neural networks, decision trees, and k-nearest neighbor clustering) are rarely used with population genetic data. We compare different nonparametric machine learning techniques with parametric likelihood estimations commonly employed in population genetics for purposes of assigning individuals to their population of origin ("assignment tests"). Classifier accuracy was compared across simulated data sets representing different levels of population differentiation (low and high FST), number of loci surveyed (5 and 10), and allelic diversity (average of three or eight alleles per locus). Empirical data for the lake trout (Salvelinus namaycush) exhibiting levels of population differentiation comparable to those used in simulations were examined to further evaluate and compare classification methods. Classification error rates associated with artificial neural networks and likelihood estimators were lower for simulated data sets compared to k-nearest neighbor and decision tree classifiers over the entire range of parameters considered. Artificial neural networks only marginally outperformed the likelihood method for simulated data (0-2.8% lower error rates). The relative performance of each machine learning classifier improved relative to the likelihood estimators for empirical data sets, suggesting an ability to "learn" and utilize properties of empirical genotypic arrays intrinsic to each population. Likelihood-based estimation methods provide a more accessible option for reliable assignment of individuals to the population of origin, given the intricacies in the development and evaluation of artificial neural networks. In recent years, characterization of highly polymorphic molecular markers such as mini- and microsatellites and development of novel methods of analysis have enabled researchers to extend investigations of ecological and evolutionary processes below the population level to the level of…
Complex Data Modeling and Computationally Intensive Statistical Methods
Mantovan, Pietro
2010-01-01
Recent years have seen the advent and development of many devices able to record and store an ever-increasing amount of complex and high-dimensional data: 3D images generated by medical scanners or satellite remote sensing, DNA microarrays, real-time financial data, system control datasets. The analysis of these data poses new challenging problems and requires the development of novel statistical models and computational methods, fueling many fascinating and fast-growing research areas of modern statistics. The book offers a wide variety of statistical methods and is addressed to statistici
Methods for statistical data analysis of multivariate observations
Gnanadesikan, R
1997-01-01
A practical guide for multivariate statistical techniques-- now updated and revised In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus of its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of inte
Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P
1999-01-01
Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview over some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149
Kawata, Masaaki; Sato, Chikara
2007-06-01
In determining the three-dimensional (3D) structure of macromolecular assemblies in single particle analysis, a large representative dataset of two-dimensional (2D) average images derived from a huge number of raw images is key to high resolution. Because alignments prior to averaging are computationally intensive, currently available multireference alignment (MRA) software does not survey every possible alignment. This leads to misaligned images, creating blurred averages and reducing the quality of the final 3D reconstruction. We present a new method in which multireference alignment is harmonized with classification (multireference multiple alignment: MRMA). This method enables a statistical comparison of multiple alignment peaks, reflecting the similarities between each raw image and a set of reference images. Among the selected alignment candidates for each raw image, misaligned images are statistically excluded, based on the principle that aligned raw images of similar projections have a dense distribution around the correctly aligned coordinates in image space. The newly developed method was examined for accuracy and speed using model image sets with various signal-to-noise ratios, and with electron microscope images of the Transient Receptor Potential C3 and the sodium channel. For every data set, the new method outperformed conventional methods in robustness against noise and in speed, creating 2D average images of higher quality. This statistically harmonized alignment-classification combination should greatly improve the quality of single particle analysis.
Comparison of Classification Methods for Detecting Emotion from Mandarin Speech
Pao, Tsang-Long; Chen, Yu-Te; Yeh, Jun-Heng
It is said that technology comes out of humanity. What is humanity? The very definition of humanity is emotion. Emotion is the basis for all human expression and the underlying theme behind everything that is done, said, thought or imagined. If computers are able to perceive and respond to human emotion, human-computer interaction will be more natural. Several classifiers have been adopted for automatically assigning an emotion category, such as anger, happiness or sadness, to a speech utterance. These classifiers were designed independently and tested on various emotional speech corpora, making it difficult to compare and evaluate their performance. In this paper, we first compare several popular classification methods and evaluate their performance by applying them to a Mandarin speech corpus consisting of five basic emotions: anger, happiness, boredom, sadness and neutral. The extracted feature streams contain MFCC, LPCC, and LPC. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% for the 5-class emotion recognition and outperforms other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compare these classifiers by applying them to another Mandarin expressive speech corpus consisting of two emotions. The experimental results again show that the proposed WD-MKNN outperforms the others.
Analysis of Statistical Methods Currently used in Toxicology Journals.
Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min
2014-09-01
Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by studies are used consistently and conducted on sound statistical grounds. The purpose of this paper is to describe the statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Sciences and described the methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had a sample size of less than 10, with the median and the mode being 6 and 3 & 6, respectively. The mean (105/113, 93%) was dominantly used to measure central tendency, and the standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provided justifications for why those methods were selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being the most popular (52/93, 56%), yet few studies conducted either a normality or an equal variance test. These results suggest that a more consistent and appropriate use of statistical methods is necessary, which may enhance the role of toxicology in public health.
Classification of methods and equipment recovery secondary waters
Directory of Open Access Journals (Sweden)
G. V. Kalashnikov
2017-01-01
Full Text Available The purification of the secondary waters of industrial production occupies an important place in the environmental activities of all food and chemical industries. For cleaning the transporter-washing water of beet-sugar production, the key role is played by the equipment of the treatment plants. The wide variety of wastewater treatment equipment is classified according to various methods. Typical structures used are sedimentation tanks, hydrocyclones, separators and centrifuges. These, in turn, differ in their degree of purification and in their throughput of incoming suspension and purified secondary water, and the equipment is divided into designs depending on the range of particles to be removed. A general classification of the methods for cleaning transporter-washing water, as well as of the corresponding equipment, is made. Based on the analysis of processes and instrumentation, the main methods of wastewater treatment are identified: mechanical, physicochemical, combined, biological and disinfection. To increase the degree of purification and reduce technical and economic costs, the combined method is widely used. The main task of the site for cleaning the transporter-washing waters of sugar-beet production is to provide the enterprise with water in the required quantity and quality, with economical use of water resources and without pollution of surface water and groundwater by industrial wastewater. New types of washing equipment of foreign production are now widely used in the sugar industry, and these require a large amount of high-quality purified transporter-washing water for normal operation. The proposed classification makes it possible to carry out a comparative technical and economic analysis when choosing the methods and equipment for the recuperation of secondary waters. The main secondary water recovery equipment used at a beet-sugar plant is considered. The most common beet processing plant is a…
International Nuclear Information System (INIS)
Sabaton, M.; Viollet, P.L.; Darles, A.; Gland, H.
1980-07-01
The PANACH three-dimensional calculation code, developed from tests on a small-scale model and validated by full-scale measurement campaigns, was used to estimate three-dimensional statistics of plumes. As the computation time makes it impossible to run a calculation for each radiosounding, a classification method was adopted. This method, developed by the French National Meteorological Office, is based on a double classification comprising basic classes, in which the plumes are assumed to be dynamically similar, and a sub-classification to take better account of the true moisture profiles. This statistical method was then applied to the case of 2 or 4 1300 MWe units fitted with natural draught cooling towers of the wet, dry or wet-dry types.
Using discriminant analysis as a nucleation event classification method
Directory of Open Access Journals (Sweden)
S. Mikkonen
2006-01-01
Full Text Available More than three years of measurements of aerosol size distributions and of different gas and meteorological parameters, made in the Po Valley, Italy, were analysed for this study to examine which of the meteorological and trace gas variables affect the emergence of nucleation events. As the analysis method, we used discriminant analysis with a non-parametric Epanechnikov kernel, i.e. a non-parametric density estimation method. The best classification result for our data was reached with the combination of relative humidity, ozone concentration and a third-degree polynomial of radiation. RH appeared to have a preventing effect on new particle formation, whereas the effects of O3 and radiation were more conducive. The concentrations of SO2 and NO2 also appeared to have a significant effect on the emergence of nucleation events, but because of the large number of missing observations we had to exclude them from the final analysis.
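Discriminant analysis with an Epanechnikov kernel amounts to class-conditional kernel density estimates combined through Bayes' rule; a minimal sketch with scikit-learn's KernelDensity, where the bandwidth and toy data are assumptions:

```python
# Kernel discriminant analysis in miniature: class-conditional Epanechnikov
# kernel density estimates combined through Bayes' rule.
import numpy as np
from sklearn.neighbors import KernelDensity

def kda_fit(X, y, bandwidth=0.5):
    models = []
    for c in np.unique(y):
        kde = KernelDensity(kernel="epanechnikov",
                            bandwidth=bandwidth).fit(X[y == c])
        models.append((c, kde, np.mean(y == c)))  # class, density, prior
    return models

def kda_predict(models, X):
    scores = np.stack([np.log(prior) + kde.score_samples(X)
                       for _, kde, prior in models])
    return np.array([c for c, _, _ in models])[np.argmax(scores, axis=0)]

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.repeat([0, 1], 50)            # 0 = non-event day, 1 = nucleation event
print(kda_predict(kda_fit(X, y), X[:5]))
```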
Decision tree methods: applications for classification and prediction.
Song, Yan-Yan; Lu, Ying
2015-04-25
Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, the study data can be divided into training and validation datasets: the training dataset is used to build a decision tree model, and the validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
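A compact illustration of the train/validation workflow described above, using scikit-learn's CART implementation with cost-complexity pruning as the "tree size" knob (the SPSS/SAS procedures discussed in the paper differ in detail):

```python
# Train/validation workflow for choosing tree size, using scikit-learn's
# CART with cost-complexity pruning as the size knob.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

# Candidate pruning strengths (each alpha corresponds to a tree size).
alphas = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_tr, y_tr).ccp_alphas
best = max(alphas, key=lambda a: DecisionTreeClassifier(
    ccp_alpha=a, random_state=0).fit(X_tr, y_tr).score(X_va, y_va))

tree = DecisionTreeClassifier(ccp_alpha=best, random_state=0).fit(X_tr, y_tr)
print("validation accuracy:", tree.score(X_va, y_va))
```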
A novel method for human age group classification based on Correlation Fractal Dimension
Directory of Open Access Journals (Sweden)
Anuradha Yarlagadda
2015-10-01
Full Text Available In the computer vision community, the categorization of a person's facial image into various age groups is rarely precise and has not been pursued effectively. To address this problem, which is an important area of research, the present paper proposes an innovative age group classification system based on the Correlation Fractal Dimension of a complex facial image. Wrinkles appear on the face with aging, thereby changing the facial edges of the image. The proposed method is rotation and pose invariant. The present paper concentrates on developing an innovative technique that classifies facial images into four categories, i.e. child image (0-15), young adult image (15-30), middle-aged adult image (31-50), and senior adult image (>50), based on the Correlation FD value of a facial edge image.
Data preprocessing methods of FT-NIR spectral data for the classification cooking oil
Ruah, Mas Ezatul Nadia Mohd; Rasaruddin, Nor Fazila; Fong, Sim Siong; Jaafar, Mohd Zuli
2014-12-01
This recent work describes data pre-processing methods for FT-NIR spectroscopy datasets of cooking oil and its quality parameters using chemometric methods. Pre-processing of near-infrared (NIR) spectral data has become an integral part of chemometrics modelling. Hence, this work is dedicated to investigating the utility and effectiveness of pre-processing algorithms, namely row scaling, column scaling and single scaling with Standard Normal Variate (SNV). The combinations of these scaling methods have an impact on exploratory analysis and on classification via Principal Component Analysis (PCA) plots. The samples were divided into palm oil and non-palm cooking oil. The classification model was built using FT-NIR cooking oil spectra in absorbance mode over the range 4000-14000 cm-1. A Savitzky-Golay derivative was applied before developing the classification model. The data were then separated into a training set and a test set by the Duplex method, with the number in each class kept equal to 2/3 of the class with the minimum number of samples. The t-statistic was employed as the variable selection method to select the variables significant to the classification models. The data pre-processing was evaluated by looking at the modified silhouette width (mSW), the PCA plots and the Percentage Correctly Classified (%CC). The results show that different pre-processing strategies lead to substantially different model performance. The effects of the pre-processing methods, i.e. row scaling, column standardisation and single scaling with Standard Normal Variate, are indicated by mSW and %CC. With a two-PC model, all five classifiers gave high %CC except quadratic discriminant analysis.
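SNV itself is a one-line, row-wise transform; a minimal sketch:

```python
# Standard Normal Variate: each spectrum (row) is centred and scaled by its
# own mean and standard deviation before modelling.
import numpy as np

def snv(spectra):
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

rng = np.random.default_rng(7)
raw = rng.normal(size=(5, 1000)) * rng.uniform(0.5, 2.0, (5, 1))  # mock spectra
print(snv(raw).std(axis=1, ddof=1))  # every row now has unit variance
```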
Instrumental and statistical methods for the comparison of class evidence
Liszewski, Elisa Anne
Trace evidence is a major field within forensic science. Association of trace evidence samples can be problematic due to sample heterogeneity and a lack of quantitative criteria for comparing spectra or chromatograms. The aim of this study is to evaluate different types of instrumentation for their ability to discriminate among samples of various types of trace evidence. Chemometric analysis, including techniques such as Agglomerative Hierarchical Clustering, Principal Components Analysis, and Discriminant Analysis, was employed to evaluate the instrumental data. First, automotive clear coats were analyzed by using microspectrophotometry to collect UV absorption data. In total, 71 samples were analyzed, with a classification accuracy of 91.61%. An external validation was performed, resulting in a prediction accuracy of 81.11%. Next, fiber dyes were analyzed using UV-Visible microspectrophotometry. While several physical characteristics of cotton fiber can be identified and compared, fiber color is considered to be an excellent source of variation, and thus was examined in this study. Twelve dyes were employed, some being visually indistinguishable. Several different analyses and comparisons were done, including an inter-laboratory comparison and external validations. Lastly, common plastic samples and other polymers were analyzed using pyrolysis-gas chromatography/mass spectrometry, and their pyrolysis products were analyzed using multivariate statistics. The classification accuracy varied depending upon the number of classes chosen, but the plastics were grouped based on composition. The polymers were used as an external validation, and misclassifications occurred with chlorinated samples all being placed into the category containing PVC.
Fusion of fuzzy statistical distributions for classification of thyroid ultrasound patterns.
Iakovidis, Dimitris K; Keramidas, Eystratios G; Maroulis, Dimitris
2010-09-01
This paper proposes a novel approach to thyroid ultrasound pattern representation. Considering that texture and echogenicity are correlated with thyroid malignancy, the proposed approach encodes these sonographic features via a noise-resistant representation. This representation is suitable for the discrimination of nodules of high malignancy risk from normal thyroid parenchyma. The material used in this study includes a total of 250 thyroid ultrasound patterns obtained from 75 patients in Greece. The patterns are represented by fused vectors of fuzzy features. Ultrasound texture is represented by fuzzy local binary patterns, whereas echogenicity is represented by fuzzy intensity histograms. The encoded thyroid ultrasound patterns are discriminated by support vector classifiers. The proposed approach was comprehensively evaluated using receiver operating characteristics (ROC). The results show that the proposed fusion scheme outperforms previous thyroid ultrasound pattern representation methods proposed in the literature. The best classification accuracy was obtained with a polynomial kernel support vector machine and reached 97.5%, as estimated by the area under the ROC curve. The fusion of fuzzy local binary patterns and fuzzy grey-level histogram features is more effective than state-of-the-art approaches for the representation of thyroid ultrasound patterns and can be effectively utilized for the detection of nodules of high malignancy risk in the context of an intelligent medical system.
She Son Gun
2014-01-01
Approaches to developing a classification of state methods of economic regulation are considered. On the basis of the review provided, a complex method of state regulation of business activity is substantiated. The principles offered allow improving public administration and can be used in industry concepts and state programs to support small business in fishery.
Optimized Audio Classification and Segmentation Algorithm by Using Ensemble Methods
Directory of Open Access Journals (Sweden)
Saadia Zahid
2015-01-01
Full Text Available Audio segmentation is a basis for multimedia content analysis, which is a most important and widely used application nowadays. An optimized audio classification and segmentation algorithm is presented in this paper that segments a superimposed audio stream, on the basis of its content, into four main audio types: pure speech, music, environment sound, and silence. The proposed algorithm preserves important audio content and reduces the misclassification rate without using a large amount of training data; it handles noise and is suitable for real-time applications. Noise in an audio stream is segmented out as environment sound. A hybrid classification approach is used, combining bagged support vector machines (SVMs) with artificial neural networks (ANNs). The audio stream is classified, firstly, into speech and non-speech segments by using bagged support vector machines; the non-speech segment is further classified into music and environment sound by using artificial neural networks; and lastly, the speech segment is classified into silence and pure-speech segments by a rule-based classifier. Minimal data are used for training the classifiers, ensemble methods are used to minimize the misclassification rate, and approximately 98% accurate segments are obtained. A fast and efficient algorithm is designed that can be used in real-time multimedia applications.
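The first-stage ensemble is straightforward to mimic; a sketch of bagged SVMs with scikit-learn, where synthetic features stand in for the audio features and the ANN and rule-based stages are omitted:

```python
# Bagged SVMs for the first (speech / non-speech) stage; synthetic features
# stand in for the audio features.  Note: scikit-learn >= 1.2 uses the
# `estimator=` argument (older releases call it `base_estimator=`).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
bagged_svm = BaggingClassifier(estimator=SVC(), n_estimators=10,
                               max_samples=0.7, random_state=0)
print(cross_val_score(bagged_svm, X, y, cv=5).mean())
```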
Brief guidelines for methods and statistics in medical research
Ab Rahman, Jamalludin
2015-01-01
This book serves as a practical guide to methods and statistics in medical research. It includes step-by-step instructions on using SPSS software for statistical analysis, as well as relevant examples to help those readers who are new to research in health and medical fields. Simple texts and diagrams are provided to help explain the concepts covered, and print screens for the statistical steps and the SPSS outputs are provided, together with interpretations and examples of how to report on findings. Brief Guidelines for Methods and Statistics in Medical Research offers a valuable quick reference guide for healthcare students and practitioners conducting research in health related fields, written in an accessible style.
Fundamentals of modern statistical methods substantially improving power and accuracy
Wilcox, Rand R
2001-01-01
Conventional statistical methods have a very serious flaw: they routinely miss differences among groups or associations among variables that are detected by more modern techniques, even under very small departures from normality. Hundreds of journal articles have described the reasons standard techniques can be unsatisfactory, but simple, intuitive explanations are generally unavailable. Improved methods have been derived, but they are far from obvious or intuitive based on the training most researchers receive. Situations arise where even highly nonsignificant results become significant when analyzed with more modern methods. Without assuming any prior training in statistics, Part I of this book describes basic statistical principles from a point of view that makes their shortcomings intuitive and easy to understand. The emphasis is on verbal and graphical descriptions of concepts. Part II describes modern methods that address the problems covered in Part I. Using data from actual studies, many examples are include...
Directory of Open Access Journals (Sweden)
Nadja Stumberg
2014-05-01
Full Text Available The vegetation in the forest-tundra ecotone zone is expected to be highly affected by climate change and requires effective monitoring techniques. Airborne laser scanning (ALS) has been proposed as a tool for the detection of small pioneer trees over such vast areas using laser height and intensity data. The main objective of the present study was to assess a possible improvement in the performance of classifying tree and nontree laser echoes from high-density ALS data. The data were collected along a 1000 km long transect stretching from southern to northern Norway. Different geostatistical and statistical measures derived from laser height and intensity values were used to extend and potentially improve simpler models that ignore the spatial context. Generalised linear models (GLM) and support vector machines (SVM) were employed as classification methods. Total accuracies and Cohen's kappa coefficients were calculated and compared to those of the simpler models from a previous study. For both classification methods, all models revealed total accuracies similar to the results of the simpler models. Concerning classification performance, however, the comparison of the kappa coefficients indicated a significant improvement for some models with both GLM and SVM, with classification accuracies >94%.
Quantitative EEG Applying the Statistical Recognition Pattern Method
DEFF Research Database (Denmark)
Engedal, Knut; Snaedal, Jon; Hoegh, Peter
2015-01-01
BACKGROUND/AIM: The aim of this study was to examine the discriminatory power of quantitative EEG (qEEG) applying the statistical pattern recognition (SPR) method to separate Alzheimer's disease (AD) patients from elderly individuals without dementia and from other dementia patients. METHODS...
An Overview of Short-term Statistical Forecasting Methods
DEFF Research Database (Denmark)
Elias, Russell J.; Montgomery, Douglas C.; Kulahci, Murat
2006-01-01
An overview of statistical forecasting methodology is given, focusing on techniques appropriate to short- and medium-term forecasts. Topics include basic definitions and terminology, smoothing methods, ARIMA models, regression methods, dynamic regression models, and transfer functions. Techniques for evaluating and monitoring forecast performance are also summarized.
Hierarchical modelling for the environmental sciences statistical methods and applications
Clark, James S
2006-01-01
New statistical tools are changing the way in which scientists analyze and interpret data and models. Hierarchical Bayes and Markov Chain Monte Carlo methods for analysis provide a consistent framework for inference and prediction where information is heterogeneous and uncertain, processes are complicated, and responses depend on scale. Nowhere are these methods more promising than in the environmental sciences.
Glushak, P. A.; Markiv, B. B.; Tokarchuk, M. V.
2018-01-01
We present a generalization of Zubarev's nonequilibrium statistical operator method based on the principle of maximum Renyi entropy. In the framework of this approach, we obtain transport equations for the basic set of parameters of the reduced description of nonequilibrium processes in a classical system of interacting particles using Liouville equations with fractional derivatives. For a classical system of particles in a medium with a fractal structure, we obtain a non-Markovian diffusion equation with fractional spatial derivatives. For a concrete model of the frequency dependence of a memory function, we obtain a generalized Cattaneo-type diffusion equation with the spatial and temporal fractality taken into account. We present a generalization of nonequilibrium thermofield dynamics in Zubarev's nonequilibrium statistical operator method in the framework of Renyi statistics.
Descriptive and inferential statistical methods used in burns research.
Al-Benna, Sammy; Al-Ajam, Yazan; Way, Benjamin; Steinstraesser, Lars
2010-05-01
Burns research articles utilise a variety of descriptive and inferential methods to present and analyse data. The aim of this study was to determine the descriptive methods (e.g. mean, median, SD, range, etc.) and survey the use of inferential methods (statistical tests) used in articles in the journal Burns. This study defined its population as all original articles published in the journal Burns in 2007. Letters to the editor, brief reports, reviews, and case reports were excluded. Study characteristics, use of descriptive statistics and the number and types of statistical methods employed were evaluated. Of the 51 articles analysed, 11 (22%) were randomised controlled trials, 18 (35%) were cohort studies, 11 (22%) were case control studies and 11 (22%) were case series. The study design and objectives were defined in all articles. All articles made use of continuous and descriptive data. Inferential statistics were used in 49 (96%) articles. Data dispersion was calculated by standard deviation in 30 (59%). Standard error of the mean was quoted in 19 (37%). The statistical software product was named in 33 (65%). Of the 49 articles that used inferential statistics, the tests were named in 47 (96%). The six most common tests used (Student's t-test (53%), analysis of variance/covariance (33%), χ2 test (27%), Wilcoxon and Mann-Whitney tests (22%), Fisher's exact test (12%)) accounted for the majority (72%) of statistical methods employed. A specified significance level was named in 43 (88%) and the exact significance levels were reported in 28 (57%). Descriptive analysis and basic statistical techniques account for most of the statistical tests reported. This information should prove useful in deciding which tests should be emphasised in educating burn care professionals. These results highlight the need for burn care professionals to have a sound understanding of basic statistics, which is crucial in interpreting and reporting data. Advice should be sought from professionals…
Schildkrout, Barbara
2016-10-01
A new nosology for mental disorders is needed as a basis for effective scientific inquiry. Diagnostic and Statistical Manual of Mental Disorders and International Classification of Diseases diagnoses are not natural, biological categories, and these diagnostic systems do not address mental phenomena that exist on a spectrum. Advances in neuroscience offer the hope of breakthroughs for diagnosing and treating major mental illness in the future. At present, a neuroscience-based understanding of brain/behavior relationships can reshape clinical thinking. Neuroscience literacy allows psychiatrists to formulate biologically informed psychological theories, to follow neuroscientific literature pertinent to psychiatry, and to embark on a path toward neurologically informed clinical thinking that can help move the field away from Diagnostic and Statistical Manual of Mental Disorders and International Classification of Diseases conceptualizations. Psychiatrists are urged to work toward attaining neuroscience literacy to prepare for and contribute to the development of a new nosology.
Statistical-mechanical entropy by the thin-layer method
International Nuclear Information System (INIS)
Feng, He; Kim, Sung Won
2003-01-01
't Hooft first studied the statistical-mechanical entropy of a scalar field in a Schwarzschild black hole background by the brick-wall method and hinted that the statistical-mechanical entropy is the statistical origin of the Bekenstein-Hawking entropy of the black hole. However, in our view, the statistical-mechanical entropy is only a quantum correction to the Bekenstein-Hawking entropy of the black hole. The brick-wall method, based on thermal equilibrium at a large scale, cannot be applied to cases out of equilibrium, such as a nonstationary black hole. The statistical-mechanical entropy of a scalar field in a nonstationary black hole background is calculated by the thin-layer method. The condition of local equilibrium near the horizon of the black hole is used as a working postulate and is maintained for a black hole that evaporates slowly enough and whose mass is far greater than the Planck mass. The statistical-mechanical entropy is also proportional to the area of the black hole horizon. The difference from the stationary black hole is that the result relies on a time-dependent cutoff.
The Methods of Stress Management and Their Classification
Directory of Open Access Journals (Sweden)
Honchar Mykhailo F.
2017-12-01
Full Text Available The article considers the content and classification of stress management methods, systematizing their varieties by existing attributes (character, time interval of application, direction of impact, period of action, way of accounting for the interests of employees, level of formation, method of substantiation, content) and by newly allocated attributes (scale of changes in terms of stress management systems, level of novelty at the enterprise, consistency). This allows choosing the appropriate types of such methods for overcoming undesirable deviations that have a significant negative impact on the functioning of economic entities. It has been determined that such methods are formed in implementing stress-management technology; are the result of the management activities of the steering subsystem of the organization at each level of management; have an alternative nature; and form an information-management base for making managerial decisions within systems of stress administration. It has been specified that, with the help of certain methods within stress management systems, managers can track existing and potential problems in the complex and dynamic environment of the organization, identify their relationships, detect «weak signals», adjust the goals and tasks of managing critical undesirable deviations, determine indicators and criteria of stress management, etc.
Academic Training Lecture: Statistical Methods for Particle Physics
PH Department
2012-01-01
2, 3, 4 and 5 April 2012 Academic Training Lecture Regular Programme from 11:00 to 12:00 - Bldg. 222-R-001 - Filtration Plant Statistical Methods for Particle Physics by Glen Cowan (Royal Holloway) The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.
Methods library of embedded R functions at Statistics Norway
Directory of Open Access Journals (Sweden)
Øyvind Langsrud
2017-11-01
Full Text Available Statistics Norway is modernising its production processes. An important element in this work is a library of functions for statistical computations. In principle, the functions in such a methods library can be programmed in several languages. A modernised production environment demands that these functions can be reused for different statistics products and that they are embedded within a common IT system. The embedding should be done in such a way that the users of the methods do not need to know the underlying programming language. As a proof of concept, Statistics Norway has established a methods library offering a limited number of methods for macro-editing, imputation and confidentiality. This has been done within an area of municipal statistics with R as the only programming language. This paper presents the details of and experiences from this work. The problem of fitting real-world applications to simple and strict standards is discussed and exemplified by the development of solutions for regression imputation and table suppression.
Application of blended learning in teaching statistical methods
Directory of Open Access Journals (Sweden)
Barbara Dębska
2012-12-01
Full Text Available The paper presents the application of a hybrid method (blended learning, linking traditional education with on-line education) to teach selected problems of mathematical statistics, including the application of mathematical statistics to the evaluation of laboratory experimental results. An on-line statistics course was developed to form an integral part of the module 'Methods of statistical evaluation of experimental results'. The course complies with the principles outlined in the Polish National Framework of Qualifications with respect to the scope of knowledge, skills and competencies that students should have acquired at course completion. The paper presents the structure of the course and the educational content provided through multimedia lessons made accessible on the Moodle platform. Test results of students following courses taught with the traditional method and courses taught with the hybrid method were compared and discussed to evaluate the effectiveness of the hybrid method relative to the traditional method of teaching.
Statistical Methods for Particle Physics (4/4)
CERN. Geneva
2012-01-01
The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.
Statistical Methods for Particle Physics (1/4)
CERN. Geneva
2012-01-01
The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.
Statistical Methods for Particle Physics (2/4)
CERN. Geneva
2012-01-01
The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.
Statistical Methods for Particle Physics (3/4)
CERN. Geneva
2012-01-01
The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.
Image segmentation and particles classification using texture analysis method
Directory of Open Access Journals (Sweden)
Mayar Aly Atteya
Full Text Available Introduction: Ingredients of oily fish include a large amount of polyunsaturated fatty acids, which are important elements in various metabolic processes of humans and have also been used to prevent diseases. However, in an attempt to reduce cost, recent developments are starting to replace the ingredients of fish oil with products of microalgae, which also produce polyunsaturated fatty acids. To do so, it is important to closely monitor morphological changes in algae cells and to monitor their age in order to achieve the best results. This paper aims to describe an advanced vision-based system to automatically detect, classify, and track the organic cells using a recently developed SOPAT-System (Smart On-line Particle Analysis Technology), a photo-optical image acquisition device combined with innovative image analysis software. Methods: The proposed method includes image de-noising, binarization and enhancement, as well as object recognition, localization and classification based on the analysis of particles’ size and texture. Results: The methods allowed for correctly computing cell size for each particle separately. By computing an area histogram for the input images (1 h, 18 h, and 42 h), the variation could be observed, showing a clear increase in cell size. Conclusion: The proposed method allows algae particles to be correctly identified with accuracies up to 99% and classified correctly with accuracies up to 100%.
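The de-noising / binarization / size-measurement pipeline summarized in the abstract can be sketched with generic tools, as below; the actual SOPAT software is not described in detail here, so the filter choices and the threshold are illustrative assumptions only.

```python
# A minimal sketch of a de-noise / binarize / measure pipeline of the kind
# described above; all parameter choices are illustrative, not SOPAT's.
import numpy as np
from scipy import ndimage

def measure_particles(image):
    """Return the pixel area of each detected particle in a grayscale image."""
    denoised = ndimage.median_filter(image, size=3)       # de-noising
    threshold = denoised.mean() + denoised.std()          # crude global threshold
    binary = denoised > threshold                         # binarization
    labels, n = ndimage.label(binary)                     # object localization
    areas = ndimage.sum(np.ones_like(labels), labels, index=range(1, n + 1))
    return areas   # an area histogram of these values tracks cell growth over time

rng = np.random.default_rng(0)
fake_image = rng.normal(10, 2, (128, 128))
fake_image[40:60, 40:60] += 30                            # one synthetic "cell"
print(measure_particles(fake_image))
```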
Microvariability in AGNs: study of different statistical methods - I. Observational analysis
Zibecchi, L.; Andruchow, I.; Cellone, S. A.; Carpintero, D. D.; Romero, G. E.; Combi, J. A.
2017-05-01
We present the results of a study of different statistical methods currently used in the literature to analyse the (micro)variability of active galactic nuclei (AGNs) from ground-based optical observations. In particular, we focus on the comparison between the results obtained by applying the so-called C and F statistics, which are based on the ratio of standard deviations and variances, respectively. The motivation for this is that the implementation of these methods leads to different and contradictory results, making the variability classification of the light curves of a certain source dependent on the statistics implemented. For this purpose, we re-analyse the results on an AGN sample observed over several sessions with the 2.15 m 'Jorge Sahade' telescope (CASLEO), San Juan, Argentina. For each AGN, we constructed the nightly differential light curves. We thus obtained a total of 78 light curves for 39 AGNs, and we then applied the statistical tests mentioned above, in order to re-classify the variability state of these light curves and in an attempt to find a suitable statistical methodology to study photometric (micro)variations. We conclude that, although the C criterion is not, properly speaking, a statistical test, it could still be a suitable parameter to detect variability, and that its application allows us to obtain more reliable variability results, in contrast with the F test.
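For reference, a minimal sketch of the C and F criteria as characterized above (ratios of standard deviations and of variances of the target and comparison differential light curves). The 2.576 cutoff and the critical F value correspond to conventional 99% confidence choices, not values taken from the paper.

```python
# Sketch of the C and F variability criteria: C compares standard deviations,
# F compares variances, of target vs comparison differential light curves.
import numpy as np
from scipy import stats

def c_and_f(target_lc, comparison_lc, alpha=0.01):
    c = np.std(target_lc, ddof=1) / np.std(comparison_lc, ddof=1)
    f = np.var(target_lc, ddof=1) / np.var(comparison_lc, ddof=1)
    f_crit = stats.f.isf(alpha, len(target_lc) - 1, len(comparison_lc) - 1)
    # variable if C >= 2.576 (99% confidence) or F exceeds the critical value
    return c, c >= 2.576, f, f >= f_crit

rng = np.random.default_rng(1)
target = rng.normal(0, 0.03, 50)       # synthetic differential light curves
comparison = rng.normal(0, 0.01, 50)
print(c_and_f(target, comparison))
```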
Application of FT-IR Classification Method in Silica-Plant Extracts Composites Quality Testing
Bicu, A.; Drumea, V.; Mihaiescu, D. E.; Purcareanu, B.; Florea, M. A.; Trică, B.; Vasilievici, G.; Draga, S.; Buse, E.; Olariu, L.
2018-06-01
Our present work is concerned with the validation and quality testing efforts of mesoporous silica - plant extract composites, in order to sustain the standardization process of plant-based pharmaceutical products. The synthesis of the silica support was performed using a TEOS-based synthetic route and CTAB as a template, at room temperature and normal pressure. The silica support was analyzed by advanced characterization methods (SEM, TEM, BET, DLS and FT-IR), and loaded with Calendula officinalis and Salvia officinalis standardized extracts. Further desorption studies were performed in order to prove the sustained-release properties of the final materials. Intermediate and final product identification was performed by an FT-IR classification method, using the MID range of the IR spectra and statistically representative samples from repetitive synthetic stages. The obtained results recommend this analytical method as a fast and cost-effective alternative to the classic identification methods.
Skinner, Carl G; Patel, Manish M; Thomas, Jerry D; Miller, Michael A
2011-01-01
Statistical methods are pervasive in medical research and general medical literature. Understanding general statistical concepts will enhance our ability to critically appraise the current literature and ultimately improve the delivery of patient care. This article intends to provide an overview of the common statistical methods relevant to medicine.
Waste classification and methods applied to specific disposal sites
International Nuclear Information System (INIS)
Rogers, V.C.
1979-01-01
An adequate definition of the classes of radioactive wastes is necessary for regulating the disposal of radioactive wastes. A classification system is proposed in which wastes are classified according to characteristics relating to their disposal. Several specific sites are analyzed with the methodology in order to gain insights into the classification of radioactive wastes. Also presented is an analysis of ocean dumping as it applies to waste classification. 5 refs
GA Based Optimal Feature Extraction Method for Functional Data Classification
Jun Wan; Zehua Chen; Yingwu Chen; Zhidong Bai
2010-01-01
Classification is an interesting problem in functional data analysis (FDA), because many science and application problems end up as classification problems, such as recognition, prediction, control, decision making, management, etc. Because functional data (FD) are high-dimensional and highly correlated, extracting features from FD while keeping their global characteristics is a key problem, one that strongly determines classification efficiency and precision. In this paper...
Advances in Statistical Methods for Substance Abuse Prevention Research
MacKinnon, David P.; Lockwood, Chondra M.
2010-01-01
The paper describes advances in statistical methods for prevention research with a particular focus on substance abuse prevention. Standard analysis methods are extended to the typical research designs and characteristics of the data collected in prevention research. Prevention research often includes longitudinal measurement, clustering of data in units such as schools or clinics, missing data, and categorical as well as continuous outcome variables. Statistical methods to handle these features of prevention data are outlined. Developments in mediation, moderation, and implementation analysis allow for the extraction of more detailed information from a prevention study. Advancements in the interpretation of prevention research results include more widespread calculation of effect size and statistical power, the use of confidence intervals as well as hypothesis testing, detailed causal analysis of research findings, and meta-analysis. The increased availability of statistical software has contributed greatly to the use of new methods in prevention research. It is likely that the Internet will continue to stimulate the development and application of new methods. PMID:12940467
Statistical methods of parameter estimation for deterministically chaotic time series
Pisarenko, V. F.; Sornette, D.
2004-03-01
We discuss the possibility of applying some standard statistical methods (the least-squares method, the maximum likelihood method, and the method of statistical moments for estimation of parameters) to a deterministically chaotic low-dimensional dynamic system (the logistic map) containing observational noise. A “segmentation fitting” maximum likelihood (ML) method is suggested to estimate the structural parameter of the logistic map along with the initial value x1, considered as an additional unknown parameter. The segmentation fitting method, called “piece-wise” ML, is similar in spirit to, but simpler than and with smaller bias than, the “multiple shooting” method previously proposed. Comparisons with different previously proposed techniques on simulated numerical examples give favorable results (at least for the investigated combinations of sample size N and noise level). Besides, unlike some suggested techniques, our method does not require a priori knowledge of the noise variance. We also clarify the nature of the inherent difficulties in the statistical analysis of deterministically chaotic time series and the status of previously proposed Bayesian approaches. We note the trade-off between the need to use a large number of data points in the ML analysis to decrease the bias (to guarantee consistency of the estimation) and the unstable nature of dynamical trajectories, with exponentially fast loss of memory of the initial condition. The method of statistical moments for the estimation of the parameter of the logistic map is discussed. This method seems to be the unique method whose consistency for deterministically chaotic time series has been proved so far theoretically (not only numerically).
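A minimal sketch of the simplest estimator discussed above, one-step least squares for the logistic-map parameter on a noisy trajectory; it is not the authors' piece-wise ML scheme, and all numbers are illustrative.

```python
# Recover the logistic-map parameter r from a noisy trajectory by ordinary
# least squares on one-step-ahead predictions (the simple LS method the paper
# discusses, not the authors' "piece-wise" ML scheme).
import numpy as np
from scipy.optimize import minimize_scalar

def logistic_trajectory(r, x1, n):
    x = np.empty(n)
    x[0] = x1
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1.0 - x[i - 1])
    return x

rng = np.random.default_rng(2)
true_r, x1, n = 3.8, 0.3, 200
observed = logistic_trajectory(true_r, x1, n) + rng.normal(0, 0.01, n)  # observational noise

def sse(r):
    # predict x[i+1] from the observed x[i]; sum of squared one-step errors
    pred = r * observed[:-1] * (1.0 - observed[:-1])
    return np.sum((observed[1:] - pred) ** 2)

fit = minimize_scalar(sse, bounds=(3.5, 4.0), method="bounded")
print(f"estimated r = {fit.x:.4f} (true value {true_r})")
```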
Statistical methods with applications to demography and life insurance
Khmaladze, Estáte V
2013-01-01
Suitable for statisticians, mathematicians, actuaries, and students interested in the problems of insurance and analysis of lifetimes, Statistical Methods with Applications to Demography and Life Insurance presents contemporary statistical techniques for analyzing life distributions and life insurance problems. It not only contains traditional material but also incorporates new problems and techniques not discussed in existing actuarial literature. The book mainly focuses on the analysis of an individual life and describes statistical methods based on empirical and related processes. Coverage ranges from analyzing the tails of distributions of lifetimes to modeling population dynamics with migrations. To help readers understand the technical points, the text covers topics such as the Stieltjes, Wiener, and Itô integrals. It also introduces other themes of interest in demography, including mixtures of distributions, analysis of longevity and extreme value theory, and the age structure of a population. In addi...
Landslide Susceptibility Statistical Methods: A Critical and Systematic Literature Review
Mihir, Monika; Malamud, Bruce; Rossi, Mauro; Reichenbach, Paola; Ardizzone, Francesca
2014-05-01
Landslide susceptibility assessment, the subject of this systematic review, is aimed at understanding the spatial probability of slope failures under a set of geomorphological and environmental conditions. It is estimated that about 375 landslides that occur globally each year are fatal, with around 4600 people killed per year. Past studies have brought out the increasing cost of landslide damage, which primarily can be attributed to human occupation and increased human activities in vulnerable environments. Many scientists, to evaluate and reduce landslide risk, have made an effort to efficiently map landslide susceptibility using different statistical methods. In this paper, we do a critical and systematic landslide susceptibility literature review, in terms of the different statistical methods used. For each of a broad set of studies reviewed we note: (i) study geography region and areal extent, (ii) landslide types, (iii) inventory type and temporal period covered, (iv) mapping technique, (v) thematic variables used, (vi) statistical models, (vii) assessment of model skill, (viii) uncertainty assessment methods, (ix) validation methods. We then pulled out broad trends within our review of landslide susceptibility, particularly regarding the statistical methods. We found that the most common statistical methods used in the study of landslide susceptibility include logistic regression, artificial neural networks, discriminant analysis and weight of evidence. Although most of the studies we reviewed assessed the model skill, very few assessed model uncertainty. In terms of geographic extent, the largest number of landslide susceptibility zonations were in Turkey, Korea, Spain, Italy and Malaysia. However, there are also many landslides and fatalities in other localities, particularly India, China, the Philippines, Nepal, Indonesia, Guatemala, and Pakistan, where there are much fewer landslide susceptibility studies available in the peer-reviewed literature. This
The application of statistical methods to assess economic assets
Directory of Open Access Journals (Sweden)
D. V. Dianov
2017-01-01
Full Text Available The article is devoted to the consideration and evaluation of machinery, equipment and special equipment, methodological aspects of the use of standards for the assessment of buildings and structures in current prices, the valuation of residential and specialized houses and office premises, the assessment and reassessment of existing and inactive military assets, and the application of statistical methods to obtain the relevant cost estimates. The objective of the scientific article is to consider the possible application of statistical tools in the valuation of the assets composing the core group of elements of national wealth – the fixed assets. Firstly, capital tangible assets constitute the basis of the material base of new value creation, products and non-financial services. The accumulated gain of tangible assets of a capital nature is a part of the gross domestic product, and from its volume and specific weight in the composition of GDP we can judge the scope of reproductive processes in the country. Based on the methodological materials of the state statistics bodies of the Russian Federation and on the theory of statistics, which describes methods of statistical analysis such as indices, average values and regression, a methodical approach is structured for the application of statistical tools to obtain value estimates of property, plant and equipment with significant accumulated depreciation. Until now, the use of statistical methodology in the practice of economic assessment of assets has been only fragmentary. This applies both to Federal Legislation (Federal law № 135 «On valuation activities in the Russian Federation» dated 16.07.1998, in the edition of 05.07.2016) and to the methodological documents and regulations of valuation activities, in particular the valuation activities’ standards. A particular problem is the use of the digital database of Rosstat (Federal State Statistics Service), as to the specific fixed assets the comparison should be carried
Experiential Approach to Teaching Statistics and Research Methods ...
African Journals Online (AJOL)
Statistics and research methods are among the more demanding topics for students of education to master at both the undergraduate and postgraduate levels. It is our conviction that teaching these topics should be combined with real practical experiences. We discuss an experiential teaching/learning approach that ...
Application of statistical methods at copper wire manufacturing
Directory of Open Access Journals (Sweden)
Z. Hajduová
2009-01-01
Full Text Available Six Sigma is a method of management that strives for near perfection. The Six Sigma methodology uses data and rigorous statistical analysis to identify defects in a process or product, reduce variability and achieve as close to zero defects as possible. The paper presents the basic information on this methodology.
Statistical and Machine Learning forecasting methods: Concerns and ways forward.
Makridakis, Spyros; Spiliotis, Evangelos; Assimakopoulos, Vassilios
2018-01-01
Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.
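A toy sketch of the post-sample accuracy comparison described above, using sMAPE (one of the accuracy measures employed with the M3 data) to score two simple statistical benchmarks; the series and the 18-step horizon are illustrative stand-ins for the monthly M3 setup.

```python
# Compare two simple statistical forecasting benchmarks by post-sample sMAPE
# on a toy series (illustrative only; the study uses 1045 M3 monthly series).
import numpy as np

def smape(actual, forecast):
    return 100 * np.mean(2 * np.abs(forecast - actual) /
                         (np.abs(actual) + np.abs(forecast)))

def naive_forecast(train, h):
    return np.full(h, train[-1])           # repeat the last observation

def ses_forecast(train, h, alpha=0.3):     # simple exponential smoothing
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return np.full(h, level)

rng = np.random.default_rng(3)
series = 100 + np.cumsum(rng.normal(0, 2, 60))   # toy monthly series
train, test = series[:-18], series[-18:]         # 18-step horizon, as in M3 monthly

for name, f in [("naive", naive_forecast), ("SES", ses_forecast)]:
    print(name, round(smape(test, f(train, len(test))), 2))
```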
Computerized statistical analysis with bootstrap method in nuclear medicine
International Nuclear Information System (INIS)
Zoccarato, O.; Sardina, M.; Zatta, G.; De Agostini, A.; Barbesti, S.; Mana, O.; Tarolo, G.L.
1988-01-01
Statistical analysis of data samples involves some hypotheses about the features of the data themselves. The accuracy of these hypotheses can influence the results of statistical inference. Among the new methods of computer-aided statistical analysis, the bootstrap method appears to be one of the most powerful, thanks to its ability to reproduce many artificial samples starting from a single original sample, and because it works without hypotheses about the data distribution. The authors applied the bootstrap method to two typical situations of a Nuclear Medicine Department. The determination of the normal range of serum ferritin, as assessed by radioimmunoassay and defined by the mean value ±2 standard deviations, starting from an experimental sample of small size, shows an unacceptable lower limit (ferritin plasma levels below zero). On the contrary, the results obtained by elaborating 5000 bootstrap samples give an interval of values (10.95 ng/ml - 72.87 ng/ml) corresponding to the normal ranges commonly reported. Moreover, the authors applied the bootstrap method in evaluating the possible error associated with the correlation coefficient determined between left ventricular ejection fraction (LVEF) values obtained by first-pass radionuclide angiocardiography with 99m Tc and 195m Au. The results obtained indicate a high degree of statistical correlation and give the range of r² values to be considered acceptable for this type of study.
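One plausible reading of the ferritin application, sketched below: resample the small sample 5000 times and estimate the mean ± 2 SD range limits from the bootstrap replicates. The sample values are synthetic, and the exact bootstrap estimator used by the authors is not specified in the abstract.

```python
# Bootstrap estimate of a "normal range" (mean ± 2 SD) from a small sample,
# as one plausible version of the procedure described above. The synthetic
# lognormal sample stands in for the real serum ferritin data (ng/ml).
import numpy as np

rng = np.random.default_rng(4)
sample = rng.lognormal(mean=3.5, sigma=0.6, size=20)   # synthetic ferritin values

lowers, uppers = [], []
for _ in range(5000):                                  # 5000 resamples, as in the paper
    resample = rng.choice(sample, size=sample.size, replace=True)
    m, s = resample.mean(), resample.std(ddof=1)
    lowers.append(m - 2 * s)
    uppers.append(m + 2 * s)

# bootstrap point estimates of the two range limits
print(f"normal range: {np.mean(lowers):.2f} - {np.mean(uppers):.2f} ng/ml")
```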
Statistical and Machine Learning forecasting methods: Concerns and ways forward
Makridakis, Spyros; Assimakopoulos, Vassilios
2018-01-01
Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions. PMID:29584784
Illinois' Forests, 2005: Statistics, Methods, and Quality Assurance
Susan J. Crocker; Charles J. Barnett; Mark A. Hatfield
2013-01-01
The first full annual inventory of Illinois' forests was completed in 2005. This report contains 1) descriptive information on methods, statistics, and quality assurance of data collection, 2) a glossary of terms, 3) tables that summarize quality assurance, and 4) a core set of tabular estimates for a variety of forest resources. A detailed analysis of inventory...
Kansas's forests, 2005: statistics, methods, and quality assurance
Patrick D. Miles; W. Keith Moser; Charles J. Barnett
2011-01-01
The first full annual inventory of Kansas's forests was completed in 2005 after 8,868 plots were selected and 468 forested plots were visited and measured. This report includes detailed information on forest inventory methods and data quality estimates. Important resource statistics are included in the tables. A detailed analysis of Kansas inventory is presented...
South Dakota's forests, 2005: statistics, methods, and quality assurance
Patrick D. Miles; Ronald J. Piva; Charles J. Barnett
2011-01-01
The first full annual inventory of South Dakota's forests was completed in 2005 after 8,302 plots were selected and 325 forested plots were visited and measured. This report includes detailed information on forest inventory methods and data quality estimates. Important resource statistics are included in the tables. A detailed analysis of the South Dakota...
Nebraska's forests, 2005: statistics, methods, and quality assurance
Patrick D. Miles; Dacia M. Meneguzzo; Charles J. Barnett
2011-01-01
The first full annual inventory of Nebraska's forests was completed in 2005 after 8,335 plots were selected and 274 forested plots were visited and measured. This report includes detailed information on forest inventory methods, and data quality estimates. Tables of various important resource statistics are presented. Detailed analysis of the inventory data are...
North Dakota's forests, 2005: statistics, methods, and quality assurance
Patrick D. Miles; David E. Haugen; Charles J. Barnett
2011-01-01
The first full annual inventory of North Dakota's forests was completed in 2005 after 7,622 plots were selected and 164 forested plots were visited and measured. This report includes detailed information on forest inventory methods and data quality estimates. Important resource statistics are included in the tables. A detailed analysis of the North Dakota...
Peer-Assisted Learning in Research Methods and Statistics
Stone, Anna; Meade, Claire; Watling, Rosamond
2012-01-01
Feedback from students on a Level 1 Research Methods and Statistics module, studied as a core part of a BSc Psychology programme, highlighted demand for additional tutorials to help them to understand basic concepts. Students in their final year of study commonly request work experience to enhance their employability. All students on the Level 1…
A statistical method for 2D facial landmarking
Dibeklioğlu, H.; Salah, A.A.; Gevers, T.
2012-01-01
Many facial-analysis approaches rely on robust and accurate automatic facial landmarking to correctly function. In this paper, we describe a statistical method for automatic facial-landmark localization. Our landmarking relies on a parsimonious mixture model of Gabor wavelet features, computed in
Investigating salt frost scaling by using statistical methods
DEFF Research Database (Denmark)
Hasholt, Marianne Tange; Clemmensen, Line Katrine Harder
2010-01-01
A large data set comprising data for 118 concrete mixes on mix design, air void structure, and the outcome of freeze/thaw testing according to SS 13 72 44 has been analysed by use of statistical methods. The results show that with regard to mix composition, the most important parameter...
Characteristics and application study of AP1000 NPPs equipment reliability classification method
International Nuclear Information System (INIS)
Guan Gao
2013-01-01
AP1000 nuclear power plants apply an integrated approach to establish equipment reliability classification, which includes probabilistic risk assessment techniques, the maintenance rule administrative process, power production reliability classification and a functional equipment group bounding method, and eventually classifies equipment reliability into four levels. This classification process and its result are very different from classical RCM and streamlined RCM. The paper studies the characteristics of the AP1000 equipment reliability classification approach, considers that equipment reliability classification should effectively support maintenance strategy development and work process control, and recommends using a combined RCM method to establish the future equipment reliability programs of AP1000 nuclear power plants. (authors)
A Bayesian statistical method for particle identification in shower counters
International Nuclear Information System (INIS)
Takashimizu, N.; Kimura, A.; Shibata, A.; Sasaki, T.
2004-01-01
We report an attempt at identifying particles using a Bayesian statistical method. We have developed the mathematical model and software for this purpose. We tried to identify electrons and charged pions in shower counters using this method. We designed an ideal shower counter and studied the efficiency of identification using Monte Carlo simulation based on Geant4. Without using any other information, e.g. the charges of particles given by tracking detectors, we achieved 95% identification of both particle types
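A toy illustration of the Bayesian identification idea: assuming (purely for illustration, not from the paper's Geant4 model) Gaussian detector responses for the two particle types, the posterior electron probability follows directly from Bayes' rule.

```python
# Toy Bayesian electron/pion separation: the deposited-energy fraction is
# modeled as Gaussian per particle type (assumed response parameters, not
# the paper's simulated detector).
from scipy import stats

ELECTRON = (0.95, 0.05)   # assumed (mean, sigma) of energy deposit fraction
PION = (0.60, 0.15)
PRIOR_E, PRIOR_PI = 0.5, 0.5   # flat priors: no external tracking information

def posterior_electron(x):
    le = stats.norm.pdf(x, *ELECTRON) * PRIOR_E
    lpi = stats.norm.pdf(x, *PION) * PRIOR_PI
    return le / (le + lpi)    # Bayes' rule with two hypotheses

for x in (0.55, 0.80, 0.93):
    print(f"deposit fraction {x:.2f}: P(electron) = {posterior_electron(x):.3f}")
```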
Quantum statistical Monte Carlo methods and applications to spin systems
International Nuclear Information System (INIS)
Suzuki, M.
1986-01-01
A short review is given concerning the quantum statistical Monte Carlo method based on the equivalence theorem that d-dimensional quantum systems are mapped onto (d+1)-dimensional classical systems. The convergence property of this approximate transformation is discussed in detail. Some applications of this general approach to quantum spin systems are reviewed. A new Monte Carlo method, ''thermo field Monte Carlo method,'' is presented, which is an extension of the projection Monte Carlo method at zero temperature to finite temperatures.
Askerov, Bahram M
2010-01-01
This book deals with theoretical thermodynamics and the statistical physics of electron and particle gases. While treating the laws of thermodynamics from both classical and quantum theoretical viewpoints, it posits that the basis of the statistical theory of macroscopic properties of a system is the microcanonical distribution of isolated systems, from which all canonical distributions stem. To calculate the free energy, the Gibbs method is applied to ideal and non-ideal gases, and also to a crystalline solid. Considerable attention is paid to the Fermi-Dirac and Bose-Einstein quantum statistics and its application to different quantum gases, and electron gas in both metals and semiconductors is considered in a nonequilibrium state. A separate chapter treats the statistical theory of thermodynamic properties of an electron gas in a quantizing magnetic field.
CIN classification and prediction using machine learning methods
Chirkina, Anastasia; Medvedeva, Marina; Komotskiy, Evgeny
2017-06-01
The aim of this paper is to compare existing classification algorithms with different parameters, and to select those that allow solving the problem of primary diagnosis of cervical intraepithelial neoplasia (CIN), which characterizes the condition of the body in the precancerous stage. The paper describes the feature selection process, as well as the selection of the best models for multiclass classification.
A simple method to combine multiple molecular biomarkers for dichotomous diagnostic classification
Directory of Open Access Journals (Sweden)
Amin Manik A
2006-10-01
Full Text Available Abstract Background: In spite of the recognized diagnostic potential of biomarkers, the quest for squelching noise and wringing in information from a given set of biomarkers continues. Here, we suggest a statistical algorithm that – assuming each molecular biomarker to be a diagnostic test – enriches the diagnostic performance of an optimized set of independent biomarkers employing established statistical techniques. We validated the proposed algorithm using several simulation datasets in addition to four publicly available real datasets that compared (i) subjects having cancer with those without; (ii) subjects with two different cancers; (iii) subjects with two different types of one cancer; and (iv) subjects with the same cancer resulting in differential time to metastasis. Results: Our algorithm comprises three steps: estimating the area under the receiver operating characteristic curve for each biomarker, identifying a subset of biomarkers using linear regression, and combining the chosen biomarkers using linear discriminant function analysis. Combining these established statistical methods that are available in most statistical packages, we observed that the diagnostic accuracy of our approach was 100%, 99.94%, 96.67% and 93.92% for the real datasets used in the study. These estimates were comparable to or better than the ones previously reported using alternative methods. In a synthetic dataset, we also observed that all the biomarkers chosen by our algorithm were indeed truly differentially expressed. Conclusion: The proposed algorithm can be used for accurate diagnosis in the setting of dichotomous classification of disease states.
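A compact sketch of the three-step algorithm summarized above, with one substitution: the subset step here is a simple per-biomarker AUC cutoff rather than the paper's regression-based selection. All data are synthetic.

```python
# Three-step biomarker combination: (1) AUC per biomarker, (2) keep a subset,
# (3) combine with linear discriminant analysis. Step 2 uses an AUC cutoff
# in place of the paper's linear-regression selection.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n, p = 120, 30
y = rng.integers(0, 2, n)
X = rng.normal(0, 1, (n, p))
X[:, :4] += y[:, None] * 0.9      # four truly differentially expressed biomarkers

# step 1: AUC of each biomarker treated as a stand-alone diagnostic test
aucs = np.array([roc_auc_score(y, X[:, j]) for j in range(p)])

# step 2: keep biomarkers whose individual AUC exceeds a chosen cutoff
chosen = np.where(np.maximum(aucs, 1 - aucs) > 0.65)[0]

# step 3: combine the chosen biomarkers with LDA and cross-validate
acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, chosen], y, cv=5).mean()
print(f"chosen biomarkers: {chosen}, CV accuracy: {acc:.3f}")
```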
Application of statistical method for FBR plant transient computation
International Nuclear Information System (INIS)
Kikuchi, Norihiro; Mochizuki, Hiroyasu
2014-01-01
Highlights: • A statistical method with a large trial number, up to 10,000, is applied to the plant system analysis. • A turbine trip test conducted at the “Monju” reactor is selected as a plant transient. • A method for reducing the number of trials is discussed. • The result with a reduced trial number can express the base regions of the computed distribution. -- Abstract: It is obvious that design tolerances, errors included in operation, and statistical errors in empirical correlations affect the transient behavior. The purpose of the present study is to apply the above-mentioned statistical errors to a plant system computation in order to evaluate the statistical distribution contained in the transient evolution. The selected computation case is the turbine trip test conducted at 40% electric power of the prototype fast reactor “Monju”. All of the heat transport systems of “Monju” are modeled with the NETFLOW++ system code, which has been validated using the plant transient tests of the experimental fast reactor Joyo and of “Monju”. The effects of parameters on the upper plenum temperature are confirmed by sensitivity analyses, and dominant parameters are chosen. The statistical errors are applied to each computation deck by using pseudorandom numbers and the Monte-Carlo method. The dSFMT (double-precision SIMD-oriented Fast Mersenne Twister), a further developed version of the Mersenne Twister (MT), is adopted as the pseudorandom number generator. In the present study, uniform random numbers are generated by dSFMT, and these random numbers are transformed to the normal distribution by the Box–Muller method. Ten thousand different computations are performed at once. In every computation case, the steady-state calculation is performed for 12,000 s, and the transient calculation is performed for 4000 s. For the purpose of the present statistical computation, it is important that the base regions of the distribution functions be calculated precisely. A large number of
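The Box–Muller step described above is easy to reproduce; the sketch below generates normal deviates from uniform ones and perturbs a nominal parameter for many computation decks. NumPy's default generator stands in for dSFMT, and the parameter values are illustrative.

```python
# Box-Muller transform: turn uniform pseudorandom numbers into normal
# deviates, then perturb a nominal input parameter for 10,000 decks.
# (NumPy's generator is used here in place of dSFMT; values are illustrative.)
import numpy as np

rng = np.random.default_rng(6)

def box_muller(n):
    u1 = rng.random(n)
    u2 = rng.random(n)
    r = np.sqrt(-2.0 * np.log(u1))
    return r * np.cos(2.0 * np.pi * u2)   # the paired sine deviates are discarded here

nominal, rel_sigma = 1.0, 0.02            # illustrative nominal value and 2% uncertainty
perturbed = nominal * (1.0 + rel_sigma * box_muller(10_000))
print(perturbed.mean(), perturbed.std())
```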
Recent Advances in Conotoxin Classification by Using Machine Learning Methods.
Dao, Fu-Ying; Yang, Hui; Su, Zhen-Dong; Yang, Wuritu; Wu, Yun; Hui, Ding; Chen, Wei; Tang, Hua; Lin, Hao
2017-06-25
Conotoxins are disulfide-rich small peptides, invaluable peptides that target ion channels and neuronal receptors. Conotoxins have been demonstrated to be potent pharmaceuticals in the treatment of a series of diseases, such as Alzheimer's disease, Parkinson's disease, and epilepsy. In addition, conotoxins are also ideal molecular templates for the development of new drug lead compounds and play important roles in neurobiological research as well. Thus, the accurate identification of conotoxin types will provide key clues for biological research and clinical medicine. Generally, conotoxin types are confirmed when their sequence, structure, and function are experimentally validated. However, it is time-consuming and costly to acquire structure and function information through biochemical experiments. Therefore, it is important to develop computational tools for efficiently and effectively recognizing conotoxin types based on sequence information. In this work, we reviewed the current progress in computational identification of conotoxins in the following aspects: (i) construction of benchmark datasets; (ii) strategies for extracting sequence features; (iii) feature selection techniques; (iv) machine learning methods for classifying conotoxins; (v) the results obtained by these methods and the published tools; and (vi) future perspectives on conotoxin classification. The paper provides a basis for in-depth study of conotoxins and drug therapy research.
A Classification Method for Seed Viability Assessment with Infrared Thermography
Directory of Open Access Journals (Sweden)
Sen Men
2017-04-01
Full Text Available This paper presents a viability assessment method for Pisum sativum L. seeds based on the infrared thermography technique. In this work, different artificial treatments were conducted to prepare seed samples with different viability. Thermal images and visible images were recorded every five minutes during the standard five-day germination test. After the test, the root length of each sample was measured, which can be used as the viability index of that seed. Each individual seed area in the visible images was segmented with an edge detection method, and the average temperature of the corresponding area in the infrared images was calculated as the representative temperature for this seed at that time. The temperature curve of each seed during germination was plotted. Thirteen characteristic parameters extracted from the temperature curve were analyzed to show the difference in temperature fluctuations between seed samples of different viability. With the above parameters, a support vector machine (SVM) was used to classify the seed samples into three categories, viable, aged and dead, according to the root length; the classification accuracy rate was 95%. On this basis, with the temperature data of only the first three hours of germination, another SVM model was proposed to classify the seed samples, and the accuracy rate was about 91.67%. From these experimental results, it can be seen that infrared thermography can be applied to the prediction of seed viability, based on the SVM algorithm.
Calisto, A; Bramanti, A; Galeano, M; Angileri, F; Campobello, G; Serrano, S; Azzerboni, B
2009-01-01
The objective of this study is to investigate Idiopathic Normal Pressure Hydrocephalus (INPH) through a multidimensional and multiparameter analysis of statistical data obtained from accurate analysis of Intracranial Pressure (ICP) recordings. Such a study could permit the detection of new factors, correlated with therapeutic response, able to establish a predictive significance for the infusion test. The algorithm developed by the authors computes 13 ICP parameter trends on each of the recordings; afterwards, nine statistical descriptors are determined from each trend. All data are transferred to the data-mining software WEKA. According to the exploited feature-selection techniques, WEKA revealed that the most significant statistical parameter is the maximum of the Single-Wave-Amplitude: setting a 27 mmHg threshold leads to over 90% correct classification.
A survey of available margin in a PWR RIA with statistical methods and 3D kinetics
International Nuclear Information System (INIS)
Riverola Gurruchaga, J.; Nunez Rodriguez, T.
2010-01-01
This paper investigates the recovery of margin in a PWR RIA simulation with 3D kinetics through statistical techniques. The chosen reference core is a typical 12-foot 17×17 PWR, with a very low leakage loading pattern strategy and gadolinium oxide as burnable poison. The PARCS-calculated average nuclear power and nodal power are transferred to a hot spot model for a sequential calculation of fuel temperature and enthalpy responses, allowing for independent hypotheses in both calculations. The hot spot analysis is done with a pellet-type model with RELAP. The analysis is done at HZP and EOC, since this state is the most limiting one with respect to the enthalpy rise criterion, compared to other burn-up conditions or initial power cases. In this work, the enthalpy increase is estimated with several statistical methods for the propagation of uncertainties: order statistics, parametric statistics, surface response and sensitivities. A discussion of the advantages and disadvantages of each method is also presented. This statistical analysis is also useful to confirm a previous classification of parameters and assumptions according to their importance for the simulation, found to be consistent with the state of the art in the published literature. These parameters include ejected rod worth and ejection time, delayed neutron fraction and yields, nuclear power peaking factor, and Doppler. (authors)
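For the order-statistics method mentioned above, the classic first-order Wilks result fixes the minimum number of code runs; the one-line check below is standard material, not taken from the paper.

```python
# Wilks order-statistics sample size: smallest N such that the maximum of N
# runs bounds the 95th percentile with 95% confidence, i.e. 1 - 0.95**N >= 0.95.
import math

def wilks_n(coverage=0.95, confidence=0.95):
    return math.ceil(math.log(1.0 - confidence) / math.log(coverage))

print(wilks_n())   # 59, the classic 95/95 first-order result
```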
Directory of Open Access Journals (Sweden)
Chen Cao
2016-09-01
Full Text Available This study focused on producing flash flood hazard susceptibility maps (FFHSM) using frequency ratio (FR) and statistical index (SI) models in the Xiqu Gully (XQG) of Beijing, China. First, a total of 85 flash flood hazard locations (n = 85) were surveyed in the field and plotted using geographic information system (GIS) software. Based on the flash flood hazard locations, a flood hazard inventory map was built. Seventy percent (n = 60) of the flash flood hazard locations were randomly selected for building the models. The remaining 30% (n = 25) of the locations were used for validation. Considering that the XQG used to be a coal mining area, coal mine caves and subsidence caused by coal mining exist in this catchment, as well as many ground fissures. Thus, this study took the subsidence risk level into consideration for FFHSM. The ten conditioning parameters were elevation, slope, curvature, land use, geology, soil texture, subsidence risk area, stream power index (SPI), topographic wetness index (TWI), and short-term heavy rain. This study also tested different classification schemes for the values of each conditioning parameter and checked their impacts on the results. The accuracy of the FFHSM was validated using area under the curve (AUC) analysis. Classification accuracies were 86.61%, 83.35%, and 78.52% using the frequency ratio (FR)-natural breaks, statistical index (SI)-natural breaks and FR-manual classification schemes, respectively. Associated prediction accuracies were 83.69%, 81.22%, and 74.23%, respectively. It was found that FR modeling using a natural breaks classification method was more appropriate for generating FFHSM for the Xiqu Gully.
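A minimal sketch of the frequency ratio (FR) model named above: for each class of a conditioning parameter, FR is the share of hazard locations falling in that class divided by the share of the study area the class occupies; class values summed over the parameters give the susceptibility index. The raster and classes below are synthetic.

```python
# Frequency ratio per class of one conditioning parameter, on a toy raster.
import numpy as np

def frequency_ratio(class_ids, hazard_mask):
    """class_ids: class label per map cell; hazard_mask: True where a hazard occurred."""
    fr = {}
    total_cells = class_ids.size
    total_hazards = hazard_mask.sum()
    for c in np.unique(class_ids):
        in_class = class_ids == c
        pct_hazards = hazard_mask[in_class].sum() / total_hazards
        pct_area = in_class.sum() / total_cells
        fr[c] = pct_hazards / pct_area     # > 1 means the class is hazard-prone
    return fr

rng = np.random.default_rng(7)
slope_class = rng.integers(1, 5, 10_000)            # toy slope classes over a raster
hazards = rng.random(10_000) < 0.01 * slope_class   # hazards more likely in high classes
print(frequency_ratio(slope_class, hazards))
```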
Statistical methods for assessing agreement between continuous measurements
DEFF Research Database (Denmark)
Sokolowski, Ineta; Hansen, Rikke Pilegaard; Vedsted, Peter
Background: Clinical research often involves the study of agreement amongst observers. Agreement can be measured in different ways, and one can obtain quite different values depending on which method one uses. Objective: We review the approaches that have been discussed to assess the agreement between continuous measures and discuss their strengths and weaknesses. The different methods are illustrated using actual data from the 'Delay in diagnosis of cancer in general practice' project in Aarhus, Denmark. Subjects and Methods: We use the weighted kappa statistic, the intraclass correlation coefficient (ICC), the concordance coefficient, Bland-Altman limits of agreement and the percentage of agreement to assess the agreement between patient-reported delay and doctor-reported delay in the diagnosis of cancer in general practice. Key messages: The correct statistical approach is not obvious. Many studies give the product
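Of the approaches listed above, the Bland-Altman limits of agreement are the easiest to state in code; the delays below are synthetic stand-ins for the patient- and doctor-reported data.

```python
# Bland-Altman limits of agreement between two sets of paired measurements
# (synthetic delays in days, standing in for patient- vs doctor-reported data).
import numpy as np

rng = np.random.default_rng(8)
doctor = rng.gamma(2.0, 20.0, 100)                 # synthetic delays, days
patient = doctor + rng.normal(3.0, 10.0, 100)      # patient reports run a bit longer

diff = patient - doctor
bias = diff.mean()
sd = diff.std(ddof=1)
loa = (bias - 1.96 * sd, bias + 1.96 * sd)         # 95% limits of agreement
print(f"bias = {bias:.1f} days, limits of agreement = {loa[0]:.1f} to {loa[1]:.1f}")
```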
Statistical disclosure control for microdata methods and applications in R
Templ, Matthias
2017-01-01
This book on statistical disclosure control presents the theory, applications and software implementation of the traditional approach to (micro)data anonymization, including data perturbation methods, disclosure risk, data utility, information loss and methods for simulating synthetic data. Introducing readers to the R packages sdcMicro and simPop, the book also features numerous examples and exercises with solutions, as well as case studies with real-world data, accompanied by the underlying R code to allow readers to reproduce all results. The demand for and volume of data from surveys, registers or other sources containing sensitive information on persons or enterprises have increased significantly over the last several years. At the same time, privacy protection principles and regulations have imposed restrictions on the access and use of individual data. Proper and secure microdata dissemination calls for the application of statistical disclosure control methods to the data before release. This book is in...
Applied statistical methods in agriculture, health and life sciences
Lawal, Bayo
2014-01-01
This textbook teaches crucial statistical methods to answer research questions using a unique range of statistical software programs, including MINITAB and R. This textbook is developed for undergraduate students in agriculture, nursing, biology and biomedical research. Graduate students will also find it to be a useful way to refresh their statistics skills and to reference software options. The unique combination of examples is approached using MINITAB and R for their individual strengths. Subjects covered include among others data description, probability distributions, experimental design, regression analysis, randomized design and biological assay. Unlike other biostatistics textbooks, this text also includes outliers, influential observations in regression and an introduction to survival analysis. Material is taken from the author's extensive teaching and research in Africa, USA and the UK. Sample problems, references and electronic supplementary material accompany each chapter.
Identification of mine waters by statistical multivariate methods
Energy Technology Data Exchange (ETDEWEB)
Mali, N [IGGG, Ljubljana (Slovenia)
1992-01-01
Three water-bearing aquifers are present in the Velenje lignite mine. The aquifer waters have differing chemical compositions; a geochemical water analysis can therefore determine the source of mine water influx. Mine water samples from different locations in the mine were analyzed, and the results for the chemical content and electric conductivity of the mine water were statistically processed by means of the MICROGAS, SPSS-X and IN STATPAC computer programs, which apply three multivariate statistical methods (discriminant, cluster and factor analysis). The reliability of the calculated values was determined with the Kolmogorov and Smirnov tests. It is concluded that laboratory analysis of single water samples can produce measurement errors, but statistical processing of water sample data can identify the origin and movement of mine water. 15 refs.
Nonequilibrium Statistical Operator Method and Generalized Kinetic Equations
Kuzemsky, A. L.
2018-01-01
We consider some principal problems of nonequilibrium statistical thermodynamics in the framework of the Zubarev nonequilibrium statistical operator approach. We present a brief comparative analysis of some approaches to describing irreversible processes based on the concept of nonequilibrium Gibbs ensembles and their applicability to describing nonequilibrium processes. We discuss the derivation of generalized kinetic equations for a system in a heat bath. We obtain and analyze a damped Schrödinger-type equation for a dynamical system in a heat bath. We study the dynamical behavior of a particle in a medium taking the dissipation effects into account. We consider the scattering problem for neutrons in a nonequilibrium medium and derive a generalized Van Hove formula. We show that the nonequilibrium statistical operator method is an effective, convenient tool for describing irreversible processes in condensed matter.
Identifying Reflectors in Seismic Images via Statistic and Syntactic Methods
Directory of Open Access Journals (Sweden)
Carlos A. Perez
2010-04-01
Full Text Available In the geologic interpretation of seismic reflection data, accurate identification of reflectors is the foremost step to ensure proper subsurface structural definition. Reflector information, along with other data sets, is a key factor in predicting the presence of hydrocarbons. In this work, mathematical and pattern recognition theory was adapted to design two statistical and two syntactic algorithms, which constitute a tool for semiautomatic reflector identification. The interpretive power of these four schemes was evaluated in terms of prediction accuracy and computational speed. Among these, the semblance method was confirmed to render the greatest accuracy and speed. Syntactic methods offer an interesting alternative due to their inherently structural search method.
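A sketch of the semblance coefficient reported above as the most accurate and fastest scheme: the energy of the stacked trace divided by the total energy of the M traces within a time window. This is the standard definition; the window sizes and data below are illustrative.

```python
# Semblance of M seismic traces over a time window:
#   S = sum_t (sum_i a_i(t))^2 / (M * sum_t sum_i a_i(t)^2), in [0, 1].
import numpy as np

def semblance(traces):
    """traces: 2D array (M traces x T window samples)."""
    m = traces.shape[0]
    stacked_energy = np.sum(np.sum(traces, axis=0) ** 2)
    total_energy = m * np.sum(traces ** 2)
    return stacked_energy / total_energy

rng = np.random.default_rng(9)
wavelet = np.sin(np.linspace(0, 2 * np.pi, 30))
coherent = np.tile(wavelet, (8, 1)) + rng.normal(0, 0.2, (8, 30))   # aligned reflector
noise = rng.normal(0, 1.0, (8, 30))
print(semblance(coherent), semblance(noise))   # near 1 versus near 1/M
```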
Feature Selection and Classification of Ulcerated Lesions Using Statistical Analysis for WCE Images
Directory of Open Access Journals (Sweden)
Shipra Suman
2017-10-01
Full Text Available Wireless capsule endoscopy (WCE) is a technology developed to inspect the whole gastrointestinal tract (especially the small bowel area, which is unreachable using the traditional endoscopy procedure) for various abnormalities in a non-invasive manner. However, visualization of a massive number of images is a very time-consuming and tedious task for physicians, and prone to human error. Thus, an automatic scheme for lesion detection in WCE videos is a potential solution to alleviate this problem. In this work, a novel statistical approach was chosen for differentiating ulcer and non-ulcer pixels using various color spaces (or, more specifically, relevant color bands). The chosen feature vector was used to compute the performance metrics using an SVM with the grid search method for maximum efficiency. The experimental results and analysis showed that the proposed algorithm was robust in detecting ulcers. The performance in terms of accuracy, sensitivity, and specificity is 97.89%, 96.22%, and 95.09%, respectively, which is promising.
Robust classification of motor imagery EEG signals using statistical time–domain features
International Nuclear Information System (INIS)
Khorshidtalab, A; Salami, M J E; Hamedi, M
2013-01-01
The trade-off between computational complexity and speed, in addition to growing demands for real-time BMI (brain–machine interface) systems, exposes the necessity of applying methods with the least possible complexity. Willison amplitude (WAMP) and slope sign change (SSC) are two promising time–domain features, but only if the right threshold value is defined for them. To overcome the drawback of going through trial and error in determining a suitable threshold value, a modified WAMP and a modified SSC are proposed in this paper. Besides, a comprehensive assessment of statistical time–domain features, in which their effectiveness is evaluated with a support vector machine (SVM), is presented. To ensure the accuracy of the results obtained by the SVM, the performance of each feature is reassessed with supervised fuzzy C-means. The general assessment shows that every subject had at least one performance near or greater than 80%. The obtained results prove that for BMI applications, in which a few errors can be tolerated, these feature–classifier combinations are suitable. Moreover, features that performed satisfactorily were selected for feature combination. Combinations of the selected features were evaluated with the SVM, and they could significantly improve the results, in some cases up to full accuracy. (paper)
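The two features named above have standard thresholded definitions, sketched below; the paper's modified, threshold-free variants are not reproduced here, and the threshold values shown are illustrative.

```python
# Standard thresholded WAMP and SSC time-domain features: WAMP counts
# consecutive-sample differences exceeding a threshold, SSC counts slope
# sign changes above a threshold.
import numpy as np

def wamp(x, threshold):
    return int(np.sum(np.abs(np.diff(x)) >= threshold))

def ssc(x, threshold):
    d1 = x[1:-1] - x[:-2]      # backward difference at each interior sample
    d2 = x[1:-1] - x[2:]       # forward difference (sign-flipped)
    return int(np.sum(d1 * d2 >= threshold))

rng = np.random.default_rng(10)
eeg = rng.normal(0, 1, 1000)   # toy EEG segment
print(wamp(eeg, 0.5), ssc(eeg, 0.1))
```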
Radar Target Classification using Recursive Knowledge-Based Methods
DEFF Research Database (Denmark)
Jochumsen, Lars Wurtz
The topic of this thesis is target classification of radar tracks from a 2D mechanically scanning coastal surveillance radar. The measurements provided by the radar are position data and therefore the classification is mainly based on kinematic data, which is deduced from the position. The target...... been terminated. Therefore, an update of the classification results must be made for each measurement of the target. The data for this work are collected throughout the PhD and are both collected from radars and other sensors such as GPS....
Non-Statistical Methods of Analysing of Bankruptcy Risk
Directory of Open Access Journals (Sweden)
Pisula Tomasz
2015-06-01
Full Text Available The article focuses on assessing the effectiveness of a non-statistical approach to bankruptcy modelling in enterprises operating in the logistics sector. In order to describe the issue more comprehensively, the aforementioned prediction of the possible negative results of business operations was carried out for companies functioning in the Polish region of Podkarpacie and in Slovakia. The bankruptcy predictors selected for the assessment of companies operating in the logistics sector included 28 financial indicators characterizing these enterprises in terms of their financial standing and management effectiveness. The purpose of the study was to identify factors (models) describing the bankruptcy risk in enterprises, in the context of their forecasting effectiveness over a one-year and two-year time horizon. In order to assess their practical applicability, the models were carefully analysed and validated. The usefulness of the models was assessed in terms of their classification properties and their capacity to accurately identify enterprises at risk of bankruptcy and healthy companies, as well as the proper calibration of the models to the data from the training sample sets.
RESEARCH ON RISK CLASSIFICATION METHOD OF ASSEMBLY OCCUPANCIES
Directory of Open Access Journals (Sweden)
Hao Yu
2017-10-01
Full Text Available Due to the dense population and mobility of crowds, accidents in assembly occupancies generally trigger a chain reaction, bringing heavy casualties and property loss and resulting in disastrous consequences. Given limited safety regulation resources, building a risk classification system for assembly occupancies is important for “scientific predicting and hierarchical controlling”. In this paper, a software tool with a graphical user interface is designed using MATLAB GUI to analyze and calculate the risks of stampede accidents caused by gathered crowds in video. A velocity extraction method based on the cross-correlation algorithm is adopted, and risk characteristic parameters such as the velocity variance are also applied. In this way, real-time analysis and early warning of stampede accident risks in time and space can be achieved. Also, the algorithm is applied to the surveillance video of the stampede in Shanghai and its feasibility is proved. Empirical research shows that the assembly occupancies risk rating model built in this paper has good effectiveness, simplicity and practicability, applies to government safety regulation and organizational safety management, and can improve the safety situation of assembly occupancies effectively.
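A sketch of the cross-correlation velocity-extraction idea named above: the displacement of a crowd patch between two video frames is read off the peak of their 2D cross-correlation and converted to a velocity. The patch sizes and frame rate are illustrative assumptions, not taken from the paper.

```python
# Estimate inter-frame displacement from the peak of the 2D cross-correlation,
# then convert pixels/frame to pixels/second (toy data, assumed 25 fps).
import numpy as np
from scipy import signal

def displacement(patch_t0, patch_t1):
    corr = signal.correlate2d(patch_t1 - patch_t1.mean(),
                              patch_t0 - patch_t0.mean(), mode="full")
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # zero lag sits at (rows-1, cols-1) of the second input in "full" mode
    dy = peak[0] - (patch_t0.shape[0] - 1)
    dx = peak[1] - (patch_t0.shape[1] - 1)
    return dy, dx

rng = np.random.default_rng(11)
frame0 = rng.random((64, 64))
frame1 = np.roll(frame0, shift=(2, 3), axis=(0, 1))   # patch moved 2 px down, 3 px right
dy, dx = displacement(frame0, frame1)
fps = 25.0
print(f"velocity = ({dy * fps}, {dx * fps}) pixels/second")
```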
New Graphical Methods and Test Statistics for Testing Composite Normality
Directory of Open Access Journals (Sweden)
Marc S. Paolella
2015-07-01
Full Text Available Several graphical methods for testing univariate composite normality from an i.i.d. sample are presented. They are endowed with correct simultaneous error bounds and yield size-correct tests. As all are based on the empirical CDF, they are also consistent for all alternatives. For one test, called the modified stabilized probability test, or MSP, a highly simplified computational method is derived, which delivers the test statistic and also a highly accurate p-value approximation, essentially instantaneously. The MSP test is demonstrated to have higher power against asymmetric alternatives than the well-known and powerful Jarque-Bera test. A further size-correct test, based on combining two test statistics, is shown to have yet higher power. The methodology employed is fully general and can be applied to any i.i.d. univariate continuous distribution setting.
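For context, the Jarque-Bera statistic used above as the power benchmark is simple to compute from sample skewness and kurtosis; the sketch below is the standard test, not the paper's MSP test.

```python
# Jarque-Bera normality test: JB = n/6 * (S^2 + (K - 3)^2 / 4), asymptotically
# chi-squared with 2 degrees of freedom under the normal null hypothesis.
import numpy as np
from scipy import stats

def jarque_bera(x):
    n = x.size
    z = (x - x.mean()) / x.std()
    s = np.mean(z ** 3)                    # sample skewness
    k = np.mean(z ** 4)                    # sample kurtosis
    jb = n / 6.0 * (s ** 2 + (k - 3.0) ** 2 / 4.0)
    return jb, stats.chi2.sf(jb, df=2)     # statistic and p-value

rng = np.random.default_rng(12)
print(jarque_bera(rng.normal(size=500)))       # normal data: should not reject
print(jarque_bera(rng.exponential(size=500)))  # asymmetric data: should reject
```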
Statistical learning modeling method for space debris photometric measurement
Sun, Wenjing; Sun, Jinqiu; Zhang, Yanning; Li, Haisen
2016-03-01
Photometric measurement is an important way to identify space debris, but present photometric measurement methods place many constraints on the star image and require complex image processing. Addressing these problems, a statistical learning modeling method for space debris photometric measurement is proposed based on the global consistency of the star image, and the statistical information of star images is used to eliminate measurement noise. First, the known stars in the star image are divided into training stars and testing stars. Then, the training stars are used in a least-squares fit to construct the photometric measurement model, and the testing stars are used to calculate the measurement accuracy of the model. Experimental results show that the accuracy of the proposed photometric measurement model is about 0.1 magnitudes.
Statistic method of research reactors maximum permissible power calculation
International Nuclear Information System (INIS)
Grosheva, N.A.; Kirsanov, G.A.; Konoplev, K.A.; Chmshkyan, D.V.
1998-01-01
A technique is presented for calculating the maximum permissible power of a research reactor at which the probability of a thermal-process accident does not exceed a specified value. The statistical method is used for the calculations. The determining function related to reactor safety is regarded as a known function of the reactor power and of many statistically independent values, whose list includes the reactor process parameters, geometrical characteristics of the reactor core and fuel elements, as well as random factors connected with the reactor's specific features. Heat flux density or temperature is taken as the limiting factor. The program realization of the method discussed is briefly described. The results of calculating the PIK reactor margin coefficients for different probabilities of a thermal-process accident are considered as an example. It is shown that the probability of an accident with fuel element melting in the hot zone is lower than 10⁻⁸ per year for the reactor rated power [ru
Applied systems ecology: models, data, and statistical methods
Energy Technology Data Exchange (ETDEWEB)
Eberhardt, L L
1976-01-01
In this report, systems ecology is largely equated to mathematical or computer simulation modelling. The need for models in ecology stems from the necessity to have an integrative device for the diversity of ecological data, much of which is observational, rather than experimental, as well as from the present lack of a theoretical structure for ecology. Different objectives in applied studies require specialized methods. The best predictive devices may be regression equations, often non-linear in form, extracted from much more detailed models. A variety of statistical aspects of modelling, including sampling, are discussed. Several aspects of population dynamics and food-chain kinetics are described, and it is suggested that the two presently separated approaches should be combined into a single theoretical framework. It is concluded that future efforts in systems ecology should emphasize actual data and statistical methods, as well as modelling.
Multivariate methods and forecasting with IBM SPSS statistics
Aljandali, Abdulkader
2017-01-01
This is the second of a two-part guide to quantitative analysis using the IBM SPSS Statistics software package; this volume focuses on multivariate statistical methods and advanced forecasting techniques. More often than not, regression models involve more than one independent variable. For example, forecasting methods are commonly applied to aggregates such as inflation rates, unemployment, exchange rates, etc., that have complex relationships with determining variables. This book introduces multivariate regression models and provides examples to help understand theory underpinning the model. The book presents the fundamentals of multivariate regression and then moves on to examine several related techniques that have application in business-orientated fields such as logistic and multinomial regression. Forecasting tools such as the Box-Jenkins approach to time series modeling are introduced, as well as exponential smoothing and naïve techniques. This part also covers hot topics such as Factor Analysis, Dis...
Mathematical and Statistical Methods for Actuarial Sciences and Finance
Legros, Florence; Perna, Cira; Sibillo, Marilena
2017-01-01
This volume gathers selected peer-reviewed papers presented at the international conference "MAF 2016 – Mathematical and Statistical Methods for Actuarial Sciences and Finance”, held in Paris (France) at the Université Paris-Dauphine from March 30 to April 1, 2016. The contributions highlight new ideas on mathematical and statistical methods in actuarial sciences and finance. The cooperation between mathematicians and statisticians working in insurance and finance is a very fruitful field, one that yields unique theoretical models and practical applications, as well as new insights in the discussion of problems of national and international interest. This volume is addressed to academicians, researchers, Ph.D. students and professionals.
Statistical methods for longitudinal data with agricultural applications
DEFF Research Database (Denmark)
Anantharama Ankinakatte, Smitha
The PhD study focuses on modeling two kinds of longitudinal data arising in agricultural applications: continuous time series data and discrete longitudinal data. Firstly, two statistical methods, neural networks and generalized additive models, are applied to predict mastitis using multivariate...... algorithm. This was found to compare favourably with the algorithm implemented in the well-known Beagle software. Finally, an R package to apply APFA models developed as part of the PhD project is described...
Statistical methods in nuclear material accountancy: Past, present and future
International Nuclear Information System (INIS)
Pike, D.J.; Woods, A.J.
1983-01-01
The analysis of nuclear material inventory data is motivated by the desire to detect any loss or diversion of nuclear material, insofar as such detection may be feasible by statistical analysis of repeated inventory and throughput measurements. The early regulations, which laid down the specifications for the analysis of inventory data, were framed without acknowledging the essentially sequential nature of the data. It is the broad aim of this paper to discuss the historical nature of statistical analysis of inventory data including an evaluation of why statistical methods should be required at all. If it is accepted that statistical techniques are required, then two main areas require extensive discussion. First, it is important to assess the extent to which stated safeguards aims can be met in practice. Second, there is a vital need for reassessment of the statistical techniques which have been proposed for use in nuclear material accountancy. Part of this reassessment must involve a reconciliation of the apparent differences in philosophy shown by statisticians; but, in addition, the techniques themselves need comparative study to see to what extent they are capable of meeting realistic safeguards aims. This paper contains a brief review of techniques with an attempt to compare and contrast the approaches. It will be suggested that much current research is following closely similar lines, and that national and international bodies should encourage collaborative research and practical in-plant implementations. The techniques proposed require credibility and power; but at this point in time statisticians require credibility and a greater level of unanimity in their approach. A way ahead is proposed based on a clear specification of realistic safeguards aims, and a development of a unified statistical approach with encouragement for the performance of joint research. (author)
State analysis of BOP using statistical and heuristic methods
International Nuclear Information System (INIS)
Heo, Gyun Young; Chang, Soon Heung
2003-01-01
Under the deregulation environment, the performance enhancement of the balance of plant (BOP) in nuclear power plants is being highlighted. To analyze the performance level of the BOP, we use the performance test procedures provided by an authorized institution such as ASME. However, plant investigation showed that the requirements of the performance test procedures concerning the reliability and quantity of sensors were difficult to satisfy. As a solution, a state analysis method, an expanded concept of signal validation, was proposed on the basis of statistical and heuristic approaches. The authors recommended a statistical linear regression model, built by analyzing correlations among BOP parameters, as a reference state analysis method. Its advantages are that its derivation is not heuristic, it is possible to calculate the model uncertainty, and it is easy to apply to an actual plant. The error of the statistical linear regression model is below 3% under normal as well as abnormal system states. Additionally, a neural network model was recommended, since the statistical model cannot be applied to the validation of all of the sensors and is sensitive to outliers, i.e. signals located outside a statistical distribution. Because there are many sensors that need to be validated in the BOP, wavelet analysis (WA) was applied as a pre-processor for the reduction of the input dimension and for the enhancement of training accuracy. The outlier localization capability of WA enhanced the robustness of the neural network. The trained neural network restored the degraded signals to values within ±3% of the true signals
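A minimal sketch of the reference idea, with invented BOP parameter names and synthetic data: regress one sensor on correlated sensors and flag readings whose relative residual exceeds the quoted 3% band:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated BOP parameters (e.g. feedwater flow, steam flow).
x1 = rng.normal(100, 5, 500)
x2 = 0.98 * x1 + rng.normal(0, 0.5, 500)
target = 0.5 * x1 + 0.5 * x2 + rng.normal(0, 0.5, 500)  # sensor to validate
target[100] += 10.0  # inject a hypothetical sensor fault

X = np.column_stack([np.ones_like(x1), x1, x2])
coef = np.linalg.lstsq(X, target, rcond=None)[0]   # linear regression model
pred = X @ coef
rel_err = np.abs(pred - target) / np.abs(target)
print("suspect readings:", np.flatnonzero(rel_err > 0.03))
```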
Analysis and classification of ECG-waves and rhythms using circular statistics and vector strength
Directory of Open Access Journals (Sweden)
Janßen Jan-Dirk
2017-09-01
Full Text Available The most common way to analyse heart rhythm is to calculate the RR-interval and the heart rate variability. For further evaluation, descriptive statistics are often used. Here we introduce a new and more natural heart rhythm analysis tool that is based on circular statistics and vector strength. Vector strength is a tool to measure the periodicity or lack of periodicity of a signal. We divide the signal into non-overlapping window segments and project the detected R-waves around the unit circle using the complex exponential function and the median RR-interval. In addition, we calculate the vector strength and apply circular statistics as well as an angular histogram on the R-wave vectors. This approach enables an intuitive visualization and analysis of rhythmicity. Our results show that ECG-waves and rhythms can be easily visualized, analysed and classified by circular statistics and vector strength.
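The projection and vector-strength computation described above are compact enough to sketch directly, assuming only an array of detected R-wave times in seconds:

```python
import numpy as np

def vector_strength(r_times):
    """Project R-waves around the unit circle using the median
    RR-interval as the period, and return the vector strength."""
    period = np.median(np.diff(r_times))
    phases = np.exp(2j * np.pi * r_times / period)
    z = phases.mean()
    return abs(z), np.angle(z)  # strength in [0, 1], mean phase

# Perfectly regular rhythm -> vector strength close to 1.
regular = np.arange(0, 60, 0.8)
# Jittered rhythm -> lower vector strength.
jittered = regular + np.random.default_rng(0).normal(0, 0.15, regular.size)
print(vector_strength(regular)[0], vector_strength(jittered)[0])
```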
Statistical benchmarking in utility regulation: Role, standards and methods
International Nuclear Information System (INIS)
Newton Lowry, Mark; Getachew, Lullit
2009-01-01
Statistical benchmarking is being used with increasing frequency around the world in utility rate regulation. We discuss how and where benchmarking is in use for this purpose and the pros and cons of regulatory benchmarking. We then discuss alternative performance standards and benchmarking methods in regulatory applications. We use these to propose guidelines for the appropriate use of benchmarking in the rate setting process. The standards, which we term the competitive market and frontier paradigms, have a bearing on method selection. These along with regulatory experience suggest that benchmarking can either be used for prudence review in regulation or to establish rates or rate setting mechanisms directly
Statistical methods of spin assignment in compound nuclear reactions
International Nuclear Information System (INIS)
Mach, H.; Johns, M.W.
1984-01-01
Spin assignment to nuclear levels can be obtained from standard in-beam gamma-ray spectroscopy techniques and in the case of compound nuclear reactions can be complemented by statistical methods. These are based on a correlation pattern between level spin and gamma-ray intensities feeding low-lying levels. Three types of intensity and level spin correlations are found suitable for spin assignment: shapes of the excitation functions, ratio of intensity at two beam energies or populated in two different reactions, and feeding distributions. Various empirical attempts are examined and the range of applicability of these methods as well as the limitations associated with them are given. 12 references
On second quantization methods applied to classical statistical mechanics
International Nuclear Information System (INIS)
Matos Neto, A.; Vianna, J.D.M.
1984-01-01
A method for expressing classical statistical results in terms of the mathematical entities usually associated with the quantum field theoretical treatment of many-particle systems (Fock space, commutators, field operators, state vector) is discussed. A linear response theory is developed using the 'second quantized' Liouville equation introduced by Schonberg. The relationship of this method to that of Prigogine et al. is briefly analyzed. The chain of equations and the spectral representations for the new classical Green's functions are presented. Generalized operators defined on Fock space are discussed. It is shown that the correlation functions can be obtained from Green's functions defined with generalized operators. (Author)
Methods for estimating low-flow statistics for Massachusetts streams
Ries, Kernell G.; Friesz, Paul J.
2000-01-01
Methods and computer software are described in this report for determining flow duration, low-flow frequency statistics, and August median flows. These low-flow statistics can be estimated for unregulated streams in Massachusetts using different methods depending on whether the location of interest is at a streamgaging station, a low-flow partial-record station, or an ungaged site where no data are available. Low-flow statistics for streamgaging stations can be estimated using standard U.S. Geological Survey methods described in the report. The MOVE.1 mathematical method and a graphical correlation method can be used to estimate low-flow statistics for low-flow partial-record stations. The MOVE.1 method is recommended when the relation between measured flows at a partial-record station and daily mean flows at a nearby, hydrologically similar streamgaging station is linear, and the graphical method is recommended when the relation is curved. Equations are presented for computing the variance and equivalent years of record for estimates of low-flow statistics for low-flow partial-record stations when either a single or multiple index stations are used to determine the estimates. The drainage-area ratio method or regression equations can be used to estimate low-flow statistics for ungaged sites where no data are available. The drainage-area ratio method is generally as accurate as or more accurate than regression estimates when the drainage-area ratio for an ungaged site is between 0.3 and 1.5 times the drainage area of the index data-collection site. Regression equations were developed to estimate the natural, long-term 99-, 98-, 95-, 90-, 85-, 80-, 75-, 70-, 60-, and 50-percent duration flows; the 7-day, 2-year and the 7-day, 10-year low flows; and the August median flow for ungaged sites in Massachusetts. Streamflow statistics and basin characteristics for 87 to 133 streamgaging stations and low-flow partial-record stations were used to develop the equations. The
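The drainage-area ratio method reduces to a one-line proportionality; a sketch with hypothetical flows and areas, including the report's 0.3-1.5 applicability window:

```python
def drainage_area_ratio(q_index, a_index, a_ungaged):
    """Transfer a flow statistic from an index gage to an ungaged site
    in proportion to drainage area."""
    ratio = a_ungaged / a_index
    if not 0.3 <= ratio <= 1.5:
        raise ValueError("ratio outside 0.3-1.5; prefer regression equations")
    return q_index * ratio

# Hypothetical: 7-day, 10-year low flow of 1.2 m3/s at a 250 km2 index basin,
# estimated for a nearby 180 km2 ungaged basin.
print(drainage_area_ratio(1.2, 250.0, 180.0))
```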
Literature in Focus: Statistical Methods in Experimental Physics
2007-01-01
Frederick James was a high-energy physicist who became the CERN "expert" on statistics and is now well-known around the world, in part for this famous text. The first edition of Statistical Methods in Experimental Physics was originally co-written with four other authors and was published in 1971 by North Holland (now an imprint of Elsevier). It became such an important text that demand for it has continued for more than 30 years. Fred has updated it and it was released in a second edition by World Scientific in 2006. It is still a top seller and there is no exaggeration in calling it «the» reference on the subject. A full review of the title appeared in the October CERN Courier. Come and meet the author to hear more about how this book has flourished during its 35-year lifetime. Frederick James Statistical Methods in Experimental Physics Monday, 26th of November, 4 p.m. Council Chamber (Bldg. 503-1-001) The author will be introduced...
Fuel rod design by statistical methods for MOX fuel
International Nuclear Information System (INIS)
Heins, L.; Landskron, H.
2000-01-01
Statistical methods in fuel rod design have received more and more attention during the last years. One of the possible ways to use statistical methods in fuel rod design can be described as follows: Monte Carlo calculations are performed using the fuel rod code CARO. For each run with CARO, the set of input data is modified: parameters describing the design of the fuel rod (geometrical data, density etc.) and modeling parameters are randomly selected according to their individual distributions. Power histories are varied systematically in a way that each power history of the relevant core management calculation is represented in the Monte Carlo calculations with equal frequency. The frequency distributions of results such as rod internal pressure and cladding strain, which are generated by the Monte Carlo calculation, are evaluated and compared with the design criteria. Up to now, this methodology has been applied to licensing calculations for PWRs and BWRs, UO2 and MOX fuel, in 3 countries. Especially for the insertion of MOX fuel, resulting in power histories with relatively high linear heat generation rates at higher burnup, the statistical methodology is an appropriate approach to demonstrate compliance with licensing requirements. (author)
Heterogeneous Rock Simulation Using DIP-Micromechanics-Statistical Methods
Directory of Open Access Journals (Sweden)
H. Molladavoodi
2018-01-01
Full Text Available Rock as a natural material is heterogeneous. Rock material consists of minerals, crystals, cement, grains, and microcracks. Each component of rock has a different mechanical behavior under applied loading conditions. Therefore, the rock component distribution has an important effect on rock mechanical behavior, especially in the postpeak region. In this paper, the rock sample was studied by digital image processing (DIP), micromechanics, and statistical methods. Using image processing, volume fractions of the rock minerals composing the rock sample were evaluated precisely. The mechanical properties of the rock matrix were determined based on upscaling micromechanics. In order to consider the effect of rock heterogeneities on mechanical behavior, the heterogeneity index was calculated in the framework of a statistical method. A Weibull distribution function was fitted to the Young modulus distribution of minerals. Finally, statistical and Mohr–Coulomb strain-softening models were used simultaneously as a constitutive model in DEM code. The acoustic emission, strain energy release, and the effect of rock heterogeneities on the postpeak behavior process were investigated. The numerical results are in good agreement with experimental data.
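A short sketch of the Weibull-fitting step on synthetic mineral stiffness values (the real ones come from the DIP and micromechanics stages); scipy's weibull_min is used with the location fixed at zero:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical Young's moduli (GPa) of minerals identified by DIP.
young_moduli = rng.weibull(5.0, 300) * 70.0

shape, loc, scale = stats.weibull_min.fit(young_moduli, floc=0)
# A lower Weibull shape parameter is often read as a more heterogeneous rock.
print(f"Weibull shape m = {shape:.2f}, scale = {scale:.1f} GPa")
```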
The FluorBoard: a statistically based dashboard method for improving safety
International Nuclear Information System (INIS)
PREVETTE, S.S.
2005-01-01
The FluorBoard is a statistically based dashboard method for improving safety. Fluor Hanford has achieved significant safety improvements, including more than an 80% reduction in OSHA cases per 200,000 hours, during its work at the US Department of Energy's Hanford Site in Washington state. The massive project on the former nuclear materials production site is considered one of the largest environmental cleanup projects in the world. Fluor Hanford's safety improvements were achieved by a committed partnering of workers, managers, and statistical methodology. Safety achievements at the site have been due to a systematic approach to safety. This includes excellent cooperation between the field workers, the safety professionals, and management through OSHA Voluntary Protection Program principles. Fluor corporate values are centered around safety, and safety excellence is important for every manager in every project. In addition, Fluor Hanford has utilized a rigorous approach to using its safety statistics, based upon Dr. Shewhart's control charts and Dr. Deming's management and quality methods.
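As a hedged illustration of the Shewhart-chart machinery behind such dashboards, the sketch below builds a u-chart for monthly OSHA case rates from invented counts and exposure hours:

```python
import numpy as np

# Hypothetical monthly OSHA recordable counts and hours worked.
cases = np.array([4, 3, 5, 2, 6, 3, 4, 1, 2, 3, 2, 1])
hours = np.array([2.1, 2.0, 2.2, 1.9, 2.3, 2.0, 2.1, 1.8, 2.0, 2.1, 1.9, 1.8]) * 1e5

rate = cases / hours * 2e5                 # cases per 200,000 hours
cl = cases.sum() / hours.sum() * 2e5       # centre line
# Shewhart u-chart limits (3 sigma), varying with monthly exposure.
sigma = np.sqrt(cl / (hours / 2e5))
ucl, lcl = cl + 3 * sigma, np.maximum(cl - 3 * sigma, 0)
print("out-of-control months:", np.flatnonzero((rate > ucl) | (rate < lcl)))
```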
Abbasian Ardakani, Ali; Gharbali, Akbar; Mohammadi, Afshin
2015-01-01
The aim of this study was to evaluate a computer-aided diagnosis (CAD) system with texture analysis (TA) to improve radiologists' accuracy in identifying thyroid nodules as malignant or benign. A total of 70 cases (26 benign and 44 malignant) were analyzed in this study. We extracted up to 270 statistical texture features as a descriptor for each selected region of interest (ROI) in three normalization schemes (default, 3s and 1%-99%). The features were then reduced to the 10 best and most effective ones using the lowest probability of classification error and average correlation coefficients (POE+ACC) and the Fisher coefficient (Fisher). These features were analyzed under standard and nonstandard states. For TA of the thyroid nodules, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and Non-Linear Discriminant Analysis (NDA) were applied. A first nearest-neighbour (1-NN) classifier was applied to the features resulting from PCA and LDA. NDA features were classified by an artificial neural network (A-NN). Receiver operating characteristic (ROC) curve analysis was used to examine the performance of the TA methods. The best results were obtained in the 1%-99% normalization with features extracted by the POE+ACC algorithm and analyzed by NDA, with an area under the ROC curve (Az) of 0.9722, corresponding to a sensitivity of 94.45%, specificity of 100%, and accuracy of 97.14%. Our results indicate that TA is a reliable method that can provide useful information to help radiologists in the detection and classification of benign and malignant thyroid nodules.
On the Evaluation of Outlier Detection and One-Class Classification Methods
DEFF Research Database (Denmark)
Swersky, Lorne; Marques, Henrique O.; Sander, Jörg
2016-01-01
It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem. In this paper, we focus on the comparison of oneclass classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies ...
Statistical physics and computational methods for evolutionary game theory
Javarone, Marco Alberto
2018-01-01
This book presents an introduction to Evolutionary Game Theory (EGT) which is an emerging field in the area of complex systems attracting the attention of researchers from disparate scientific communities. EGT allows one to represent and study several complex phenomena, such as the emergence of cooperation in social systems, the role of conformity in shaping the equilibrium of a population, and the dynamics in biological and ecological systems. Since EGT models belong to the area of complex systems, statistical physics constitutes a fundamental ingredient for investigating their behavior. At the same time, the complexity of some EGT models, such as those realized by means of agent-based methods, often require the implementation of numerical simulations. Therefore, beyond providing an introduction to EGT, this book gives a brief overview of the main statistical physics tools (such as phase transitions and the Ising model) and computational strategies for simulating evolutionary games (such as Monte Carlo algor...
Myint, Soe W.; Mesev, Victor; Quattrochi, Dale; Wentz, Elizabeth A.
2013-01-01
Remote sensing methods used to generate base maps to analyze the urban environment rely predominantly on digital sensor data from space-borne platforms. This is due in part to new sources of high spatial resolution data covering the globe, a variety of multispectral and multitemporal sources, sophisticated statistical and geospatial methods, and compatibility with GIS data sources and methods. The goal of this chapter is to review the four groups of classification methods for digital sensor data from space-borne platforms: per-pixel, sub-pixel, object-based (spatial-based), and geospatial methods. Per-pixel methods are widely used methods that classify pixels into distinct categories based solely on the spectral and ancillary information within that pixel. They are used for everything from simple calculations of environmental indices (e.g., NDVI) to sophisticated expert systems that assign urban land covers. Researchers recognize, however, that even with the smallest pixel size the spectral information within a pixel is really a combination of multiple urban surfaces. Sub-pixel classification methods therefore aim to statistically quantify the mixture of surfaces to improve overall classification accuracy. While within-pixel variations exist, there is also significant evidence that groups of nearby pixels have similar spectral information and therefore belong to the same classification category. Object-oriented methods have emerged that group pixels prior to classification based on spectral similarity and spatial proximity. Classification accuracy using object-based methods shows significant success and promise for numerous urban applications. Like the object-oriented methods that recognize the importance of spatial proximity, geospatial methods for urban mapping also utilize neighboring pixels in the classification process. The primary difference though is that geostatistical methods (e.g., spatial autocorrelation methods) are utilized during both the pre- and post-classification
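A minimal per-pixel example in the spirit of the NDVI calculation mentioned above, on invented two-band reflectance values with an illustrative threshold:

```python
import numpy as np

# Hypothetical red and near-infrared reflectance for a tiny 2x2 scene.
red = np.array([[0.12, 0.30], [0.08, 0.25]])
nir = np.array([[0.45, 0.32], [0.50, 0.27]])

ndvi = (nir - red) / (nir + red)  # per-pixel spectral index
# Illustrative threshold only; operational maps use trained classifiers.
land_cover = np.where(ndvi > 0.3, "vegetation", "built-up")
print(ndvi, land_cover, sep="\n")
```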
Huffman and linear scanning methods with statistical language models.
Roark, Brian; Fried-Oken, Melanie; Gibbons, Chris
2015-03-01
Current scanning access methods for text generation in AAC devices are limited to relatively few options, most notably row/column variations within a matrix. We present Huffman scanning, a new method for applying statistical language models to binary-switch, static-grid typing AAC interfaces, and compare it to other scanning options under a variety of conditions. We present results for 16 adults without disabilities and one 36-year-old man with locked-in syndrome who presents with complex communication needs and uses AAC scanning devices for writing. Huffman scanning with a statistical language model yielded significant typing speedups for the 16 participants without disabilities versus any of the other methods tested, including two row/column scanning methods. A similar pattern of results was found with the individual with locked-in syndrome. Interestingly, faster typing speeds were obtained with Huffman scanning using a more leisurely scan rate than relatively fast individually calibrated scan rates. Overall, the results reported here demonstrate great promise for the usability of Huffman scanning as a faster alternative to row/column scanning.
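The core of Huffman scanning is the assignment of short binary switch sequences to likely letters; a compact sketch building such a code from hypothetical language-model probabilities:

```python
import heapq

def huffman_code(probs):
    """Build a binary Huffman code; '0'/'1' map to the two switch states."""
    heap = [[p, [sym, ""]] for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heapq.heappop(heap)[1:])

# Letters predicted as likely by the language model get shorter scan paths.
lm_probs = {"e": 0.40, "t": 0.25, "a": 0.15, "o": 0.12, "q": 0.08}
print(huffman_code(lm_probs))
```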
Statistical Method to Overcome Overfitting Issue in Rational Function Models
Alizadeh Moghaddam, S. H.; Mokhtarzade, M.; Alizadeh Naeini, A.; Alizadeh Moghaddam, S. A.
2017-09-01
Rational function models (RFMs) are known as one of the most appealing models, extensively applied in the geometric correction of satellite images and in map production. Overfitting is a common issue in the case of terrain-dependent RFMs that degrades the accuracy of RFM-derived geospatial products. This issue, resulting from the high number of RFM parameters, leads to ill-posedness of the RFMs. To tackle this problem, in this study, a fast and robust statistical approach is proposed and compared to the Tikhonov regularization (TR) method, a frequently-used solution to RFM overfitting. In the proposed method, a statistical test, namely a significance test, is applied to search for the RFM parameters that are resistant to the overfitting issue. The performance of the proposed method was evaluated on two real data sets of Cartosat-1 satellite images. The obtained results demonstrate the efficiency of the proposed method in terms of the achievable level of accuracy. This technique, indeed, shows an improvement of 50-80% over the TR.
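A hedged sketch of the significance-test idea on a generic linear model (plain least squares with synthetic data, not an actual RFM, whose dozens of coefficients are beyond a short example): iteratively drop the least significant parameter until every survivor passes the test:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(0, 0.5, 100)  # only 2 truly relevant

cols = list(range(X.shape[1]))
while True:
    fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
    pvals = fit.pvalues[1:]               # skip the intercept
    worst = int(np.argmax(pvals))
    if pvals[worst] <= 0.05:              # all remaining terms significant
        break
    cols.pop(worst)                       # prune the overfitting-prone term
print("retained parameters:", cols)
```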
Radiological decontamination, survey, and statistical release method for vehicles
International Nuclear Information System (INIS)
Goodwill, M.E.; Lively, J.W.; Morris, R.L.
1996-06-01
Earth-moving vehicles (e.g., dump trucks, belly dumps) commonly haul radiologically contaminated materials from a site being remediated to a disposal site. Traditionally, each vehicle must be surveyed before being released. The logistical difficulties of implementing the traditional approach on a large scale demand that an alternative be devised. A statistical method for assessing product quality from a continuous process was adapted to the vehicle decontamination process. This method produced a sampling scheme that automatically compensates and accommodates fluctuating batch sizes and changing conditions without the need to modify or rectify the sampling scheme in the field. Vehicles are randomly selected (sampled) upon completion of the decontamination process to be surveyed for residual radioactive surface contamination. The frequency of sampling is based on the expected number of vehicles passing through the decontamination process in a given period and the confidence level desired. This process has been successfully used for 1 year at the former uranium millsite in Monticello, Utah (a cleanup site regulated under the Comprehensive Environmental Response, Compensation, and Liability Act). The method forces improvement in the quality of the decontamination process and results in a lower likelihood that vehicles exceeding the surface contamination standards are offered for survey. Implementation of this statistical sampling method on Monticello projects has resulted in more efficient processing of vehicles through decontamination and radiological release, saved hundreds of hours of processing time, provided a high level of confidence that release limits are met, and improved the radiological cleanliness of vehicles leaving the controlled site
Survival analysis and classification methods for forest fire size.
Tremblay, Pier-Olivier; Duchesne, Thierry; Cumming, Steven G
2018-01-01
Factors affecting wildland-fire size distribution include weather, fuels, and fire suppression activities. We present a novel application of survival analysis to quantify the effects of these factors on a sample of sizes of lightning-caused fires from Alberta, Canada. Two events were observed for each fire: the size at initial assessment (by the first fire fighters to arrive at the scene) and the size at "being held" (a state when no further increase in size is expected). We developed a statistical classifier to try to predict cases where there will be a growth in fire size (i.e., the size at "being held" exceeds the size at initial assessment). Logistic regression was preferred over two alternative classifiers, with covariates consistent with similar past analyses. We conducted survival analysis on the group of fires exhibiting a size increase. A screening process selected three covariates: an index of fire weather at the day the fire started, the fuel type burning at initial assessment, and a factor for the type and capabilities of the method of initial attack. The Cox proportional hazards model performed better than three accelerated failure time alternatives. Both fire weather and fuel type were highly significant, with effects consistent with known fire behaviour. The effects of initial attack method were not statistically significant, but did suggest a reverse causality that could arise if fire management agencies were to dispatch resources based on a-priori assessment of fire growth potentials. We discuss how a more sophisticated analysis of larger data sets could produce unbiased estimates of fire suppression effect under such circumstances.
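A minimal sketch of the preferred Cox model using the lifelines package; the column names and records are invented, with fire size increase playing the role of the survival time as in the paper:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical fire records: size growth treated as the "survival time".
df = pd.DataFrame({
    "growth_ha": [1.2, 40.0, 3.5, 250.0, 9.1, 0.4],  # size increase
    "observed":  [1, 1, 1, 1, 0, 1],                  # 1 = growth ended ("being held")
    "fwi":       [12.0, 30.5, 8.2, 41.0, 15.3, 5.1],  # fire weather index
    "conifer":   [0, 1, 0, 1, 1, 0],                  # fuel type dummy
})

cph = CoxPHFitter()
cph.fit(df, duration_col="growth_ha", event_col="observed")
cph.print_summary()  # hazard ratios for fire weather and fuel type
```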
Directory of Open Access Journals (Sweden)
Monique A Ladds
Full Text Available Constructing activity budgets for marine animals when they are at sea and cannot be directly observed is challenging, but recent advances in bio-logging technology offer solutions to this problem. Accelerometers can potentially identify a wide range of behaviours for animals based on unique patterns of acceleration. However, when analysing data derived from accelerometers, there are many statistical techniques available which, when applied to different data sets, produce different classification accuracies. We investigated a selection of supervised machine learning methods for interpreting behavioural data from captive otariids (fur seals and sea lions). We conducted controlled experiments with 12 seals, whose behaviours were filmed while they were wearing 3-axis accelerometers. From video we identified 26 behaviours that could be grouped into one of four categories (foraging, resting, travelling and grooming) representing key behaviour states for wild seals. We used data from 10 seals to train four predictive classification models: stochastic gradient boosting (GBM), random forests, a support vector machine using four different kernels, and a baseline model: penalised logistic regression. We then took the best parameters from each model and cross-validated the results on the two seals unseen so far. We also investigated the influence of feature statistics (describing some characteristic of the seal), testing the models both with and without these. Cross-validation accuracies were lower than training accuracy, but the SVM with a polynomial kernel was still able to classify seal behaviour with high accuracy (>70%). Adding feature statistics improved accuracies across all models tested. Most categories of behaviour - resting, grooming and feeding - were predicted with reasonable accuracy (52-81%) by the SVM, while travelling was poorly categorised (31-41%). These results show that model selection is important when classifying behaviour and that by using
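A brief sketch of the winning configuration, a support vector machine with a polynomial kernel, using scikit-learn on placeholder accelerometer features (so the printed score is near chance); training and held-out animals are kept separate, as in the paper:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Placeholder windowed accelerometer features (e.g. mean and sd per axis).
X_train, y_train = rng.normal(size=(500, 12)), rng.integers(0, 4, 500)
X_test, y_test = rng.normal(size=(100, 12)), rng.integers(0, 4, 100)

# 4 classes: forage, rest, travel, groom.
clf = SVC(kernel="poly", degree=3, C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```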
Hayslett, H T
1991-01-01
Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the
Comparison of four classification methods for brain-computer interface
Czech Academy of Sciences Publication Activity Database
Frolov, A.; Húsek, Dušan; Bobrov, P.
2011-01-01
Roč. 21, č. 2 (2011), s. 101-115 ISSN 1210-0552 R&D Projects: GA MŠk(CZ) 1M0567; GA ČR GA201/05/0079; GA ČR GAP202/10/0262 Institutional research plan: CEZ:AV0Z10300504 Keywords : brain computer interface * motor imagery * visual imagery * EEG pattern classification * Bayesian classification * Common Spatial Patterns * Common Tensor Discriminant Analysis Subject RIV: IN - Informatics, Computer Science Impact factor: 0.646, year: 2011
Land-Use and Land-Cover Mapping Using a Gradable Classification Method
Directory of Open Access Journals (Sweden)
Keigo Kitada
2012-05-01
Full Text Available Conventional spectral-based classification methods have significant limitations in the digital classification of urban land-use and land-cover classes from high-resolution remotely sensed data because of the lack of consideration given to the spatial properties of images. To recognize the complex distribution of urban features in high-resolution image data, texture information consisting of a group of pixels should be considered. Lacunarity is an index used to characterize different texture appearances. It is often reported that the land-use and land-cover in urban areas can be effectively classified using the lacunarity index with high-resolution images. However, the applicability of the maximum-likelihood approach for hybrid analysis has not been reported. A more effective approach that employs the original spectral data and the lacunarity index can be expected to improve the accuracy of the classification. A new classification procedure referred to as the “gradable classification method” is proposed in this study. This method improves the classification accuracy in incremental steps. The proposed classification approach integrates several classification maps created from the original images and lacunarity maps, which consist of lacunarity values, to create a new classification map. The results of this study confirm the suitability of the gradable classification approach, which produced a higher overall accuracy (68%) and kappa coefficient (0.64) than those (65% and 0.60, respectively) obtained with the maximum-likelihood approach.
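One common gliding-box definition of lacunarity is easy to sketch for a binary image patch (normalisation conventions vary across studies):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lacunarity(binary_img, box):
    """Gliding-box lacunarity: second moment of box masses over
    the squared first moment."""
    masses = sliding_window_view(binary_img, (box, box)).sum(axis=(2, 3))
    m = masses.astype(float)
    return (m ** 2).mean() / m.mean() ** 2

rng = np.random.default_rng(0)
random_texture = (rng.random((64, 64)) > 0.5).astype(int)
clustered = np.zeros((64, 64), int)
clustered[:16, :16] = 1
# Clustered textures yield higher lacunarity than random ones.
print(lacunarity(random_texture, 8), lacunarity(clustered, 8))
```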
Wu, Zi Yi; Xie, Ping; Sang, Yan Fang; Gu, Hai Ting
2018-04-01
The phenomenon of jump is one of the important external forms of hydrological variability under environmental changes, representing the adaptation of hydrological nonlinear systems to the influence of external disturbances. Presently, related studies mainly focus on methods for identifying the jump positions and jump times in hydrological time series. In contrast, few studies have focused on the quantitative description and classification of jump degree in hydrological time series, which makes it difficult to understand environmental changes and evaluate their potential impacts. Here, we proposed a theoretically reliable and easy-to-apply method for the classification of jump degree in hydrological time series, using the correlation coefficient as a basic index. Statistical tests verified the accuracy, reasonability, and applicability of this method. The relationship between the correlation coefficient and the jump degree of a series was described mathematically by derivation. After that, several thresholds of correlation coefficients under different statistical significance levels were chosen, based on which the jump degree could be classified into five levels: no, weak, moderate, strong and very strong. Finally, our method was applied to five different observed hydrological time series with diverse geographic and hydrological conditions in China. The results of the classification of jump degrees in those series accorded closely with their physical hydrological mechanisms, indicating the practicability of our method.
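A sketch of the grading step, assuming the jump is summarised by the correlation between the series and a step indicator at the candidate jump point; the threshold values below are illustrative placeholders, since the paper derives its thresholds from significance levels:

```python
import numpy as np

def jump_degree(series, jump_pos, thresholds=(0.2, 0.4, 0.6, 0.8)):
    """Grade a jump by the correlation between the series and a step
    function switching at jump_pos (threshold values are illustrative)."""
    step = np.where(np.arange(len(series)) < jump_pos, 0.0, 1.0)
    r = abs(np.corrcoef(series, step)[0, 1])
    labels = ["no", "weak", "moderate", "strong", "very strong"]
    return labels[np.searchsorted(thresholds, r)], r

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(10, 1, 50), rng.normal(13, 1, 50)])  # upward jump
print(jump_degree(x, 50))
```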
Mathematical and statistical methods for actuarial sciences and finance
Sibillo, Marilena
2014-01-01
The interaction between mathematicians and statisticians working in the actuarial and financial fields is producing numerous meaningful scientific results. This volume, comprising a series of four-page papers, gathers new ideas relating to mathematical and statistical methods in the actuarial sciences and finance. The book covers a variety of topics of interest from both theoretical and applied perspectives, including: actuarial models; alternative testing approaches; behavioral finance; clustering techniques; coherent and non-coherent risk measures; credit-scoring approaches; data envelopment analysis; dynamic stochastic programming; financial contagion models; financial ratios; intelligent financial trading systems; mixture normality approaches; Monte Carlo-based methodologies; multicriteria methods; nonlinear parameter estimation techniques; nonlinear threshold models; particle swarm optimization; performance measures; portfolio optimization; pricing methods for structured and non-structured derivatives; r...
Evolutionary Computation Methods and their applications in Statistics
Directory of Open Access Journals (Sweden)
Francesco Battaglia
2013-05-01
Full Text Available A brief discussion of the genesis of evolutionary computation methods, their relationship to artificial intelligence, and the contribution of genetics and Darwin’s theory of natural evolution is provided. Then, the main evolutionary computation methods are illustrated: evolution strategies, genetic algorithms, estimation of distribution algorithms, differential evolution, and a brief description of some evolutionary behavior methods such as ant colony and particle swarm optimization. We also discuss the role of the genetic algorithm for multivariate probability distribution random generation, rather than as a function optimizer. Finally, some relevant applications of genetic algorithm to statistical problems are reviewed: selection of variables in regression, time series model building, outlier identification, cluster analysis, design of experiments.
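Since the survey treats genetic algorithms as general-purpose optimizers, a toy sketch may help; the objective function, rates and population size are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    """Toy objective to maximise: peak at x = (3, 3, 3, 3, 3)."""
    return -np.sum((x - 3.0) ** 2, axis=1)

pop = rng.uniform(-10, 10, (50, 5))  # population of candidate solutions
for gen in range(200):
    f = fitness(pop)
    # Tournament selection: keep the better of two random parents.
    i, j = rng.integers(0, 50, (2, 50))
    parents = np.where((f[i] > f[j])[:, None], pop[i], pop[j])
    # Uniform crossover with a shuffled copy, then sparse Gaussian mutation.
    mask = rng.random((50, 5)) < 0.5
    children = np.where(mask, parents, parents[rng.permutation(50)])
    children += rng.normal(0, 0.1, (50, 5)) * (rng.random((50, 5)) < 0.2)
    pop = children
print(pop[np.argmax(fitness(pop))])  # should approach [3, 3, 3, 3, 3]
```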
DEFF Research Database (Denmark)
Inwood, Jennifer; Lofi, Johanna; Davies, Sarah
2013-01-01
In this study, a novel application of a statistical approach is utilized for analysis of downhole logging data from Miocene-aged siliciclastic shelf sediments on the New Jersey Margin (eastern USA). A multivariate iterative nonhierarchical cluster analysis (INCA) of spectral gamma-ray logs from I...
Exact null distributions of quadratic distribution-free statistics for two-way classification
Wiel, van de M.A.
2004-01-01
We present new techniques for computing exact distributions of 'Friedman-type' statistics. Representing the null distribution by a generating function allows for the use of general, not necessarily integer-valued rank scores. Moreover, we use symmetry properties of the multivariate
CSIR Research Space (South Africa)
Ntaka, L
2013-08-01
Full Text Available In this work, statistical inference approaches, specifically non-parametric bootstrapping and a linear model, were applied. Data used to develop the model were sourced from the literature. 104 data points with information on aggregation, natural organic matter...
Hybrid perturbation methods based on statistical time series models
San-Juan, Juan Félix; San-Martín, Montserrat; Pérez, Iván; López, Rosario
2016-04-01
In this work we present a new methodology for orbit propagation, the hybrid perturbation theory, based on the combination of an integration method and a prediction technique. The former, which can be a numerical, analytical or semianalytical theory, generates an initial approximation that contains some inaccuracies derived from the fact that, in order to simplify the expressions and subsequent computations, not all the involved forces are taken into account and only low-order terms are considered, not to mention the fact that mathematical models of perturbations not always reproduce physical phenomena with absolute precision. The prediction technique, which can be based on either statistical time series models or computational intelligence methods, is aimed at modelling and reproducing missing dynamics in the previously integrated approximation. This combination results in the precision improvement of conventional numerical, analytical and semianalytical theories for determining the position and velocity of any artificial satellite or space debris object. In order to validate this methodology, we present a family of three hybrid orbit propagators formed by the combination of three different orders of approximation of an analytical theory and a statistical time series model, and analyse their capability to process the effect produced by the flattening of the Earth. The three considered analytical components are the integration of the Kepler problem, a first-order and a second-order analytical theories, whereas the prediction technique is the same in the three cases, namely an additive Holt-Winters method.
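A toy sketch of the hybrid idea with the time-series component written out by hand (Holt's double exponential smoothing): learn the error of a crude analytical approximation on a calibration span, forecast it forward, and add it back; all signals are synthetic stand-ins for orbital elements:

```python
import numpy as np

def holt_forecast(y, alpha=0.5, beta=0.1, horizon=20):
    """Holt's linear exponential smoothing, returning point forecasts."""
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        new_level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return level + trend * np.arange(1, horizon + 1)

t = np.arange(120)
truth = np.sin(0.1 * t) + 0.002 * t ** 1.5   # "numerical" reference signal
analytic = np.sin(0.1 * t)                   # low-order analytical theory
error = truth - analytic                     # dynamics missing from the theory
pred_error = holt_forecast(error[:100], horizon=20)
hybrid = analytic[100:] + pred_error         # hybrid propagation
print(np.abs(hybrid - truth[100:]).max(),
      np.abs(analytic[100:] - truth[100:]).max())
```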
Bayesian statistical methods and their application in probabilistic simulation models
Directory of Open Access Journals (Sweden)
Sergio Iannazzo
2007-03-01
Full Text Available Bayesian statistical methods are facing a rapidly growing level of interest and acceptance in the field of health economics. The reasons for this success are probably to be found in the theoretical foundations of the discipline, which make these techniques more appealing for decision analysis. To this should be added modern IT progress, which has produced several flexible and powerful statistical software frameworks. Among them, probably one of the most notable is the BUGS language project and its standalone application for MS Windows, WinBUGS. The scope of this paper is to introduce the subject and to show some interesting applications of WinBUGS in developing complex economic models based on Markov chains. The advantages of this approach reside in the elegance of the code produced and in its capability to easily develop probabilistic simulations. Moreover, an example of the integration of Bayesian inference models in a Markov model is shown. This last feature lets the analyst conduct statistical analyses on the available sources of evidence and exploit them directly as inputs in the economic model.
Chung, Ka-Fai; Yeung, Wing-Fai; Ho, Fiona Yan-Yee; Yung, Kam-Ping; Yu, Yee-Man; Kwok, Chi-Wa
2015-04-01
To compare the prevalence of insomnia according to symptoms, quantitative criteria, and Diagnostic and Statistical Manual of Mental Disorders, 4th and 5th Edition (DSM-IV and DSM-5), International Classification of Diseases, 10th Revision (ICD-10), and International Classification of Sleep Disorders, 2nd Edition (ICSD-2), and to compare the prevalence of insomnia disorder between Hong Kong and the United States by adopting a similar methodology used by the America Insomnia Survey (AIS). Population-based epidemiological survey respondents (n = 2011) completed the Brief Insomnia Questionnaire (BIQ), a validated scale generating DSM-IV, DSM-5, ICD-10, and ICSD-2 insomnia disorder. The weighted prevalence of difficulty falling asleep, difficulty staying asleep, waking up too early, and non-restorative sleep that occurred ≥3 days per week was 14.0%, 28.3%, 32.1%, and 39.9%, respectively. When quantitative criteria were included, the prevalence dropped the most from 39.9% to 8.4% for non-restorative sleep, and the least from 14.0% to 12.9% for difficulty falling asleep. The weighted prevalence of DSM-IV, ICD-10, ICSD-2, and any of the three insomnia disorders was 22.1%, 4.7%, 15.1%, and 22.1%, respectively; for DSM-5 insomnia disorder, it was 10.8%. Compared with 22.1%, 3.9%, and 14.7% for DSM-IV, ICD-10, and ICSD-2 in the AIS, cross-cultural difference in the prevalence of insomnia disorder is less than what is expected. The prevalence is reduced by half from DSM-IV to DSM-5. ICD-10 insomnia disorder has the lowest prevalence, perhaps because excessive concern and preoccupation, one of its diagnostic criteria, is not always present in people with insomnia. Copyright © 2014 Elsevier B.V. All rights reserved.
Application of mathematical statistics methods to study fluorite deposits
International Nuclear Information System (INIS)
Chermeninov, V.B.
1980-01-01
The applicability of mathematical-statistical methods for increasing the reliability of sampling and for geological tasks (study of the regularities of ore formation) is considered. The reliability of core sampling (regarding the selective abrasion of fluorite) is compared with that of neutron activation logging for fluorine. The core sampling data are characterized by higher dispersion than the neutron activation logging results (mean variation coefficients of 75% and 56%, respectively). However, the hypothesis of the equality of the averages of the two samplings is confirmed; this fact testifies to the absence of considerable variability of the ore bodies.
Algebraic methods in statistical mechanics and quantum field theory
Emch, Dr Gérard G
2009-01-01
This systematic algebraic approach concerns problems involving a large number of degrees of freedom. It extends the traditional formalism of quantum mechanics, and it eliminates conceptual and mathematical difficulties common to the development of statistical mechanics and quantum field theory. Further, the approach is linked to research in applied and pure mathematics, offering a reflection of the interplay between formulation of physical motivations and self-contained descriptions of the mathematical methods.The four-part treatment begins with a survey of algebraic approaches to certain phys
Statistical methods for determining the effect of mammography screening
DEFF Research Database (Denmark)
Lophaven, Søren
2016-01-01
In an overview of five randomised controlled trials from Sweden, a reduction of 29% was found in breast cancer mortality in women aged 50-69 at randomisation after a follow-up of 5-13 years. Organised, population-based mammography service screening was introduced on the basis of these results... in 2007-2008. Women aged 50-69 were invited to screening every second year. Taking advantage of the registers of population and health, we present statistical methods for evaluating the effect of mammography screening on breast cancer mortality (Olsen et al. 2005, Njor et al. 2015 and Weedon-Fekjær et al...
An Efficient Optimization Method for Solving Unsupervised Data Classification Problems
Directory of Open Access Journals (Sweden)
Parvaneh Shabanzadeh
2015-01-01
Full Text Available Unsupervised data classification (or clustering analysis) is one of the most useful tools and a descriptive task in data mining that seeks to classify homogeneous groups of objects based on similarity, and is used in many medical disciplines and various applications. In general, there is no single algorithm that is suitable for all types of data, conditions, and applications. Each algorithm has its own advantages, limitations, and deficiencies. Hence, research on novel and effective approaches for unsupervised data classification is still active. In this paper a heuristic algorithm, the Biogeography-Based Optimization (BBO) algorithm, was adapted for data clustering problems by modifying the main operators of the BBO algorithm, which is inspired by the natural biogeographical distribution of different species. Similar to other population-based algorithms, the BBO algorithm starts with an initial population of candidate solutions to an optimization problem and an objective function that is calculated for them. To evaluate the performance of the proposed algorithm, an assessment was carried out on six medical and real-life datasets, and the algorithm was compared with eight well-known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.
Ashraf, M. A. M.; Kumar, N. S.; Yusoh, R.; Hazreek, Z. A. M.; Aziman, M.
2018-04-01
Site classification utilizing the average shear wave velocity over the top 30 meters (Vs(30)) is typical. Numerous geophysical methods have been proposed for the estimation of shear wave velocity, utilizing an assortment of testing configurations, processing methods, and inversion algorithms. The Multichannel Analysis of Surface Waves (MASW) method is practised by numerous specialists and professionals in geotechnical engineering for local site characterization and classification. This study aims to determine the site classification on soft and hard ground using the MASW method. The subsurface classification was made utilizing the National Earthquake Hazards Reduction Program (NEHRP) and International Building Code (IBC) classifications. Two sites were chosen to acquire the shear wave velocity: one in the state of Pulau Pinang for soft soil and one in Perlis for hard rock. Results suggest that the MASW technique can be utilized to spatially calculate the distribution of shear wave velocity (Vs(30)) in soil and rock to characterize areas.
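The Vs(30) computation itself is a time-averaged velocity over the top 30 m; a sketch with a hypothetical layered profile and the standard NEHRP class boundaries:

```python
def vs30(thicknesses_m, velocities_ms):
    """Time-averaged shear wave velocity over the top 30 m."""
    assert abs(sum(thicknesses_m) - 30.0) < 1e-6, "layers must span exactly 30 m"
    return 30.0 / sum(h / v for h, v in zip(thicknesses_m, velocities_ms))

def nehrp_class(v):
    for label, bound in (("E", 180), ("D", 360), ("C", 760), ("B", 1500)):
        if v < bound:
            return label
    return "A"

# Hypothetical MASW-derived profile: 5 m at 150 m/s, 10 m at 250 m/s, 15 m at 400 m/s.
v = vs30([5, 10, 15], [150, 250, 400])
print(v, nehrp_class(v))  # expected: class D soil
```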
Statistical inference for classification of RRIM clone series using near IR reflectance properties
Ismail, Faridatul Aima; Madzhi, Nina Korlina; Hashim, Hadzli; Abdullah, Noor Ezan; Khairuzzaman, Noor Aishah; Azmi, Azrie Faris Mohd; Sampian, Ahmad Faiz Mohd; Harun, Muhammad Hafiz
2015-08-01
RRIM clone is a rubber breeding series produced by RRIM (Rubber Research Institute of Malaysia) through its "rubber breeding program" to improve latex yield and produce clones attractive to farmers. The objective of this work is to analyse measurements of an optical sensing device on latex of selected clone series. The device transmits NIR, and its reflectance is converted into a voltage. The obtained reflectance index value, via voltage, was analyzed using statistical techniques in order to find out the discrimination among the clones. From the statistical results using error plots and a one-way ANOVA test, there is overwhelming evidence of discrimination of the RRIM 2002, RRIM 2007 and RRIM 3001 clone series, with p value = 0.000. RRIM 2008 cannot be discriminated from RRIM 2014; however, both of these groups are distinct from the other clones.
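The discrimination test reduces to a one-way ANOVA across clone groups; a sketch with synthetic reflectance-index voltages:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical reflectance-index voltages for three clone series.
rrim2002 = rng.normal(2.10, 0.05, 30)
rrim2007 = rng.normal(2.35, 0.05, 30)
rrim3001 = rng.normal(2.60, 0.05, 30)

f_stat, p_value = stats.f_oneway(rrim2002, rrim2007, rrim3001)
print(f"F = {f_stat:.1f}, p = {p_value:.4f}")  # small p -> clones discriminated
```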
A method for statistically comparing spatial distribution maps
Directory of Open Access Journals (Sweden)
Reynolds Mary G
2009-01-01
Full Text Available Abstract Background Ecological niche modeling is a method for estimation of species distributions based on certain ecological parameters. Thus far, empirical determination of significant differences between independently generated distribution maps for a single species (maps which are created through equivalent processes, but with different ecological input parameters) has been challenging. Results We describe a method for comparing model outcomes, which allows a statistical evaluation of whether the strength of prediction and breadth of predicted areas is measurably different between projected distributions. To create ecological niche models for statistical comparison, we utilized GARP (Genetic Algorithm for Rule-Set Production) software to generate ecological niche models of human monkeypox in Africa. We created several models, keeping constant the case location input records for each model but varying the ecological input data. In order to assess the relative importance of each ecological parameter included in the development of the individual predicted distributions, we performed pixel-to-pixel comparisons between model outcomes and calculated the mean difference in pixel scores. We used a two-sample Student's t-test (assuming as null hypothesis that both maps were identical to each other, regardless of which input parameters were used) to examine whether the mean difference in corresponding pixel scores from one map to another was greater than would be expected by chance alone. We also utilized weighted kappa statistics, frequency distributions, and percent difference to look at the disparities in pixel scores. Multiple independent statistical tests indicated precipitation as the single most important independent ecological parameter in the niche model for human monkeypox disease. Conclusion In addition to improving our understanding of the natural factors influencing the distribution of human monkeypox disease, such pixel-to-pixel comparison
Statistical methods in the mechanical design of fuel assemblies
Energy Technology Data Exchange (ETDEWEB)
Radsak, C.; Streit, D.; Muench, C.J. [AREVA NP GmbH, Erlangen (Germany)
2013-07-01
The mechanical design of a fuel assembly is still mainly performed in a deterministic way. This conservative approach is however not suitable to provide a realistic quantification of the design margins with respect to licensing criteria for more and more demanding operating conditions (power upgrades, burnup increase, ...). This quantification can be provided by statistical methods utilizing all available information (e.g. from manufacturing, experience feedback etc.) on the topic under consideration. During optimization, e.g. of the holddown system, certain objectives in the mechanical design of a fuel assembly (FA) can contradict each other, such as holddown forces sufficient to prevent fuel assembly lift-off versus reducing the holddown forces to minimize axial loads on the fuel assembly structure to ensure no negative effect on the control rod movement. By using a statistical method the fuel assembly design can be optimized much better with respect to these objectives than would be possible based on a deterministic approach. This leads to a more realistic assessment and a safer way of operating fuel assemblies. Statistical models are defined on the one hand by the quantile that has to be maintained concerning the design limit requirements (e.g. a one-FA quantile) and on the other hand by the confidence level which has to be met. Using the above example of the holddown force, a feasible quantile can be defined based on the requirement that less than one fuel assembly (quantile > 192/193 [%] = 99.5 %) in the core violates the holddown force limit with a confidence of 95%. (orig.)
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Statistically Consistent k-mer Methods for Phylogenetic Tree Reconstruction.
Allman, Elizabeth S; Rhodes, John A; Sullivant, Seth
2017-02-01
Frequencies of k-mers in sequences are sometimes used as a basis for inferring phylogenetic trees without first obtaining a multiple sequence alignment. We show that a standard approach of using the squared Euclidean distance between k-mer vectors to approximate a tree metric can be statistically inconsistent. To remedy this, we derive model-based distance corrections for orthologous sequences without gaps, which lead to consistent tree inference. The identifiability of model parameters from k-mer frequencies is also studied. Finally, we report simulations showing that the corrected distance outperforms many other k-mer methods, even when sequences are generated with an insertion and deletion process. These results have implications for multiple sequence alignment as well since k-mer methods are usually the first step in constructing a guide tree for such algorithms.
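A brief sketch of the alignment-free starting point the paper analyzes: k-mer frequency vectors compared with the squared Euclidean distance. The model-based distance corrections derived in the paper are not reproduced; the sequences below are toy data:

```python
from itertools import product
import numpy as np

def kmer_vector(seq: str, k: int = 3) -> np.ndarray:
    """Count vector over all 4**k DNA k-mers in lexicographic order."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    v = np.zeros(4 ** k)
    for i in range(len(seq) - k + 1):
        v[index[seq[i:i + k]]] += 1
    return v

# Toy orthologous sequences without gaps.
s1, s2 = "ACGTACGTGGCAATTC", "ACGTACCTGGCAATTC"
d = np.sum((kmer_vector(s1) - kmer_vector(s2)) ** 2)
print("squared Euclidean k-mer distance:", d)
```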
Cuesta-Frau, David; Miró-Martínez, Pau; Jordán Núñez, Jorge; Oltra-Crespo, Sandra; Molina Picó, Antonio
2017-08-01
This paper evaluates the performance of first-generation entropy metrics, represented by the well-known and widely used Approximate Entropy (ApEn) and Sample Entropy (SampEn) metrics, and of what can be considered an evolution of these, Fuzzy Entropy (FuzzyEn), in the context of Electroencephalogram (EEG) signal classification. The study uses the commonest artifacts found in real EEGs, such as white noise and muscular, cardiac, and ocular artifacts. Using two different sets of publicly available EEG records, and a realistic range of amplitudes for interfering artifacts, this work optimises these metrics and assesses their robustness against artifacts in terms of class segmentation probability. The results show that the qualitative behaviour of the two datasets is similar, with SampEn and FuzzyEn performing best, and that noise and muscular artifacts are the most confounding factors. By contrast, there is wide variability with regard to initialization parameters. The poor performance achieved by ApEn suggests that this metric should not be used in these contexts.
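For concreteness, a direct O(N²) Sample Entropy implementation follows; it is an illustrative sketch of one of the metrics compared above, not the authors' optimised code, and the tolerance convention (r as a fraction of the standard deviation) is a common but assumed choice:

```python
import numpy as np

def sampen(x, m: int = 2, r_factor: float = 0.2) -> float:
    """Sample Entropy: -ln(A/B), with A and B the counts of template matches
    of lengths m+1 and m under the Chebyshev distance, excluding self-matches."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()

    def matches(length):
        t = np.array([x[i:i + length] for i in range(len(x) - length)])
        return sum(np.sum(np.max(np.abs(t[i + 1:] - t[i]), axis=1) <= r)
                   for i in range(len(t)))

    b, a = matches(m), matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else float("inf")

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 20 * np.pi, 1000))
noisy = clean + rng.normal(0, 0.5, 1000)     # white-noise artifact
print(sampen(clean), sampen(noisy))          # the artifact raises the entropy
```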
Classification of bones from MR images in torso PET-MR imaging using a statistical shape model
International Nuclear Information System (INIS)
Reza Ay, Mohammad; Akbarzadeh, Afshin; Ahmadian, Alireza; Zaidi, Habib
2014-01-01
Hybrid PET/MRI systems offer features that their PET/CT counterparts lack, in terms of reduced radiation exposure, improved soft-tissue contrast, and truly simultaneous, multi-parametric imaging capabilities. However, quantitative imaging on PET/MR is challenged by the attenuation of annihilation photons along their path. The correction for photon attenuation requires a patient-specific attenuation map, which accounts for the spatial distribution of attenuation coefficients of biological tissues. However, the lack of information on electron density in the MR signal poses an inherent difficulty for the derivation of the attenuation map from MR images. In other words, the MR signal correlates with proton densities and tissue relaxation properties, rather than with electron density, and as such it is not directly related to attenuation coefficients. In order to derive the attenuation map from MR images at 511 keV, various strategies have been proposed and implemented on prototype and commercial PET/MR systems. Segmentation-based methods generate an attenuation map by classification of T1-weighted or high-resolution Dixon MR sequences followed by assignment of predefined attenuation coefficients to the various tissue types. Intensity-based segmentation approaches fail to include bones in the attenuation map, since the segmentation of bones from conventional MR sequences is a difficult task. Most MR-guided attenuation correction techniques ignore bones owing to the inherent difficulties associated with bone segmentation, unless specialized MR sequences such as the ultra-short echo (UTE) sequence are utilized. In this work, we introduce a new technique based on statistical shape modeling to segment bones and generate a four-class attenuation map. Our segmentation approach requires a torso bone shape model based on principal component analysis (PCA). A CT-based training set including clearly segmented bones of the torso region
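A minimal sketch of the PCA step behind a statistical shape model, with hypothetical dimensions standing in for the CT-trained torso bone model; the landmark vectors are assumed to be already aligned:

```python
import numpy as np

rng = np.random.default_rng(0)
n_shapes, n_landmarks = 20, 150                    # hypothetical training set
X = rng.normal(size=(n_shapes, 3 * n_landmarks))   # rows: flattened (x, y, z) landmarks

mean_shape = X.mean(axis=0)
Xc = X - mean_shape
# Principal modes of variation via SVD of the centred data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
n_modes = 5
modes = Vt[:n_modes]

# A plausible new bone shape: the mean plus a weighted sum of leading modes,
# with weights drawn on the scale of the training-set standard deviations.
b = rng.normal(scale=s[:n_modes] / np.sqrt(n_shapes - 1))
new_shape = mean_shape + b @ modes
print(new_shape.shape)   # (450,) -> 150 reconstructed 3-D landmarks
```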
Safety by statistics? A critical view on statistical methods applied in health physics
International Nuclear Information System (INIS)
Kraut, W.
2016-01-01
The only proper way to describe uncertainties in health physics is by statistical means. But statistics can never replace your personal evaluation of an effect, nor can it transmute randomness into certainty like an 'uncertainty laundry'. The paper discusses these problems as they arise in routine practical work.
A Statistic-Based Calibration Method for TIADC System
Directory of Open Access Journals (Sweden)
Kuojun Yang
2015-01-01
Full Text Available The time-interleaved technique is widely used to increase the sampling rate of an analog-to-digital converter (ADC). However, channel mismatches degrade the performance of a time-interleaved ADC (TIADC). Therefore, a statistic-based calibration method for TIADC is proposed in this paper. The average value of sampling points is utilized to calculate the offset error, and the summation of sampling points is used to calculate the gain error. After the offset and gain errors are obtained, they are calibrated by offset and gain adjustment elements in the ADC. Timing skew is calibrated by an iterative method, using the product of sampling points of two adjacent sub-channels as the calibration metric. The proposed method is employed to calibrate mismatches in a four-channel 5 GS/s TIADC system. Simulation results show that the proposed method can estimate mismatches accurately over a wide frequency range; an accurate estimation is obtained even when the signal-to-noise ratio (SNR) of the input signal is 20 dB. Furthermore, the results obtained from a real four-channel 5 GS/s TIADC system demonstrate the effectiveness of the proposed method: the spectral spurs due to mismatches are effectively eliminated after calibration.
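A toy two-channel sketch of the statistics described above; the channel values, mismatch sizes, and the amplitude statistic are assumptions for illustration, not the paper's full calibration chain:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1 << 16
x = np.sin(2 * np.pi * 0.0123 * np.arange(n))        # input to an ideal TIADC

offsets, gains = [0.02, -0.015], [1.0, 1.03]         # hypothetical mismatches
ch = [gains[k] * x[k::2] + offsets[k] + rng.normal(0, 1e-3, n // 2)
      for k in range(2)]

for k in range(2):
    offset_est = ch[k].mean()                        # average of samples -> offset error
    amp_est = np.abs(ch[k] - offset_est).mean()      # amplitude statistic -> gain error
    print(f"channel {k}: offset ~ {offset_est:+.4f}, amplitude ~ {amp_est:.4f}")
# The ratio of the per-channel amplitude statistics exposes the gain mismatch;
# correcting each channel by these estimates removes the mismatch spurs.
```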
Refining developmental coordination disorder subtyping with multivariate statistical methods
Directory of Open Access Journals (Sweden)
Lalanne Christophe
2012-07-01
Full Text Available Abstract Background With a large number of potentially relevant clinical indicators, penalization and ensemble learning methods are thought to provide better predictive performance than the usual linear predictors. However, little is known about how they perform in clinical studies where few cases are available. We used Random Forests and Partial Least Squares Discriminant Analysis to select the most salient impairments in Developmental Coordination Disorder (DCD) and assess patient similarity. Methods We considered a wide-ranging testing battery for various neuropsychological and visuo-motor impairments which aimed at characterizing subtypes of DCD in a sample of 63 children. Classifiers were optimized on a training sample, and they were used subsequently to rank the 49 items according to a permuted measure of variable importance. In addition, subtyping consistency was assessed with cluster analysis on the training sample. Clustering fitness and predictive accuracy were evaluated on the validation sample. Results Both classifiers yielded a relevant subset of impairment items that together accounted for a sharp discrimination between three DCD subtypes: ideomotor, visual-spatial and constructional, and mixed dyspraxia. The main impairments found to characterize the three subtypes were: digital perception, imitation of gestures, digital praxia, lego blocks, visual spatial structuration, visual motor integration, and coordination between upper and lower limbs. Classification accuracy was above 90% for all classifiers, and clustering fitness was found to be satisfactory. Conclusions Random Forests and Partial Least Squares Discriminant Analysis are useful tools to extract salient features from a large pool of correlated binary predictors, and also provide a way to assess individuals' proximities in a reduced factor space. Fewer than 15 neuro-visual, neuro-psychomotor and neuro-psychological tests might be required to provide a sensitive and
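A sketch of permutation-based variable importance with Random Forests, as used above to rank test items; synthetic binary items stand in for the clinical battery, and scikit-learn is an assumed tool, not the authors':

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(63, 49)).astype(float)  # 63 children, 49 binary items
y = (X[:, 0] + X[:, 5] + X[:, 12] > 1).astype(int)   # toy subtype driven by 3 items

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

# Permuting an item's values and measuring the accuracy drop ranks its salience.
imp = permutation_importance(clf, X_te, y_te, n_repeats=30, random_state=0)
print("most salient items:", np.argsort(imp.importances_mean)[::-1][:5])
```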
Directory of Open Access Journals (Sweden)
Chunyang Wang
2015-01-01
Full Text Available Studies of land use/cover can reflect changing patterns of population, economy, agricultural structure adjustment, policy and traffic, and can better serve regional economic development and urban evolution. Fine land use/cover assessment using hyperspectral image classification is a growing focus in many fields. Semisupervised learning, which exploits a large number of unlabeled samples together with a minority of labeled samples, can effectively improve classification and prediction accuracy and has become a new research direction. In this paper, we propose improving fine land use/cover assessment based on a semisupervised hyperspectral classification method. Test analysis of the study area showed that the semisupervised classification method improves overall classification precision and supports objective assessment of land use/cover.
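A minimal self-training sketch of the semisupervised idea, with synthetic features standing in for hyperspectral pixels; the wrapper and thresholds are illustrative assumptions rather than the paper's exact method:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.95] = -1   # 95 % of pixels left unlabeled

# The base classifier iteratively labels its most confident unlabeled samples.
model = SelfTrainingClassifier(SVC(probability=True), threshold=0.9)
model.fit(X, y_partial)
print("accuracy on all pixels:", model.score(X, y))
```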
Classification and methodical features of fitness and wellness facilities
Directory of Open Access Journals (Sweden)
Yu. I. Beliak
2014-11-01
Full Text Available Purpose: health and fitness practice uses a large arsenal of different sports and physical activities. The development of the fitness industry promotes its expansion and requires a classification and a description of methodological features that guide the use of appropriate fitness programs. Material: more than 60 literature sources and videos of lessons from 42 prestigious international fitness conventions were analyzed. Results: the evolution of fitness and wellness types is traced, as well as the character of the means used in them. Conclusions: the orientation of fitness means is proposed as a classification attribute, according to which they are divided into aerobic exercises, strength exercises, exercises promoting flexibility, and psychomotor coordination exercises. The main methodological features of fitness means are highlighted: variety and interchangeability, clear regulation, the ability to be transformed, a selective effect on the body, the capacity to solve a wide range of tasks, and innovation.
Directory of Open Access Journals (Sweden)
H. Y. Gu
2017-09-01
Full Text Available The classification rule set is important for land cover classification; it comprises features and decision rules. The selection of features and decisions is often based on an iterative trial-and-error approach in GEOBIA, which is time-consuming and has poor versatility. This study puts forward a rule set building method for land cover classification based on human knowledge and machine learning. Machine learning is used to build rule sets effectively, overcoming the iterative trial-and-error approach; human knowledge is used to remedy the insufficient use of prior knowledge in existing machine learning methods and to improve the versatility of the rule sets. A two-step workflow is introduced: first, an initial rule set is built based on Random Forest and a CART decision tree; second, the initial rule set is analyzed and validated against human knowledge, where a statistical confidence interval is used to determine its thresholds. The test site is located in Potsdam City. We utilised the TOP, DSM and ground truth data. The results show that the method can determine a rule set for land cover classification semi-automatically, and that there are static features for the different land cover classes.
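A sketch of the first workflow step, fitting a CART tree on object features and exporting its decision rules as a candidate rule set; the feature names and data are hypothetical, not the Potsdam features:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
feature_names = ["ndvi", "mean_height", "brightness",
                 "texture", "area", "compactness"]   # illustrative GEOBIA features

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=feature_names))  # human-readable rules
# An analyst can then inspect each threshold against prior knowledge and,
# as in the paper, replace it with a statistically derived confidence bound.
```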
A combined reconstruction-classification method for diffuse optical tomography
Energy Technology Data Exchange (ETDEWEB)
Hiltunen, P [Department of Biomedical Engineering and Computational Science, Helsinki University of Technology, PO Box 3310, FI-02015 TKK (Finland); Prince, S J D; Arridge, S [Department of Computer Science, University College London, Gower Street, London WC1E 6B (United Kingdom)], E-mail: petri.hiltunen@tkk.fi, E-mail: s.prince@cs.ucl.ac.uk, E-mail: s.arridge@cs.ucl.ac.uk
2009-11-07
We present a combined classification and reconstruction algorithm for diffuse optical tomography (DOT). DOT is a nonlinear ill-posed inverse problem. Therefore, some regularization is needed. We present a mixture of Gaussians prior, which regularizes the DOT reconstruction step. During each iteration, the parameters of a mixture model are estimated. These associate each reconstructed pixel with one of several classes based on the current estimate of the optical parameters. This classification is exploited to form a new prior distribution to regularize the reconstruction step and update the optical parameters. The algorithm can be described as an iteration between an optimization scheme with zeroth-order variable mean and variance Tikhonov regularization and an expectation-maximization scheme for estimation of the model parameters. We describe the algorithm in a general Bayesian framework. Results from simulated test cases and phantom measurements show that the algorithm enhances the contrast of the reconstructed images with good spatial accuracy. The probabilistic classifications of each image contain only a few misclassified pixels.
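The classification half of the loop can be sketched with a mixture-of-Gaussians fit on current pixel estimates, whose soft memberships would regularize the next reconstruction; the 1-D toy values are assumptions, and the actual algorithm alternates this with the Tikhonov-regularized reconstruction step:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical absorption coefficients: background tissue plus an inclusion.
pixels = np.concatenate([rng.normal(0.01, 0.002, 900),
                         rng.normal(0.03, 0.003, 100)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(pixels)
resp = gmm.predict_proba(pixels)          # soft class membership per pixel
class_means = gmm.means_.ravel()
print("class means:", class_means)

# Candidate prior mean for the next reconstruction step: each pixel pulled
# toward the class means, weighted by the responsibilities.
prior_mean = resp @ class_means
```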
Directory of Open Access Journals (Sweden)
Jose H. Guardiola
2010-01-01
Full Text Available This paper compares the academic performance of students in three similar elementary statistics courses taught by the same instructor, but with the lab component differing among the three. One course is traditionally taught without a lab component; the second has a lab component using scenarios and an extensive use of technology, but without explicit coordination between lab and lecture; and the third uses a lab component with an extensive use of technology that carefully coordinates the lab with the lecture. Extensive use of technology means, in this context, using Minitab software in the lab section, doing homework and quizzes using MyMathLab©, and emphasizing interpretation of computer output during lectures. Initially, an online instrument based on Gardner's multiple intelligences theory is given to students to try to identify students' learning styles and intelligence types as covariates. An analysis of covariance is performed in order to compare differences in achievement. This study makes no attempt to measure causal differences in student performance across the different treatments; its purpose is to find indications of associations among variables that support the claim that statistics labs could be associated with superior academic achievement in one of these three instructional environments. The study also tries to identify individual student characteristics that could be associated with superior academic performance, but found no evidence of any such characteristics. The response variable was computed as the percentage of correct answers on the three exams during the semester added together. The results indicate a significant difference across these three instructional methods, showing significantly higher mean scores for the response variable for students taking the lab component that was carefully coordinated with
Directory of Open Access Journals (Sweden)
Habib Messai
2016-11-01
Full Text Available Background. Olive oils (OOs) show high chemical variability due to several factors of genetic, environmental and anthropic types. Genetic and environmental factors are responsible for natural compositions and polymorphic diversification resulting in different varietal patterns and phenotypes. Anthropic factors, however, are at the origin of different blends' preparation, leading to normative, labelled or adulterated commercial products. Control of complex OO samples requires their (i) characterization by specific markers; (ii) authentication by fingerprint patterns; and (iii) monitoring by traceability analysis. Methods. These quality control and management aims require the use of several multivariate statistical tools: specificity highlighting requires ordination methods; authentication checking calls for classification and pattern recognition methods; traceability analysis implies the use of network-based approaches able to separate or extract mixed information and memorized signals from complex matrices. Results. This chapter presents a review of different chemometrics methods applied for the control of OO variability from metabolic and physical-chemical measured characteristics. The different chemometrics methods are illustrated by different case studies on monovarietal and blended OOs originating from different countries. Conclusion. Chemometrics tools offer multiple ways for quantitative evaluation and qualitative control of the complex chemical variability of OO in relation to several intrinsic and extrinsic factors.
International Nuclear Information System (INIS)
2005-01-01
For the years 2004 and 2005 the figures shown in the tables of Energy Review are partly preliminary. The annual statistics published in Energy Review are presented in more detail in a publication called Energy Statistics that comes out yearly. Energy Statistics also includes historical time series over a longer period of time (see e.g. Energy Statistics, Statistics Finland, Helsinki 2004). The applied energy units and conversion coefficients are shown on the back cover of the Review. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord pool power exchange, Total energy consumption by source and CO₂ emissions, Supplies and total consumption of electricity (GWh), Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes, precautionary stock fees and oil pollution fees
International Nuclear Information System (INIS)
Kapur, G.S.; Sastry, M.I.S.; Jaiswal, A.K.; Sarpal, A.S.
2004-01-01
The present paper describes various classification techniques like cluster analysis and principal component (PC)/factor analysis to classify different types of base stocks. The API classification of base oils (Group I-III) has been compared to a more detailed classification based on NMR-derived chemical compositional and molecular structural parameters, in order to point out the similarities of the base oils in the same group and the differences between the oils placed in different groups. The detailed compositional parameters have been generated using ¹H and ¹³C nuclear magnetic resonance (NMR) spectroscopic methods. Further, oxidation stability, measured in terms of rotating bomb oxidation test (RBOT) life, of non-conventional base stocks and their blends with conventional base stocks, has been quantitatively correlated with their ¹H NMR and elemental (sulphur and nitrogen) data with the help of multiple linear regression (MLR) and artificial neural network (ANN) techniques. The MLR-based model developed using NMR and elemental data showed a high correlation between the 'measured' and 'estimated' RBOT values for both training (R=0.859) and validation (R=0.880) data sets. The ANN-based model, developed using a smaller number of input variables (only ¹H NMR data), also showed high correlation between the 'measured' and 'estimated' RBOT values for the training (R=0.881), validation (R=0.860) and test (R=0.955) data sets.
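A small multiple-linear-regression sketch of the RBOT correlation, with synthetic stand-ins for the NMR-derived compositional parameters and the sulphur/nitrogen contents; none of the numbers come from the paper:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 60
X = rng.normal(size=(n, 5))       # e.g. aromatic-H fraction, ..., S, N (toy)
true_w = np.array([40.0, -25.0, 10.0, -60.0, -30.0])
y = 300 + X @ true_w + rng.normal(0, 20, n)     # hypothetical RBOT life [min]

X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
mlr = LinearRegression().fit(X_tr, y_tr)
r_tr = np.corrcoef(mlr.predict(X_tr), y_tr)[0, 1]
r_va = np.corrcoef(mlr.predict(X_va), y_va)[0, 1]
print(f"R(train) = {r_tr:.3f}, R(validation) = {r_va:.3f}")
```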
Statistical methods for mechanistic model validation: Salt Repository Project
International Nuclear Information System (INIS)
Eggett, D.L.
1988-07-01
As part of the Department of Energy's Salt Repository Program, Pacific Northwest Laboratory (PNL) is studying the emplacement of nuclear waste containers in a salt repository. One objective of the SRP program is to develop an overall waste package component model which adequately describes such phenomena as container corrosion, waste form leaching, spent fuel degradation, etc., which are possible in the salt repository environment. The form of this model will be proposed, based on scientific principles and relevant salt repository conditions with supporting data. The model will be used to predict the future characteristics of the near field environment. This involves several different submodels such as the amount of time it takes a brine solution to contact a canister in the repository, how long it takes a canister to corrode and expose its contents to the brine, the leach rate of the contents of the canister, etc. These submodels are often tested in a laboratory and should be statistically validated (in this context, validate means to demonstrate that the model adequately describes the data) before they can be incorporated into the waste package component model. This report describes statistical methods for validating these models. 13 refs., 1 fig., 3 tabs
Different methods for quark/gluon jet classification on real data from the DELPHI detector
International Nuclear Information System (INIS)
Transtroemer, G.
1999-05-01
Different methods to separate quark jets from gluon jets have been investigated and tested on data from the DELPHI experiment. A test sample of gluon jets was selected from bb̄g three-jet events where the two b-jets had been identified using a lifetime tag, and a quark jet sample was obtained from qq̄γ events where the photon was required to have a high energy and to be well separated from the two jets. Three types of tests were made. Firstly, the jet energy, which is the variable most frequently used for quark/gluon jet separation, was compared with methods based on the differences in the fragmentation of quark and gluon jets. It was found that the fragmentation-based classification provides significantly better identification than the jet energy only in events where the jets all have approximately the same energy. In Monte Carlo generated symmetric e⁺e⁻ → qq̄g three-jet events, where the jet energy does not provide any identification at all, the gluon jet was correctly assigned in 58 % of the events. More important, however, is that the identification has been divided into two independent parts, the energy part and the fragmentation part. Secondly, two different sets of fragmentation-sensitive variables were tested. It was found that a slightly better identification could be achieved using information from all the particles of the jet rather than using only the leading ones. Thirdly, three types of statistical discrimination methods were compared: a cut on a single fragmentation variable; a cut on the Fisher statistical discriminant calculated from one set of variables; and a cut on the output from an Artificial Neural Network (ANN) trained on different sets of variables. The three types of classifiers gave about the same performance, and one conclusion from this study is that the use of ANNs or Fisher statistical discrimination does not seem to improve the results significantly in quark/gluon jet separation on a jet-to-jet basis.
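A sketch of the Fisher-discriminant classifier type compared in the thesis, applied to two toy fragmentation-sensitive variables; the distributions are invented stand-ins for quark and gluon jets:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Toy variables, e.g. charged multiplicity and leading-particle momentum fraction.
quark = rng.normal([10.0, 0.45], [3.0, 0.12], size=(5000, 2))
gluon = rng.normal([14.0, 0.30], [4.0, 0.10], size=(5000, 2))
X = np.vstack([quark, gluon])
y = np.array([0] * len(quark) + [1] * len(gluon))

# A cut on Fisher's linear discriminant is equivalent to thresholding the
# one-dimensional projection learned here.
lda = LinearDiscriminantAnalysis().fit(X, y)
print("fraction of jets correctly assigned:", lda.score(X, y))
```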
A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data
Abusamra, Heba
2013-01-01
Different experiments have been applied to compare the performance of the classification methods with and without performing feature selection. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted by using a small number of genes. The relationship of features selected in different feature selection methods is investigated and the most frequent features selected in each fold among all methods for both datasets are evaluated.
Development and testing of improved statistical wind power forecasting methods.
Energy Technology Data Exchange (ETDEWEB)
Mendes, J.; Bessa, R.J.; Keko, H.; Sumaili, J.; Miranda, V.; Ferreira, C.; Gama, J.; Botterud, A.; Zhou, Z.; Wang, J. (Decision and Information Sciences); (INESC Porto)
2011-12-06
(with spatial and/or temporal dependence). Statistical approaches to uncertainty forecasting basically consist of estimating the uncertainty based on observed forecasting errors. Quantile regression (QR) is currently a commonly used approach in uncertainty forecasting. In Chapter 3, we propose new statistical approaches to the uncertainty estimation problem by employing kernel density forecast (KDF) methods. We use two estimators in both offline and time-adaptive modes, namely, the Nadaraya-Watson (NW) and Quantile-copula (QC) estimators. We conduct detailed tests of the new approaches using QR as a benchmark. One of the major issues in wind power generation is sudden, large changes of wind power output over a short period of time, namely ramping events. In Chapter 4, we perform a comparative study of existing definitions and methodologies for ramp forecasting. We also introduce a new probabilistic method for ramp event detection. The method starts with a stochastic algorithm that generates wind power scenarios, which are passed through a high-pass filter for ramp detection and estimation of the likelihood of ramp events. The report is organized as follows: Chapter 2 presents the results of the application of ITL training criteria to deterministic WPF; Chapter 3 reports the study on probabilistic WPF, including new contributions to wind power uncertainty forecasting; Chapter 4 presents a new method to predict and visualize ramp events, comparing it with state-of-the-art methodologies; Chapter 5 briefly summarizes the main findings and contributions of this report.
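A minimal quantile-regression sketch of the QR benchmark, estimating conditional quantiles of observed power from a point forecast; the data and the use of statsmodels are assumptions, not the report's code:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
forecast = rng.uniform(0, 1, 1000)
observed = np.clip(forecast + rng.normal(0, 0.1 + 0.2 * forecast, 1000), 0, 1)

X = sm.add_constant(forecast)
for q in (0.1, 0.5, 0.9):   # lower band, median, upper band
    fit = sm.QuantReg(observed, X).fit(q=q)
    print(f"q={q}: intercept={fit.params[0]:+.3f}, slope={fit.params[1]:.3f}")
```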
Data and statistical methods for analysis of trends and patterns
International Nuclear Information System (INIS)
Atwood, C.L.; Gentillon, C.D.; Wilson, G.E.
1992-11-01
This report summarizes topics considered at a working meeting on data and statistical methods for analysis of trends and patterns in US commercial nuclear power plants. This meeting was sponsored by the Office of Analysis and Evaluation of Operational Data (AEOD) of the Nuclear Regulatory Commission (NRC). Three data sets are briefly described: Nuclear Plant Reliability Data System (NPRDS), Licensee Event Report (LER) data, and Performance Indicator data. Two types of study are emphasized: screening studies, to see if any trends or patterns appear to be present; and detailed studies, which are more concerned with checking the analysis assumptions, modeling any patterns that are present, and searching for causes. A prescription is given for a screening study, and ideas are suggested for a detailed study, when the data take any of three forms: counts of events per time, counts of events per demand, and non-event data.
International Nuclear Information System (INIS)
2001-01-01
For the year 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions from the use of fossil fuels, Total energy consumption by source and CO₂ emissions, Electricity supply, Energy imports by country of origin in 2000, Energy exports by recipient country in 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products
International Nuclear Information System (INIS)
2000-01-01
For the years 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO₂ emissions, Electricity supply, Energy imports by country of origin in January-March 2000, Energy exports by recipient country in January-March 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products
International Nuclear Information System (INIS)
1999-01-01
For the years 1998 and 1999, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO₂ emissions, Electricity supply, Energy imports by country of origin in January-June 1999, Energy exports by recipient country in January-June 1999, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products
Implementation of statistical analysis methods for medical physics data
International Nuclear Information System (INIS)
Teixeira, Marilia S.; Pinto, Nivia G.P.; Barroso, Regina C.; Oliveira, Luis F.
2009-01-01
The objective of biomedical research with radiation of different natures is to contribute to the understanding of the basic physics and biochemistry of biological systems, disease diagnostics, and the development of therapeutic techniques. The main benefits are: the cure of tumors through therapy, the early detection of diseases through diagnostics, prophylactic use in blood transfusion, etc. Therefore, a better understanding of the biological interactions occurring after exposure to radiation is necessary for the optimization of therapeutic procedures and of strategies for the reduction of radioinduced effects. The applied physics group of the Physics Institute of UERJ has been working on the characterization of biological samples (human tissues, teeth, saliva, soil, plants, sediments, air, water, organic matrices, ceramics, fossil material, among others) using X-ray diffraction and X-ray fluorescence. The application of these techniques to the measurement, analysis and interpretation of the characteristics of biological tissues is attracting considerable interest in Medical and Environmental Physics. All quantitative data analysis must begin with the calculation of descriptive statistics (means and standard deviations) in order to obtain a preliminary notion of what the analysis will reveal. It is well known that the high values of standard deviation found in experimental measurements of biological samples can be attributed to biological factors, due to the specific characteristics of each individual (age, gender, environment, alimentary habits, etc). The main objective of this work is the development of a program applying specific statistical methods for the optimization of experimental data analysis. The specialized programs for this analysis are proprietary; a further objective of this work is therefore the implementation of a code which is free and can be shared with other research groups. As the program developed since the
Soft and hard classification by reproducing kernel Hilbert space methods.
Wahba, Grace
2002-12-24
Reproducing kernel Hilbert space (RKHS) methods provide a unified context for solving a wide variety of statistical modelling and function estimation problems. We consider two such problems: We are given a training set {(y_i, t_i), i = 1, ..., n}, where y_i is the response for the i-th subject and t_i is a vector of attributes for this subject. The value of y_i is a label that indicates which category it came from. For the first problem, we wish to build a model from the training set that assigns to each t in an attribute domain of interest an estimate of the probability p_j(t) that a (future) subject with attribute vector t is in category j. The second problem is in some sense less ambitious; it is to build a model that assigns to each t a label, which classifies a future subject with that t into one of the categories or possibly "none of the above." The approach to the first of these two problems discussed here is a special case of what is known as penalized likelihood estimation. The approach to the second problem is known as the support vector machine. We also note some alternate but closely related approaches to the second problem. These approaches are all obtained as solutions to optimization problems in RKHS. Many other problems, in particular the solution of ill-posed inverse problems, can be obtained as solutions to optimization problems in RKHS and are mentioned in passing. We caution the reader that although a large literature exists in all of these topics, in this inaugural article we are selectively highlighting work of the author, former students, and other collaborators.
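The soft/hard contrast can be sketched with two small RKHS-flavoured models: a kernelized penalized-likelihood classifier that outputs probabilities, and an SVM that outputs labels. The kernel approximation and data are illustrative assumptions, not Wahba's formulation:

```python
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.25, random_state=0)

# "Soft" classification: estimate p_j(t) via a kernelized penalized likelihood.
soft = make_pipeline(Nystroem(gamma=2.0, random_state=0),
                     LogisticRegression(C=1.0)).fit(X, y)
print("p(class 1 | t):", soft.predict_proba(X[:3])[:, 1])

# "Hard" classification: the support vector machine returns labels only.
hard = SVC(kernel="rbf", gamma=2.0, C=1.0).fit(X, y)
print("labels:", hard.predict(X[:3]))
```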
Statistically qualified neuro-analytic failure detection method and system
Vilim, Richard B.; Garcia, Humberto E.; Chen, Frederick W.
2002-03-02
An apparatus and method for monitoring a process involve the development and application of a statistically qualified neuro-analytic (SQNA) model to accurately and reliably identify process change. The development of the SQNA model is accomplished in two stages: deterministic model adaptation, and stochastic modification of the deterministic model adaptation. Deterministic model adaptation involves formulating an analytic model of the process representing known process characteristics, augmenting the analytic model with a neural network that captures unknown process characteristics, and training the resulting neuro-analytic model by adjusting the neural network weights according to a unique scaled equation-error minimization technique. Stochastic model modification involves qualifying any remaining uncertainty in the trained neuro-analytic model by formulating a likelihood function, given an error propagation equation, for computing the probability that the neuro-analytic model generates the measured process output. Preferably, the developed SQNA model is validated using known sequential probability ratio tests and applied to the process as an on-line monitoring system. Illustrative of the method and apparatus, the method is applied to a peristaltic pump system.
Improved Statistical Method For Hydrographic Climatic Records Quality Control
Gourrion, J.; Szekely, T.
2016-02-01
Climate research benefits from the continuous development of global in-situ hydrographic networks in the last decades. Apart from the increasing volume of observations available on a large range of temporal and spatial scales, a critical aspect concerns the ability to constantly improve the quality of the datasets. In the context of the Coriolis Dataset for ReAnalysis (CORA) version 4.2, a new quality control method based on a local comparison to the historical extreme values ever observed has been developed, implemented and validated. Temperature, salinity and potential density validity intervals are estimated directly from the minimum and maximum values of an historical reference dataset, rather than from traditional mean and standard deviation estimates. Such an approach avoids strong statistical assumptions on the data distributions such as unimodality, absence of skewness and spatially homogeneous kurtosis. As a new feature, it also allows addressing simultaneously the two main objectives of a quality control strategy, i.e. maximizing the number of good detections while minimizing the number of false alarms. The reference dataset is presently built from the fusion of 1) all ARGO profiles up to early 2014, 2) three historical CTD datasets and 3) the sea mammal CTD profiles from the MEOP database. All datasets are extensively and manually quality controlled. In this communication, the latest method validation results are also presented. The method has been implemented in the latest version of the CORA dataset and will benefit the next version of the Copernicus CMEMS dataset.
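A toy sketch of the min/max idea: validity intervals come from historical extremes in each spatial cell rather than from mean ± k·sigma, so no unimodality or symmetry is assumed; the bands and values below are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
# Historical reference temperatures per 10-degree latitude band (toy data).
hist = {band: rng.normal(15 - band, 3, 500) for band in range(0, 90, 10)}
valid = {band: (v.min(), v.max()) for band, v in hist.items()}

def qc_flag(band: int, value: float) -> bool:
    """True if the observation falls outside the historical extremes."""
    lo, hi = valid[band]
    return not (lo <= value <= hi)

print(qc_flag(30, -40.0))   # flagged: colder than anything ever observed there
print(qc_flag(30, -12.0))   # within the historical envelope: passes
```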
A Spectral-Texture Kernel-Based Classification Method for Hyperspectral Images
Directory of Open Access Journals (Sweden)
Yi Wang
2016-11-01
Full Text Available Classification of hyperspectral images always suffers from high dimensionality and very limited labeled samples. Recently, spectral-spatial classification has attracted considerable attention and can achieve higher classification accuracy and smoother classification maps. In this paper, a novel spectral-spatial classification method for hyperspectral images using kernel methods is investigated. For a given hyperspectral image, the principal component analysis (PCA) transform is first performed. Then, the first principal component of the input image is segmented into non-overlapping homogeneous regions by using the entropy rate superpixel (ERS) algorithm. Next, the local spectral histogram model is applied to each homogeneous region to obtain the corresponding texture features. Because this step is performed within each homogeneous region, instead of within a fixed-size image window, the obtained local texture features are more accurate, which can effectively benefit classification accuracy. In the following step, a contextual spectral-texture kernel is constructed by combining the spectral information in the image and the extracted texture information, using the linearity property of kernel methods. Finally, the classification map is obtained by the support vector machine (SVM) classifier using the proposed spectral-texture kernel. Experiments on two benchmark airborne hyperspectral datasets demonstrate that our method can effectively improve classification accuracy, even when only a very limited training sample is available. Specifically, our method achieves overall accuracy from 8.26% to 15.1% higher than the traditional SVM classifier. The performance of our method was further compared to several state-of-the-art classification methods for hyperspectral images using objective quantitative measures and a visual qualitative evaluation.
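The kernel-combination step can be sketched directly, exploiting the closure of kernels under positive-weighted sums; the features, weight, and RBF choice are assumptions standing in for the paper's contextual spectral-texture kernel:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 300
spectral = rng.normal(size=(n, 30))    # per-pixel spectra (toy)
texture = rng.normal(size=(n, 10))     # per-region texture features (toy)
y = (spectral[:, 0] + texture[:, 0] > 0).astype(int)

mu = 0.6   # spectral/texture trade-off weight
K = mu * rbf_kernel(spectral) + (1 - mu) * rbf_kernel(texture)

svm = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", svm.score(K, y))
```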
A new method to determine the number of experimental data using statistical modeling methods
Energy Technology Data Exchange (ETDEWEB)
Jung, Jung-Ho; Kang, Young-Jin; Lim, O-Kaung; Noh, Yoojeong [Pusan National University, Busan (Korea, Republic of)
2017-06-15
For analyzing the statistical performance of physical systems, statistical characteristics of physical parameters such as material properties need to be estimated by collecting experimental data. For accurate statistical modeling, many such experiments may be required, but data are usually quite limited owing to the cost and time constraints of experiments. In this study, a new method for determining a reasonable number of experimental data is proposed using an area metric, after obtaining statistical models using information on the underlying distribution, the Sequential statistical modeling (SSM) approach, and the Kernel density estimation (KDE) approach. The area metric is used as a convergence criterion to determine the necessary and sufficient number of experimental data to be acquired. The proposed method is validated in simulations, using different statistical modeling methods, different true models, and different convergence criteria. An example data set with 29 data points describing the fatigue strength coefficient of SAE 950X is used to demonstrate the performance of the obtained statistical models, which use a pre-determined number of experimental data, in predicting the probability of failure for a target fatigue life.
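An illustrative area-metric loop follows: refit the statistical model as data accumulate and watch the area between empirical and fitted CDFs converge; the normal model and all numbers are assumptions, not the paper's SSM/KDE machinery:

```python
import numpy as np
from scipy import stats

def area_metric(sample, model) -> float:
    """Area between the empirical CDF of `sample` and the model CDF."""
    x = np.sort(np.asarray(sample))
    grid = np.linspace(x[0], x[-1], 2000)
    emp = np.searchsorted(x, grid, side="right") / len(x)
    return np.trapz(np.abs(emp - model.cdf(grid)), grid)

rng = np.random.default_rng(0)
true = stats.norm(100.0, 10.0)   # hypothetical fatigue-strength coefficient
prev = None
for n in (10, 20, 40, 80, 160):
    data = true.rvs(size=n, random_state=rng)
    fitted = stats.norm(*stats.norm.fit(data))   # refit the statistical model
    a = area_metric(data, fitted)
    print(n, round(a, 3), "" if prev is None else f"change = {abs(a - prev):.3f}")
    prev = a
```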
Assessment Methods in Statistical Education An International Perspective
Bidgood, Penelope; Jolliffe, Flavia
2010-01-01
This book is a collaboration of leading figures in statistical education and is designed primarily for academic audiences involved in teaching statistics and mathematics. The book is divided into four sections: (1) Assessment using real-world problems, (2) Assessing statistical thinking, (3) Individual assessment, and (4) Successful assessment strategies.
Tuuli, Methodius G; Odibo, Anthony O
2011-08-01
The objective of this article is to discuss the rationale for common statistical tests used for the analysis and interpretation of prenatal diagnostic imaging studies. Examples from the literature are used to illustrate descriptive and inferential statistics. The uses and limitations of linear and logistic regression analyses are discussed in detail.
Verhoeven, P.S.
2009-01-01
Although Statistics is not a very popular course according to most students, a majority of students still take it, as it is mandatory at most Social Science departments. It therefore takes special teaching skills to teach statistics. In order to do so, it is essential for teachers to know what
International Nuclear Information System (INIS)
Saedtler, E.
1981-01-01
The dissertation discusses: 1. Approximative filter algorithms for identification of systems and hierarchical structures. 2. Adaptive statistical pattern recognition and classification. 3. Parameter selection, extraction, and modelling for an automatic control system. 4. Design of a decision tree and an adaptive diagnostic system. (orig./RW) [de]
International Nuclear Information System (INIS)
2003-01-01
For the year 2002, part of the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot 2001, Statistics Finland, Helsinki 2002). The applied energy units and conversion coefficients are shown on the inside back cover of the Review. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord pool power exchange, Total energy consumption by source and CO₂ emissions, Supply and total consumption of electricity (GWh), Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Excise taxes, precautionary stock fees and oil pollution fees on energy products
International Nuclear Information System (INIS)
2004-01-01
For the years 2003 and 2004, the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot, Statistics Finland, Helsinki 2003, ISSN 0785-3165). The applied energy units and conversion coefficients are shown on the inside back cover of the Review. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord pool power exchange, Total energy consumption by source and CO₂ emissions, Supplies and total consumption of electricity (GWh), Energy imports by country of origin in January-March 2004, Energy exports by recipient country in January-March 2004, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Excise taxes, precautionary stock fees and oil pollution fees
International Nuclear Information System (INIS)
2000-01-01
For the years 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO₂ emissions, Electricity supply, Energy imports by country of origin in January-June 2000, Energy exports by recipient country in January-June 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products
Statistical method to compare massive parallel sequencing pipelines.
Elsensohn, M H; Leblay, N; Dimassi, S; Campan-Fournier, A; Labalme, A; Roucher-Boulez, F; Sanlaville, D; Lesca, G; Bardel, C; Roy, P
2017-03-01
Today, sequencing is frequently carried out by Massive Parallel Sequencing (MPS), which drastically cuts sequencing time and expense. Nevertheless, Sanger sequencing remains the main validation method to confirm the presence of variants. The analysis of MPS data involves the development of several bioinformatic tools, academic or commercial. We present here a statistical method to compare MPS pipelines and test it in a comparison between an academic (BWA-GATK) and a commercial pipeline (TMAP-NextGENe®), with and without reference to a gold standard (here, Sanger sequencing), on a panel of 41 genes in 43 epileptic patients. This method used the number of variants to fit log-linear models for pairwise agreements between pipelines. To assess the heterogeneity of the margins and the odds ratios of agreement, four log-linear models were used: a full model, a homogeneous-margin model, a model with a single odds ratio for all patients, and a model with a single intercept. Then a log-linear mixed model was fitted, considering the biological variability as a random effect. Among the 390,339 base pairs sequenced, TMAP-NextGENe® and BWA-GATK found, on average, 2253.49 and 1857.14 variants (single nucleotide variants and indels), respectively. Against the gold standard, the pipelines had similar sensitivities (63.47% vs. 63.42%) and close but significantly different specificities (99.57% vs. 99.65%; p < 0.001). Same-trend results were obtained when only single nucleotide variants were considered (99.98% specificity and 76.81% sensitivity for both pipelines). The method thus allows pipeline comparison and selection. It is generalizable to all types of MPS data and all pipelines.
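A small sketch of the gold-standard scoring: per-position calls from each pipeline are compared with Sanger truth to get sensitivity and specificity. The call sets are simulated around the reported rates; the paper's log-linear agreement models are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pos = 390_339
truth = rng.random(n_pos) < 0.005          # positions with a true variant

def simulate_calls(sens, spec):
    # Call a variant with prob. `sens` where one exists, `1 - spec` elsewhere.
    return np.where(truth, rng.random(n_pos) < sens, rng.random(n_pos) > spec)

def sens_spec(calls):
    tp, fn = np.sum(calls & truth), np.sum(~calls & truth)
    tn, fp = np.sum(~calls & ~truth), np.sum(calls & ~truth)
    return tp / (tp + fn), tn / (tn + fp)

for name, (se, sp) in {"BWA-GATK": (0.634, 0.9965),
                       "TMAP-NextGENe": (0.635, 0.9957)}.items():
    s, p = sens_spec(simulate_calls(se, sp))
    print(f"{name}: sensitivity = {s:.4f}, specificity = {p:.4f}")
```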
Directory of Open Access Journals (Sweden)
Rui Guo
2018-03-01
Full Text Available This paper presents a novel classification method for high spatial resolution remote sensing imagery using deep neural networks. Deep learning methods, such as the fully convolutional network (FCN) model, achieve state-of-the-art performance in natural image semantic segmentation when provided with large-scale datasets and respective labels. To use data efficiently in the training stage, we first pre-segment training images and their labels into small patches, as supplements to the training data, using graph-based segmentation and the selective search method. Subsequently, an FCN with atrous convolution is used to perform pixel-wise classification. In the testing stage, post-processing with fully connected conditional random fields (CRFs) is used to refine the results. Extensive experiments based on the Vaihingen dataset demonstrate that our method performs better than the reference state-of-the-art networks when applied to high-resolution remote sensing imagery classification.
Comparative analysis of methods for classification in predicting the quality of bread
E. A. Balashova; V. K. Bitjukov; E. A. Savvina
2013-01-01
A comparative analysis of three classification methods (two-stage cluster analysis, discriminant analysis, and neural networks) was performed. A system of informative signs that classifies with a minimum of errors has been proposed.
Slip estimation methods for proprioceptive terrain classification using tracked mobile robots
CSIR Research Space (South Africa)
Masha, Ditebogo F
2017-11-01
Full Text Available Recent work has shown that proprioceptive measurements such as terrain slip can be used for terrain classification. This paper investigates the suitability of four simple slip estimation methods for differentiating between indoor and outdoor terrain...
Directory of Open Access Journals (Sweden)
Fan Wu
2015-05-01
Full Text Available Vessel monitoring is one of the most important maritime applications of Synthetic Aperture Radar (SAR) data. Because of the dihedral reflections between the vessel hull and the sea surface and the trihedral reflections among superstructures, vessels usually show strong backscattering in SAR images. Furthermore, in high-resolution SAR images, detailed information on vessel structures can be observed, allowing for vessel classification. This paper focuses on the feature analysis of merchant vessels, including bulk carriers, container ships and oil tankers, in 3 m resolution COSMO-SkyMed stripmap HIMAGE mode images, and proposes a method for vessel classification. After preprocessing, a feature vector is estimated by calculating the average value of the kernel density estimation, three structural features and the mean backscattering coefficient. A support vector machine (SVM) classifier is used for the vessel classification, and the results are compared with traditional methods, such as the K-nearest neighbor algorithm (K-NN) and the minimum distance classifier (MDC). In situ investigations were conducted during the SAR data acquisition, and corresponding Automatic Identification System (AIS) reports were obtained as ground truth to evaluate the effectiveness of the classifier. The preliminary results show that the combination of the average value of the kernel density estimation and the mean backscattering coefficient discriminates the three types of vessels well; adding the three structural features improves the results slightly. The SVM classifier outperforms K-NN and MDC, but requires more time when the kernel parameters are estimated.
Tahir, Fahima; Fahiem, Muhammad Abuzar
2014-01-01
The quality of pharmaceutical products plays an important role in the pharmaceutical industry as well as in our lives. Usage of defective tablets can be harmful for patients. In this research we propose a nondestructive method to identify defective and nondefective tablets using their surface morphology. Three environmental factors (temperature, humidity and moisture) are analyzed to evaluate the performance of the proposed method. Multiple textural features are extracted from the surfaces of the defective and nondefective tablets: the gray-level co-occurrence matrix, run-length matrix, histogram, autoregressive model and Haar wavelet. In total, 281 textural features are extracted from the images, and we performed an analysis on all 281, the top 15, and the top 2 features. The top 15 features are selected using three feature reduction techniques: chi-square, gain ratio and relief-F. We used three classifiers (support vector machine, K-nearest neighbors and naïve Bayes) to calculate accuracies for the proposed method in two experiments, the leave-one-out cross-validation technique and train-test models. We tested each classifier against all selected features and then compared their results. In most cases, SVM performed better than the other two classifiers.
Hydrologic extremes - an intercomparison of multiple gridded statistical downscaling methods
Werner, Arelia T.; Cannon, Alex J.
2016-04-01
Gridded statistical downscaling methods are the main means of preparing climate model data to drive distributed hydrological models. Past work on the validation of climate downscaling methods has focused on temperature and precipitation, with less attention paid to the ultimate outputs of hydrological models. Also, as attention shifts towards projections of extreme events, downscaling comparisons now commonly assess methods in terms of climate extremes, but hydrologic extremes are less well explored. Here, we test the ability of gridded downscaling models to replicate historical properties of climate and hydrologic extremes, as measured in terms of temporal sequencing (i.e. correlation tests) and distributional properties (i.e. tests for equality of probability distributions). Outputs from seven downscaling methods - bias correction constructed analogues (BCCA), double BCCA (DBCCA), BCCA with quantile mapping reordering (BCCAQ), bias correction spatial disaggregation (BCSD), BCSD using minimum/maximum temperature (BCSDX), the climate imprint delta method (CI), and bias corrected CI (BCCI) - are used to drive the Variable Infiltration Capacity (VIC) model over the snow-dominated Peace River basin, British Columbia. Outputs are tested using split-sample validation on 26 climate extremes indices (ClimDEX) and two hydrologic extremes indices (3-day peak flow and 7-day peak flow). To characterize observational uncertainty, four atmospheric reanalyses are used as climate model surrogates and two gridded observational data sets are used as downscaling target data. The skill of the downscaling methods generally depended on the reanalysis and gridded observational data set. However, CI failed to reproduce the distribution, and BCSD and BCSDX the timing, of winter 7-day low-flow events, regardless of reanalysis or observational data set. Overall, DBCCA passed the greatest number of tests for the ClimDEX indices, while BCCAQ, which is designed to more accurately resolve event…
National Research Council Canada - National Science Library
Willsky, Alan
2004-01-01
.... Our research blends methods from several fields-statistics and probability, signal and image processing, mathematical physics, scientific computing, statistical learning theory, and differential...
Improved statistical method for temperature and salinity quality control
Gourrion, Jérôme; Szekely, Tanguy
2017-04-01
Climate research and ocean monitoring benefit from the continuous development of global in-situ hydrographic networks in the last decades. Apart from the increasing volume of observations available on a large range of temporal and spatial scales, a critical aspect concerns the ability to constantly improve the quality of the datasets. In the context of the Coriolis Dataset for ReAnalysis (CORA) version 4.2, a new quality control method based on a local comparison to the historical extreme values ever observed is developed, implemented and validated. Temperature, salinity and potential density validity intervals are directly estimated from the minimum and maximum values of a historical reference dataset, rather than from traditional mean and standard deviation estimates. Such an approach avoids strong statistical assumptions on the data distributions, such as unimodality, absence of skewness and spatially homogeneous kurtosis. As a new feature, it also allows simultaneously addressing the two main objectives of an automatic quality control strategy, i.e. maximizing the number of good detections while minimizing the number of false alarms. The reference dataset is presently built from the fusion of 1) all Argo profiles up to late 2015, 2) three historical CTD datasets and 3) the sea-mammal CTD profiles from the MEOP database. All datasets are extensively and manually quality controlled. In this communication, the latest method validation results are also presented. The method has already been implemented in the latest version of the delayed-time CMEMS in-situ dataset and will soon be deployed in the equivalent near-real-time products.
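A minimal NumPy sketch of the min/max validity-interval idea described above; the margin parameter and the toy temperature values are our assumptions, not CORA's actual settings:

    import numpy as np

    def validity_interval(history, margin=0.1):
        # Interval from historical extremes, inflated by a small margin.
        # Unlike mean +/- k*sigma bounds, this assumes neither unimodality
        # nor symmetry of the data distribution.
        lo, hi = np.min(history), np.max(history)
        pad = margin * (hi - lo)
        return lo - pad, hi + pad

    # historical temperatures at one location/depth (degrees C, made up)
    history = np.array([11.8, 12.3, 12.9, 13.4, 12.1, 12.7, 13.0])
    lo, hi = validity_interval(history)

    new_obs = np.array([12.5, 13.1, 17.9])    # last value is suspect
    flags = (new_obs < lo) | (new_obs > hi)   # True = flag for review
    print(list(zip(new_obs, flags)))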
Tahir, Muhammad; Jan, Bismillah; Hayat, Maqsood; Shah, Shakir Ullah; Amin, Muhammad
2018-04-01
Discriminative and informative feature extraction is the core requirement for accurate and efficient classification of protein subcellular localization images, so that drug development can be more effective. The objective of this paper is to propose a novel modification of the Threshold Adjacency Statistics (TAS) technique that enhances its discriminative power and efficiency. We utilized seven threshold ranges to produce seven distinct feature spaces, which are then used to train seven SVMs; the final prediction is obtained through a majority voting scheme. The proposed ETAS-SubLoc system is tested on two benchmark datasets using the 5-fold cross-validation technique. We observed that our novel utilization of the TAS technique improved the discriminative power of the classifier. The ETAS-SubLoc system achieved 99.2% accuracy, 99.3% sensitivity and 99.1% specificity for the Endogenous dataset, outperforming the classical Threshold Adjacency Statistics technique. Similarly, 91.8% accuracy, 96.3% sensitivity and 91.6% specificity were achieved for the Transfected dataset. Simulation results validated the effectiveness of ETAS-SubLoc, which provides superior prediction performance compared to the existing technique. The proposed methodology aims at providing support to the pharmaceutical industry as well as the research community towards better drug design and innovation in the fields of bioinformatics and computational biology. The implementation code for replicating the experiments presented in this paper is available at: https://drive.google.com/file/d/0B7IyGPObWbSqRTRMcXI2bG5CZWs/view?usp=sharing. Copyright © 2018 Elsevier B.V. All rights reserved.
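A minimal sketch of the seven-SVM majority-voting scheme, with synthetic feature spaces standing in for the per-threshold TAS features (the class separation used to fabricate the data is purely illustrative):

    import numpy as np
    from sklearn.svm import SVC

    def majority_vote_predict(models, feature_spaces):
        # Each model sees its own feature space; the label with the most
        # votes across the seven SVMs wins (ETAS-SubLoc style).
        votes = np.stack([m.predict(X) for m, X in zip(models, feature_spaces)])
        return np.apply_along_axis(
            lambda v: np.bincount(v).argmax(), axis=0, arr=votes)

    rng = np.random.default_rng(1)
    y = rng.integers(0, 2, size=120)
    # seven feature spaces, one per (assumed) threshold range
    spaces = [rng.normal(size=(120, 10)) + y[:, None] * (0.5 + 0.1 * k)
              for k in range(7)]
    models = [SVC().fit(X, y) for X in spaces]
    print(majority_vote_predict(models, spaces)[:10])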
Statistical error estimation of the Feynman-α method using the bootstrap method
International Nuclear Information System (INIS)
Endo, Tomohiro; Yamamoto, Akio; Yagi, Takahiro; Pyeon, Cheol Ho
2016-01-01
The applicability of the bootstrap method is investigated for estimating the statistical error of the Feynman-α method, which is one of the subcritical measurement techniques based on reactor noise analysis. In the Feynman-α method, the statistical error can be simply estimated from multiple measurements of reactor noise; however, this requires additional measurement time. Using a resampling technique called the 'bootstrap method', the standard deviation and confidence interval of measurement results obtained by the Feynman-α method can be estimated as the statistical error using only a single measurement of reactor noise. In order to validate our proposed technique, we carried out a passive measurement of reactor noise without any external source, i.e. with only the inherent neutron source from spontaneous fission and (α,n) reactions in the nuclear fuel, at the Kyoto University Criticality Assembly. Through the actual measurement, it is confirmed that the bootstrap method is applicable to approximately estimate the statistical error of measurement results obtained by the Feynman-α method. (author)
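A minimal NumPy sketch of the single-measurement bootstrap idea: resample the gate-wise counts with replacement and read the spread of the recomputed statistic. The Poisson counts and the variance-to-mean statistic below are illustrative stand-ins, not the paper's measured data:

    import numpy as np

    def bootstrap_error(samples, statistic, n_boot=2000, seed=0):
        # Standard deviation and 95% percentile interval of `statistic`,
        # estimated by resampling a single measurement with replacement.
        rng = np.random.default_rng(seed)
        n = len(samples)
        reps = np.array([statistic(rng.choice(samples, size=n, replace=True))
                         for _ in range(n_boot)])
        return reps.std(ddof=1), np.percentile(reps, [2.5, 97.5])

    # stand-in for gate-wise neutron counts from one reactor-noise run
    counts = np.random.default_rng(42).poisson(lam=20.0, size=500)
    # variance-to-mean ratio, the quantity at the heart of Feynman-alpha
    y_stat = lambda c: c.var(ddof=1) / c.mean() - 1.0
    sd, ci = bootstrap_error(counts, y_stat)
    print(f"Y = {y_stat(counts):.4f} +/- {sd:.4f}, 95% CI {ci}")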
Directory of Open Access Journals (Sweden)
Hongzhuan Zhao
2016-04-01
Full Text Available With the enrichment of perception methods, a modern transportation system contains many physical objects whose states are influenced by many information factors, making it a typical Cyber-Physical System (CPS). Traffic information is therefore generally multi-sourced, heterogeneous and hierarchical. Existing research shows that accurately classifying the multi-sourced traffic information during information fusion can achieve better parameter-forecasting performance. To solve the problem of accurate traffic information classification, this paper analyses the characteristics of multi-sourced traffic information and uses a redefined binary tree to overcome the shortcomings of the original Support Vector Machine (SVM) classification in information fusion, proposing a multi-classification method using an improved SVM in information fusion for traffic parameter forecasting. An experiment was conducted to examine the performance of the proposed scheme, and the results reveal that the method yields more accurate and practical outcomes.
A statistical method for draft tube pressure pulsation analysis
International Nuclear Information System (INIS)
Doerfler, P K; Ruchonnet, N
2012-01-01
Draft tube pressure pulsation (DTPP) in Francis turbines is composed of various components originating from different physical phenomena. These components can be separated because they differ in their spatial relationships and propagation mechanisms. The first step of such an analysis is to distinguish between so-called synchronous and asynchronous pulsations; only approximately periodic phenomena can be described in this manner. However, less regular pulsations are always present, and these become important when turbines have to operate in the far off-design range, in particular at very low load. The statistical method described here makes it possible to separate the stochastic (random) component from the two traditional 'regular' components. It works in connection with the standard technique of model testing, with several pressure signals measured in the draft tube cone. The difference between the individual signals and the averaged pressure signal, together with the coherence between the individual pressure signals, is used for the analysis. An example reveals that a generalized, non-periodic version of the asynchronous pulsation is important at low load.
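A minimal sketch of that decomposition on synthetic tap signals: the synchronous part is the average over the taps, the residuals carry the asynchronous and random parts, and the pairwise coherence separates the two. Sampling rate, frequencies and amplitudes are all invented for illustration:

    import numpy as np
    from scipy.signal import coherence

    fs = 1000.0                        # sampling rate, Hz (assumed)
    t = np.arange(0, 2.0, 1 / fs)
    rng = np.random.default_rng(0)

    # four pressure signals around the draft tube cone (synthetic):
    # a synchronous pulsation common to all taps, an asynchronous
    # rotating component (phase-shifted per tap), and random noise
    sync = 0.5 * np.sin(2 * np.pi * 4.0 * t)
    signals = np.stack([
        sync
        + 0.3 * np.sin(2 * np.pi * 1.2 * t + k * np.pi / 2)  # asynchronous
        + 0.2 * rng.normal(size=t.size)                      # stochastic
        for k in range(4)])

    synchronous = signals.mean(axis=0)   # common (averaged) part
    residuals = signals - synchronous    # asynchronous + random part

    # coherence between residuals of two taps: high at the rotating-
    # component frequency, low where the random component dominates
    f, Cxy = coherence(residuals[0], residuals[1], fs=fs, nperseg=512)
    print(f[np.argmax(Cxy)], Cxy.max())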
Directory of Open Access Journals (Sweden)
Guizhou Wang
2013-01-01
Full Text Available This paper presents a new classification method for high-spatial-resolution remote sensing images based on a strategic mechanism of spatial mapping and reclassification. The proposed method includes four steps. First, the multispectral image is classified by a traditional pixel-based classification method (support vector machine. Second, the panchromatic image is subdivided by watershed segmentation. Third, the pixel-based multispectral image classification result is mapped to the panchromatic segmentation result based on a spatial mapping mechanism and the area dominant principle. During the mapping process, an area proportion threshold is set, and the regional property is defined as unclassified if the maximum area proportion does not surpass the threshold. Finally, unclassified regions are reclassified based on spectral information using the minimum distance to mean algorithm. Experimental results show that the classification method for high-spatial-resolution remote sensing images based on the spatial mapping mechanism and reclassification strategy can make use of both panchromatic and multispectral information, integrate the pixel- and object-based classification methods, and improve classification accuracy.
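A minimal NumPy sketch of the mapping and reclassification steps: each segment takes the dominant pixel class when its area share clears the threshold, and is otherwise reclassified by minimum distance to the class means. The 0.6 threshold and the toy image are our assumptions:

    import numpy as np

    def map_and_reclassify(pixel_labels, segments, spectra, threshold=0.6):
        # Step 3: map pixel classes onto segments by the area dominant
        # principle; step 4: reclassify low-confidence segments by
        # minimum distance to the class means (MDC).
        seg_ids = np.unique(segments)
        classes = np.unique(pixel_labels)
        class_means = np.array([spectra[pixel_labels == c].mean(axis=0)
                                for c in classes])
        result = {}
        for s in seg_ids:
            mask = segments == s
            counts = np.bincount(pixel_labels[mask], minlength=classes.size)
            if counts.max() / counts.sum() >= threshold:
                result[s] = classes[counts.argmax()]  # dominant class wins
            else:                                     # "unclassified" -> MDC
                mean_spec = spectra[mask].mean(axis=0)
                d = np.linalg.norm(class_means - mean_spec, axis=1)
                result[s] = classes[d.argmin()]
        return result

    rng = np.random.default_rng(3)
    labels = rng.integers(0, 2, 100)     # pixel-based SVM labels
    segs = rng.integers(0, 5, 100)       # watershed segment ids
    spectra = rng.normal(size=(100, 3)) + labels[:, None]
    print(map_and_reclassify(labels, segs, spectra))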
International Nuclear Information System (INIS)
Kounelakis, M G; Zervakis, M E; Giakos, G C; Postma, G J; Buydens, L M C; Kotsiakis, X
2011-01-01
The purpose of this study is to identify reliable sets of metabolic markers that provide accurate classification of complex brain tumors and facilitate the process of clinical diagnosis. Several ratios of metabolites are tested alone or in combination with imaging markers. A wrapper feature selection and classification methodology is studied, employing Fisher's criterion for ranking the markers. The set of extracted markers that express statistical significance is further studied in terms of biological behavior with respect to brain tumor type and grade. The outcome of this study indicates that the proposed method, by exploiting the intrinsic properties of the data, can reveal reliable and biologically relevant sets of metabolic markers, which form an important adjunct toward more accurate type and grade discrimination of complex brain tumors.
Information Geometry, Inference Methods and Chaotic Energy Levels Statistics
Cafaro, Carlo
2008-01-01
In this Letter, we propose a novel information-geometric characterization of chaotic (integrable) energy level statistics of a quantum antiferromagnetic Ising spin chain in a tilted (transverse) external magnetic field. Finally, we conjecture our results might find some potential physical applications in quantum energy level statistics.
Statistical methods for decision making in mine action
DEFF Research Database (Denmark)
Larsen, Jan
The lecture discusses the basics of statistical decision making in connection with humanitarian mine action. There is special focus on: 1) requirements for mine detection; 2) design and evaluation of mine equipment; 3) performance improvement by statistical learning and information fusion; 4) …
Statistics a guide to the use of statistical methods in the physical sciences
Barlow, Roger J
1989-01-01
The Manchester Physics Series. General Editors: D. J. Sandiford, F. Mandl, A. C. Phillips, Department of Physics and Astronomy, University of Manchester. Titles in the series: Properties of Matter (B. H. Flowers and E. Mendoza); Optics, Second Edition (F. G. Smith and J. H. Thomson); Statistical Physics, Second Edition (F. Mandl); Electromagnetism, Second Edition (I. S. Grant and W. R. Phillips); Statistics (R. J. Barlow); Solid State Physics, Second Edition (J. R. Hook and H. E. Hall); Quantum Mechanics (F. Mandl); Particle Physics, Second Edition (B. R. Martin and G. Shaw); The Physics of Stars, Second Edition (A. C. Phillips); Computing for Scienti…
Stolzer, Alan J.; Halford, Carl
2007-01-01
In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general, data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of 0.91 to 0.92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about 0.99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.
A Two-Layer Method for Sedentary Behaviors Classification Using Smartphone and Bluetooth Beacons.
Cerón, Jesús D; López, Diego M; Hofmann, Christian
2017-01-01
Among the factors that shape the health of populations, a person's lifestyle is the most important one. This work focuses on the characterization and prevention of sedentary lifestyles. A sedentary behavior is defined as "any waking behavior characterized by an energy expenditure of 1.5 METs (Metabolic Equivalent) or less while in a sitting or reclining posture". The aim is to propose a method for sedentary behavior classification using a smartphone and Bluetooth beacons, considering different types of classification models: personal, hybrid or impersonal. Following the CRISP-DM methodology, a method based on a two-layer approach for the classification of sedentary behaviors is proposed. Using data collected from a smartphone's accelerometer, gyroscope and barometer, the first layer classifies between performing a sedentary behavior and not. The second layer classifies the specific sedentary activity performed using only the smartphone's accelerometer and barometer data, adding indoor location data from Bluetooth Low Energy (BLE) beacons. To improve the precision of the classification, both layers implement the Random Forest algorithm and the personal model. This study presents the first available method for the automatic classification of specific sedentary behaviors. The layered classification approach has the potential to improve the processing, memory and energy consumption of the mobile devices and wearables used.
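A minimal scikit-learn sketch of the two-layer idea with Random Forests; the feature layout, beacon encoding and the rule used to fabricate labels are illustrative stand-ins for the paper's real sensor features:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(7)
    n = 400
    motion = rng.normal(size=(n, 6))             # accel/gyro/barometer features
    location = rng.integers(0, 3, size=(n, 1))   # nearest BLE beacon id
    is_sedentary = (motion[:, 0] < 0.2).astype(int)   # fabricated labels
    behavior = rng.integers(0, 3, size=n)        # e.g. sitting/reclining/desk

    # layer 1: sedentary vs. non-sedentary, from motion sensors only
    layer1 = RandomForestClassifier(n_estimators=100, random_state=0)
    layer1.fit(motion, is_sedentary)

    # layer 2: which sedentary behavior, adding indoor location;
    # trained only on the samples layer 1 is meant to pass through
    sed = is_sedentary == 1
    X2 = np.hstack([motion[sed][:, [0, 1, 5]], location[sed]])
    layer2 = RandomForestClassifier(n_estimators=100, random_state=0)
    layer2.fit(X2, behavior[sed])

    # inference: run layer 2 only when layer 1 predicts "sedentary"
    for m, l in zip(motion[:5], location[:5]):
        if layer1.predict(m.reshape(1, -1))[0] == 1:
            x2 = np.hstack([m[[0, 1, 5]], l]).reshape(1, -1)
            print("behavior:", layer2.predict(x2)[0])
        else:
            print("not sedentary")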
Multi-class Mode of Action Classification of Toxic Compounds Using Logic Based Kernel Methods.
Lodhi, Huma; Muggleton, Stephen; Sternberg, Mike J E
2010-09-17
Toxicity prediction is essential for drug design and the development of effective therapeutics. In this paper we present an in silico strategy to identify the mode of action of toxic compounds, based on the use of a novel logic-based kernel method. The technique uses support vector machines in conjunction with kernels constructed from first-order rules induced by an Inductive Logic Programming system. It constructs multi-class models by using a divide-and-conquer reduction strategy that splits multi-classes into binary groups and solves each individual problem recursively, hence generating an underlying decision list structure. In order to evaluate the effectiveness of the approach for chemoinformatics problems like predictive toxicology, we apply it to toxicity classification in aquatic systems. The method is used to identify and classify 442 compounds with respect to their mode of action. The experimental results show that the technique successfully classifies toxic compounds and can be useful in assessing environmental risks. Experimental comparison of the performance of the proposed multi-class scheme with the standard multi-class Inductive Logic Programming algorithm and the multi-class Support Vector Machine yields statistically significant results and demonstrates the potential power and benefits of the approach in identifying compounds of various toxic mechanisms. Copyright © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Directory of Open Access Journals (Sweden)
R. Valbuena
2016-02-01
Full Text Available The area-based method has become a widespread approach in airborne laser scanning (ALS), being mainly employed for the estimation of continuous variables describing forest attributes: biomass, volume, density, etc. However, to date, classification methods based on machine learning, which are fairly common in other remote sensing fields, such as land use / land cover classification using multispectral sensors, have been largely overlooked in forestry applications of ALS. In this article, we wish to draw attention to statistical methods predicting discrete responses, for supervised classification of ALS datasets. A wide spectrum of approaches is reviewed: discriminant analysis (DA) using various classifiers (maximum likelihood, minimum volume ellipsoid, naïve Bayes), support vector machines (SVM), artificial neural networks (ANN), random forests (RF) and nearest neighbour (NN) methods. They are compared in the context of a classification of forest areas into development classes (DC) used in practical silvicultural management in Finland, using the country's low-density national ALS dataset. We observed that RF and NN had the most balanced error matrices, with cross-validated predictions that were mainly unbiased for all DCs. Although overall accuracies were higher for SVM and ANN, their results were very dissimilar across DCs, and they can therefore only be advantageous if certain DCs are targeted. DA methods underperformed in comparison to the other alternatives, and were only advantageous for the detection of seedling stands. These results show that, besides the well-demonstrated capacity of ALS for quantifying forest stocks, there is a great deal of potential for predicting categorical variables in general, and forest types in particular. In conclusion, we consider that the presented methodology should also be adapted to the types of forest classes that are relevant to Mediterranean ecosystems, opening a range of possibilities for future research, in which…
A change detection experiment for an invasive species, saltcedar, near Lovelock, Nevada, was conducted with multi-date Compact Airborne Spectrographic Imager (CASI) hyperspectral datasets. Classification and NDVI-differencing change detection methods were tested. In the classification strategy, a p...
Robust Control Methods for On-Line Statistical Learning
Directory of Open Access Journals (Sweden)
Capobianco Enrico
2001-01-01
Full Text Available The issue of ensuring that data processing results in an experiment are not affected by the presence of outliers is relevant for statistical control and learning studies. Learning schemes should thus be tested for their capacity to handle outliers in the observed training set, so as to achieve reliable estimates with respect to the crucial bias and variance aspects. We describe possible ways of endowing neural networks with statistically robust properties by defining feasible error criteria. It is convenient to cast neural nets in state-space representations and apply both Kalman filter and stochastic approximation procedures in order to suggest statistically robustified solutions for on-line learning.
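One standard robust error criterion of the kind alluded to above is the Huber loss, whose gradient caps the influence of any single outlier. A minimal sketch of an on-line (stochastic-approximation) linear learner using it, with invented data and step-size schedule:

    import numpy as np

    def huber_grad(residual, delta=1.0):
        # Gradient of the Huber loss w.r.t. the residual: like squared
        # error near zero, but clipped for large (outlier) residuals.
        return np.clip(residual, -delta, delta)

    rng = np.random.default_rng(0)
    n, d = 500, 3
    X = rng.normal(size=(n, d))
    w_true = np.array([1.5, -2.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=n)
    y[::25] += 15.0                      # a few gross outliers

    w = np.zeros(d)
    k = 0
    for epoch in range(5):               # a few on-line passes
        for t in range(n):
            k += 1
            lr = 1.0 / (10 + k)          # stochastic-approximation step size
            r = y[t] - X[t] @ w
            w += lr * huber_grad(r) * X[t]   # robust update, outliers clipped
    print(w)                             # approaches w_true despite outliers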
An NCME Instructional Module on Data Mining Methods for Classification and Regression
Sinharay, Sandip
2016-01-01
Data mining methods for classification and regression are becoming increasingly popular in various scientific fields. However, these methods have not been explored much in educational measurement. This module first provides a review, which should be accessible to a wide audience in education measurement, of some of these methods. The module then…
Classification of methods for annual energy harvesting calculations of photovoltaic generators
International Nuclear Information System (INIS)
Rus-Casas, C.; Aguilar, J.D.; Rodrigo, P.; Almonacid, F.; Pérez-Higueras, P.J.
2014-01-01
Highlights:
• The paper presents a novel classification of methods for annual energy harvesting calculation of grid-connected PV systems.
• The methods are classified into direct and indirect methods.
• Direct methods calculate the energy directly; indirect methods calculate the energy from the power.
• The classification can help PV professionals choose the most suitable method for each application.
Abstract: Estimating the energy provided by the generators of grid-connected photovoltaic systems is important in order to analyze their economic viability and supervise their operation. The energy harvesting calculation of a photovoltaic generator is not trivial, and there are many methods for this calculation. The aim of this paper is to develop a novel classification of methods for the annual energy harvesting calculation of a generator in a grid-connected photovoltaic system. The methods are classified into two groups: (1) those that calculate the energy indirectly, i.e. they first calculate the power and from this the energy, and (2) those that calculate the energy directly. Furthermore, the indirect methods are grouped into two categories: those that first calculate the I–V curve of the generator and from this the power, and those that calculate the power directly. The study has shown that the existing methods differ in simplicity and accuracy, so the proposed classification is useful in choosing the most suitable method for each specific application.
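A minimal sketch of the indirect route (power first, then energy): a simple irradiance/temperature power model integrated over a day. The power model and all numbers are assumptions for illustration, not one of the paper's surveyed methods:

    import numpy as np

    # Illustrative PV power model (an assumption):
    # P = P_stc * (G / 1000) * (1 + gamma * (T_cell - 25))
    def pv_power(G, T_cell, P_stc=5000.0, gamma=-0.004):
        return P_stc * (G / 1000.0) * (1.0 + gamma * (T_cell - 25.0))

    hours = np.arange(0, 24, 0.25)                   # one day, 15-min steps
    G = np.clip(900 * np.sin(np.pi * (hours - 6) / 12), 0, None)  # W/m^2
    T_cell = 20.0 + 25.0 * G / 900.0                 # crude cell temp, degC

    P = pv_power(G, T_cell)                          # W at each time step
    # indirect method: integrate power over time (trapezoid rule) -> energy
    E_day = np.sum(0.5 * (P[1:] + P[:-1]) * np.diff(hours)) / 1000.0  # kWh
    print(f"daily energy: {E_day:.1f} kWh")
    # a direct method would instead map daily irradiation straight to energy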
Statistics and scientific method: an introduction for students and researchers
National Research Council Canada - National Science Library
Diggle, Peter; Chetwynd, Amanda
2011-01-01
"Most introductory statistics text-books are written either in a highly mathematical style for an intended readership of mathematics undergraduate students, or in a recipe-book style for an intended...
Using Statistical Process Control Methods to Classify Pilot Mental Workloads
National Research Council Canada - National Science Library
Kudo, Terence
2001-01-01
.... These include cardiac, ocular, respiratory, and brain activity measures. The focus of this effort is to apply statistical process control methodology on different psychophysiological features in an attempt to classify pilot mental workload...
Sensory evaluation of food: statistical methods and procedures
National Research Council Canada - National Science Library
O'Mahony, Michael
1986-01-01
The aim of this book is to provide basic knowledge of the logic and computation of statistics for the sensory evaluation of food, or for other forms of sensory measurement encountered in, say, psychophysics...
Statistical Sensitive Data Protection and Inference Prevention with Decision Tree Methods
National Research Council Canada - National Science Library
Chang, LiWu
2003-01-01
.... We consider inference as correct classification and approach it with decision tree methods. As in our previous work, sensitive data are viewed as classes of those test data and non-sensitive data are the rest attribute values...
An object-oriented classification method of high resolution imagery based on improved AdaTree
International Nuclear Information System (INIS)
Xiaohe, Zhang; Liang, Zhai; Jixian, Zhang; Huiyong, Sang
2014-01-01
With the popularity of applications using high-spatial-resolution remote sensing images, more and more studies have paid attention to object-oriented classification, addressing image segmentation as well as automatic classification after segmentation. This paper proposes a fast method of object-oriented automatic classification. First, edge-based or FNEA-based segmentation is used to identify image objects, and the values of the object attributes most suitable for classification are calculated. Then a certain number of image objects are selected as training data for an improved AdaTree algorithm to derive classification rules. Finally, the image objects can be classified easily using these rules. In the AdaTree, we mainly modified the final hypothesis to obtain the classification rules. In an experiment with a WorldView-2 image, the AdaTree-based method showed a clear improvement in accuracy and efficiency over an SVM-based method, with the kappa coefficient reaching 0.9242
HEp-2 cell image classification method based on very deep convolutional networks with small datasets
Lu, Mengchi; Gao, Long; Guo, Xifeng; Liu, Qiang; Yin, Jianping
2017-07-01
Human Epithelial-2 (HEp-2) cell image staining pattern classification has been widely used to identify autoimmune diseases through the anti-nuclear antibodies (ANA) test in the Indirect Immunofluorescence (IIF) protocol. Because the manual test is time consuming, subjective and labor intensive, image-based Computer Aided Diagnosis (CAD) systems for HEp-2 cell classification are being developed. However, recently proposed methods mostly rely on manual feature extraction and achieve low accuracy. Besides, the available benchmark datasets are small, which is not well suited to deep learning methods; this directly influences the accuracy of cell classification, even after data augmentation. To address these issues, this paper presents a high-accuracy automatic HEp-2 cell classification method for small datasets, utilizing very deep convolutional networks (VGGNet). Specifically, the proposed method consists of three main phases, namely image preprocessing, feature extraction and classification. Moreover, an improved VGGNet is presented to address the challenges of small-scale datasets. Experimental results over two benchmark datasets demonstrate that the proposed method achieves superior performance in terms of accuracy compared with existing methods.
Colon-Berlingeri, Migdalisel; Burrowes, Patricia A
2011-01-01
Incorporation of mathematics into biology curricula is critical to underscore for undergraduate students the relevance of mathematics to most fields of biology and the usefulness of developing the quantitative process skills demanded in modern biology. At our institution, we have made significant changes to better integrate mathematics into the undergraduate biology curriculum. The curricular revision included changes in the suggested course sequence, addition of statistics and precalculus as prerequisites to core science courses, and incorporation of interdisciplinary (math-biology) learning activities in genetics and zoology courses. In this article, we describe the activities developed for these two courses and the assessment tools used to measure the learning that took place with respect to biology and statistics. We assessed the effectiveness of these learning opportunities in helping students improve their understanding of the math and statistical concepts addressed and, more importantly, their ability to apply them to solve a biological problem. We also identified areas that need emphasis in both biology and mathematics courses. In light of our observations, we recommend best practices that biology and mathematics academic departments can implement to train undergraduates for the demands of modern biology.
Ing, Alex; Schwarzbauer, Christian
2014-01-01
Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other voxel. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster-based methods, the cluster size statistic (CSS) and the cluster mass statistic (CMS), are introduced to control the family-wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster-based methods used in conventional task-based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operating characteristic (ROC) analysis utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family-wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.
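A minimal NumPy/SciPy sketch of the permutation logic behind such cluster statistics, reduced to one dimension so that contiguous runs stand in for connected components; group sizes, threshold and effect size are invented:

    import numpy as np
    from scipy import stats

    def max_cluster_mass(t_vals, thresh):
        # Largest sum of supra-threshold t-values over contiguous runs
        # (a 1-D stand-in for connected components in the connectome).
        best, cur = 0.0, 0.0
        for t in t_vals:
            cur = cur + t if t > thresh else 0.0
            best = max(best, cur)
        return best

    rng = np.random.default_rng(0)
    nA, nB, m = 12, 12, 200              # subjects per group, connections
    A = rng.normal(size=(nA, m))
    B = rng.normal(size=(nB, m))
    B[:, 50:60] += 1.2                   # a block of altered connectivity

    t_obs = stats.ttest_ind(B, A, axis=0).statistic
    obs = max_cluster_mass(t_obs, thresh=2.0)

    # permutation null: shuffle group labels, recompute max cluster mass
    data = np.vstack([A, B])
    null = []
    for _ in range(500):
        perm = rng.permutation(nA + nB)
        tp = stats.ttest_ind(data[perm[nA:]], data[perm[:nA]],
                             axis=0).statistic
        null.append(max_cluster_mass(tp, thresh=2.0))
    p = (1 + np.sum(np.array(null) >= obs)) / 501.0
    print(f"observed mass {obs:.1f}, FWE-corrected p = {p:.3f}")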
Classification of Actions or Inheritance also for Methods
DEFF Research Database (Denmark)
Kristensen, Bent Bruun; Madsen, Ole Lehrmann; Møller-Pedersen, Birger
1987-01-01
In existing object-oriented languages, virtual procedures/methods of a super-class are specialized in sub-classes in a very primitive manner: they are simply re-defined and need not bear any resemblance to the virtual in the super-class. In BETA, a new object-oriented language, classes and methods are unified into one concept, and by an extension of the virtual concept, virtual procedures/methods in sub-classes are defined as specializations of the virtuals in the super-class. The virtual procedures/methods of the sub-classes thus inherit the attributes (e.g. parameters) and actions from the "super-procedure/method". In the languages mentioned above only procedures/methods may be virtual…
Statistical methods and applications from a historical perspective selected issues
Mignani, Stefania
2014-01-01
The book showcases a selection of peer-reviewed papers, the preliminary versions of which were presented at a conference held 11-13 June 2011 in Bologna and organized jointly by the Italian Statistical Society (SIS), the National Institute of Statistics (ISTAT) and the Bank of Italy. The theme of the conference was "Statistics in the 150 years of the Unification of Italy." The celebration of the anniversary of Italian unification provided the opportunity to examine and discuss the methodological aspects and applications from a historical perspective and both from a national and international point of view. The critical discussion on the issues of the past has made it possible to focus on recent advances, considering the studies of socio-economic and demographic changes in European countries.
Statistical methods to evaluate thermoluminescence ionizing radiation dosimetry data
International Nuclear Information System (INIS)
Segre, Nadia; Matoso, Erika; Fagundes, Rosane Correa
2011-01-01
Ionizing radiation levels, evaluated through the exposure of CaF2:Dy thermoluminescence dosimeters (TLD-200), have been monitored at Centro Experimental Aramar (CEA), located at Iperó in São Paulo state, Brazil, since 1991, resulting in a large number of measurements up to 2009 (more than 2,000). The volume of data, together with the dispersion of the measurements (since every process has deviation), reinforces the use of statistical tools to evaluate the results, a procedure also imposed by the Brazilian standard CNEN-NN-3.01/PR-3.01-008, which regulates radiometric environmental monitoring. Thermoluminescence ionizing radiation dosimetry data are statistically compared in order to evaluate the potential environmental impact of CEA's activities. The statistical tools discussed in this work are box plots, control charts and analysis of variance. (author)
Statistical methods for quantitative mass spectrometry proteomic experiments with labeling
Directory of Open Access Journals (Sweden)
Oberg Ann L; Mahoney Douglas W
2012-11-01
Full Text Available Mass spectrometry utilizing labeling allows multiple specimens to be subjected to mass spectrometry simultaneously. As a result, between-experiment variability is reduced. Here we describe the use of fundamental concepts of statistical experimental design in the labeling framework in order to minimize variability and avoid biases. We demonstrate how to export data in the format that is most efficient for statistical analysis. We demonstrate how to assess the need for normalization, perform normalization, and check whether it worked. We describe how to build a model explaining the observed values and test for differential protein abundance, along with descriptive statistics and measures of reliability of the findings. Concepts are illustrated through the use of three case studies utilizing the iTRAQ 4-plex labeling protocol.
Adaptive Maneuvering Frequency Method of Current Statistical Model
Institute of Scientific and Technical Information of China (English)
Wei Sun; Yongjian Yang
2017-01-01
The current statistical model (CSM) has a good performance in maneuvering target tracking. However, a fixed maneuvering frequency will deteriorate the tracking results, causing serious dynamic delay, slow convergence and limited precision when using the Kalman filter (KF) algorithm. In this study, a new current statistical model and a new Kalman filter are proposed to improve the performance of maneuvering target tracking. The new model, which employs an innovation-dominated subjection function to adaptively adjust the maneuvering frequency, has better performance in tracking step-maneuvering targets, although a fluctuation phenomenon appears. To address this problem, a new adaptive fading Kalman filter is proposed as well. In the new Kalman filter, the prediction values are amended in time by setting judgment and amendment rules, so that the tracking precision and the fluctuation phenomenon of the new current statistical model are improved. Simulation results indicate the effectiveness of the new algorithm and its practical guiding significance.
Faults Classification Of Power Electronic Circuits Based On A Support Vector Data Description Method
Directory of Open Access Journals (Sweden)
Cui Jiang
2015-06-01
Full Text Available Power electronic circuits (PECs) are prone to various failures, whose classification is of paramount importance. This paper presents a data-driven fault diagnosis technique that employs a support vector data description (SVDD) method to perform fault classification of PECs. In the presented method, fault signals (e.g. currents, voltages) are collected from accessible nodes of the circuits, and then signal processing techniques (e.g. Fourier analysis, wavelet transform) are adopted to extract feature samples, which are subsequently used to perform offline machine learning. Finally, the SVDD classifier is used to implement the fault classification task. In some cases, however, the conventional SVDD cannot achieve good classification performance, because the classifier may generate so-called refusal areas (RAs); in our design these RAs are resolved with a one-against-one support vector machine (SVM) classifier. The experimental results obtained from simulated and actual circuits demonstrate that the improved SVDD achieves classification performance close to the conventional one-against-one SVM and can be applied to fault classification of PECs in practice.
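scikit-learn has no SVDD implementation, but with an RBF kernel the One-Class SVM solves a closely related problem and can serve as a stand-in to sketch the scheme: one boundary per fault class, with a one-against-one SVM resolving refusal areas. All data and hyperparameters below are illustrative:

    import numpy as np
    from sklearn.svm import OneClassSVM, SVC

    rng = np.random.default_rng(0)
    # feature samples per fault class (e.g. from FFT/wavelet features)
    classes = {0: rng.normal(0, 1, (60, 4)),
               1: rng.normal(4, 1, (60, 4)),
               2: rng.normal(8, 1, (60, 4))}

    # one data-description boundary per class (SVDD-like stand-in)
    boundaries = {c: OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(Xc)
                  for c, Xc in classes.items()}

    # one-against-one SVM used to resolve "refusal areas"
    X = np.vstack(list(classes.values()))
    y = np.repeat(list(classes.keys()), 60)
    ovo = SVC(decision_function_shape="ovo").fit(X, y)

    def classify(x):
        accepted = [c for c, b in boundaries.items()
                    if b.predict(x.reshape(1, -1))[0] == 1]
        if len(accepted) == 1:        # exactly one boundary accepts
            return accepted[0]
        return ovo.predict(x.reshape(1, -1))[0]   # refusal area -> SVM

    print(classify(np.array([4.1, 3.9, 4.0, 4.2])))   # likely class 1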
Different methods for quark/gluon jet classification on real data from the DELPHI detector
Energy Technology Data Exchange (ETDEWEB)
Transtroemer, G
1999-05-01
Different methods to separate quark jets from gluon jets have been investigated and tested on data from the DELPHI experiment. A test sample of gluon jets was selected from bb̄g three-jet events where the two b-jets had been identified using a lifetime tag, and a quark jet sample was obtained from qq̄γ events where the photon was required to have high energy and to be well separated from the two jets. Three types of tests were made. Firstly, the jet energy, which is the variable most frequently used for quark/gluon jet separation, was compared with methods based on the differences in the fragmentation of quark and gluon jets. It was found that the fragmentation-based classification provides significantly better identification than the jet energy only in events where the jets all have approximately the same energy. In Monte Carlo generated symmetric e+e- → qq̄g three-jet events, where the jet energy does not provide any identification at all, the gluon jet was correctly assigned in 58% of the events. More important, however, is that the identification has been divided into two independent parts, the energy part and the fragmentation part. Secondly, two different sets of fragmentation-sensitive variables were tested. It was found that a slightly better identification could be achieved using information from all the particles of the jet rather than using only the leading ones. Thirdly, three types of statistical discrimination methods were compared: a cut on a single fragmentation variable; a cut on the Fisher statistical discriminant calculated from one set of variables; and a cut on the output of Artificial Neural Networks (ANN) trained on different sets of variables. The three types of classifiers gave about the same performance, and one conclusion from this study is that the use of ANNs or Fisher statistical discrimination does not seem to improve the results significantly in quark/gluon jet separation on a jet-to-jet basis. (45 refs)
Possible classification of the methods of operational research applicable in the field of defense
Directory of Open Access Journals (Sweden)
Mučibabić Spasoje
2006-01-01
Full Text Available The overall dynamic development of operational research in various fields of human activity urges the need for a clearer and mathematically more explicit classification of its methods. This need is particularly urgent in the field of defense, because of the complexity of modern conflicts as well as new security requirements. One possible classification of methods, based on the theory of games as a mathematical model for solving conflict situations, is presented in this paper. The connections between the methods and their mathematical descriptions are underlined.
Data Processing And Machine Learning Methods For Multi-Modal Operator State Classification Systems
Hearn, Tristan A.
2015-01-01
This document is intended as an introduction to a set of common signal processing and machine learning methods that may be used in the software portion of a functional crew state monitoring system. This includes overviews of both the theory of the methods involved and examples of implementation. Practical considerations are discussed for implementing modular, flexible, and scalable processing and classification software for a multi-modal, multi-channel monitoring system. Example source code is also given for all of the discussed processing and classification methods.
Evaluating Method Engineer Performance: an error classification and preliminary empirical study
Directory of Open Access Journals (Sweden)
Steven Kelly
1998-11-01
Full Text Available We describe an approach to empirically test the use of metaCASE environments to model methods. Both diagrams and matrices have been proposed as a means for presenting the methods. These different paradigms may have their own effects on how easily and well users can model methods. We extend Batra's classification of errors in data modelling to cover metamodelling, and use it to measure the performance of a group of metamodellers using either diagrams or matrices. The tentative results from this pilot study confirm the usefulness of the classification, and show some interesting differences between the paradigms.
Cho, Hyun-Deok; Kim, Unyong; Suh, Joon Hyuk; Eom, Han Young; Kim, Junghyun; Lee, Seul Gi; Choi, Yong Seok; Han, Sang Beom
2016-04-01
Analytical methods using high-performance liquid chromatography with diode array and tandem mass spectrometry detection were developed for the discrimination of the rhizomes of four Atractylodes medicinal plants: A. japonica, A. macrocephala, A. chinensis, and A. lancea. A quantitative study was performed, selecting five bioactive components, including atractylenolide I, II, III, eudesma-4(14),7(11)-dien-8-one and atractylodin, on twenty-six Atractylodes samples of various origins. Sample extraction was optimized to sonication with 80% methanol for 40 min at room temperature. High-performance liquid chromatography with diode array detection was established using a C18 column with a water/acetonitrile gradient system at a flow rate of 1.0 mL/min, and the detection wavelength was set at 236 nm. Liquid chromatography with tandem mass spectrometry was applied to certify the reliability of the quantitative results. The developed methods were validated by ensuring specificity, linearity, limit of quantification, accuracy, precision, recovery, robustness, and stability. Results showed that cangzhu contained higher amounts of atractylenolide I and atractylodin than baizhu, and the atractylodin content in particular showed the greatest variation between baizhu and cangzhu. Multivariate statistical analyses, such as principal component analysis and hierarchical cluster analysis, were also employed for further classification of the Atractylodes plants. The established method was suitable for quality control of the Atractylodes plants. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
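A minimal sketch of that multivariate step: PCA for inspection, followed by hierarchical cluster analysis (Ward linkage) for the grouping. The marker concentrations are synthetic stand-ins for the five quantified compounds:

    import numpy as np
    from sklearn.decomposition import PCA
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    # rows: 26 samples; columns: the five quantified marker compounds
    baizhu = rng.normal([1, 1, 1, 1, 1], 0.2, (13, 5))
    cangzhu = rng.normal([3, 1, 1, 1, 4], 0.3, (13, 5))  # more ATR-I, atractylodin
    X = np.vstack([baizhu, cangzhu])

    # PCA scores for inspecting sample groupings
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    scores = PCA(n_components=2).fit_transform(Xs)

    # hierarchical cluster analysis on the PCA scores
    Z = linkage(scores, method="ward")
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(labels)   # samples split into two clusters: baizhu vs. cangzhu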
Hayen, Andrew; Macaskill, Petra; Irwig, Les; Bossuyt, Patrick
2010-01-01
To explain which measures of accuracy and which statistical methods should be used in studies to assess the value of a new binary test as a replacement test, an add-on test, or a triage test. Selection and explanation of statistical methods, illustrated with examples. Statistical methods for…
Barron, Kenneth E.; Apple, Kevin J.
2014-01-01
Coursework in statistics and research methods is a core requirement in most undergraduate psychology programs. However, is there an optimal way to structure and sequence methodology courses to facilitate student learning? For example, should statistics be required before research methods, should research methods be required before statistics, or…
Fuzzy comprehensive evaluation method of F statistics weighting in ...
African Journals Online (AJOL)
In order to rapidly identify the source of water inrush in a coal mine, and to provide a theoretical basis for mine water damage prevention and control, a fuzzy comprehensive evaluation model was established. The F statistics of water samples were normalized as the weights of the fuzzy comprehensive evaluation for determining the ...
Statistical methods for decision making in mine action
DEFF Research Database (Denmark)
Larsen, Jan
The design and evaluation of mine clearance equipment – the problem of reliability
* Detection probability – tossing a coin
* Requirements in mine action
* Detection probability and confidence in MA
* Using statistics in area reduction
Improving performance by information fusion and combination...
Statistical methods of combining information: Applications to sensor data fusion
Energy Technology Data Exchange (ETDEWEB)
Burr, T.
1996-12-31
This paper reviews some statistical approaches to combining information from multiple sources. Promising new approaches will be described, and potential applications to combining not-so-different data sources such as sensor data will be discussed. Experiences with one real data set are described.
An Introduction to Modern Statistical Methods in HCI
Robertson, Judy; Kaptein, Maurits; Robertson, J; Kaptein, M
2016-01-01
This chapter explains why we think statistical methodology matters so much to the HCI community and why we should attempt to improve it. It introduces some flaws in the well-accepted methodology of Null Hypothesis Significance Testing and briefly introduces some alternatives. Throughout the book we…
Effective viscosity of dispersions approached by a statistical continuum method
Mellema, J.; Willemse, M.W.M.
1983-01-01
The problem of the determination of the effective viscosity of disperse systems (emulsions, suspensions) is considered. On the basis of the formal solution of the equations governing creeping flow in a statistically homogeneous dispersion, the effective viscosity is expressed in a series expansion…
Grassmann methods in lattice field theory and statistical mechanics
International Nuclear Information System (INIS)
Bilgici, E.; Gattringer, C.; Huber, P.
2006-01-01
Full text: In two dimensions, models of loops can be represented as simple Grassmann integrals. In our work we explore the generalization of these techniques to lattice field theories and statistical mechanics systems in three and four dimensions. We discuss possible strategies and applications for representations of loop and surface models as Grassmann integrals. (author)
Critical Realism and Statistical Methods--A Response to Nash
Scott, David
2007-01-01
This article offers a defence of critical realism in the face of objections Nash (2005) makes to it in a recent edition of this journal. It is argued that critical and scientific realisms are closely related and that both are opposed to statistical positivism. However, the suggestion is made that scientific realism retains (from statistical…
Classification of methods for measuring current-voltage characteristics of semiconductor devices
Directory of Open Access Journals (Sweden)
Iermolenko Ia. O.
2014-06-01
Full Text Available It is shown that computer systems for measuring current-voltage characteristics are very important for semiconductor device production. The main criteria for the efficiency of such systems are defined, and it is shown that this efficiency depends significantly on the methods used for measuring the current-voltage characteristics of semiconductor devices. The aim of this work is to analyze the existing methods for measuring current-voltage characteristics of semiconductor devices and to create a classification of these methods in order to specify the most effective solutions in terms of the defined criteria. To achieve this aim, the most common classifications of methods for measuring current-voltage characteristics of semiconductor devices, and their main disadvantages, are considered. Automated and manual; continuous, pulse and mixed; and isothermal and isodynamic measurement methods are analyzed. As a result of the analysis and generalization of the existing methods, the following classification criteria are defined: the level of automation, the form of the measurement signals, the condition of the semiconductor device during the measurements, and the use of mathematical processing of the measurement results. Using these criteria, a classification scheme of methods for measuring current-voltage characteristics of semiconductor devices is composed and the most effective methods are specified.
Local coding based matching kernel method for image classification.
Directory of Open Access Journals (Sweden)
Yan Song
Full Text Available This paper focuses on how to effectively and efficiently measure visual similarity for local-feature-based representation. Among existing methods, metrics based on Bag of Visual Words (BoV) techniques are efficient and conceptually simple, at the expense of effectiveness. By contrast, kernel-based metrics are more effective, but at the cost of greater computational complexity and increased storage requirements. We show that a unified visual matching framework can be developed to encompass both BoV and kernel-based metrics, in which a local kernel plays an important role between feature pairs or between features and their reconstruction. Generally, local kernels are defined using Euclidean distance or its derivatives, based either explicitly or implicitly on an assumption of Gaussian noise. However, local features such as SIFT and HOG often follow a heavy-tailed distribution, which tends to undermine the motivation behind Euclidean metrics. Motivated by recent advances in feature coding techniques, a novel efficient local coding based matching kernel (LCMK) method is proposed. This exploits the manifold structures in Hilbert space derived from local kernels. The proposed method combines the advantages of both BoV and kernel-based metrics, and achieves a linear computational complexity. This enables efficient and scalable visual matching to be performed on large-scale image sets. To evaluate the effectiveness of the proposed LCMK method, we conduct extensive experiments with widely used benchmark datasets, including the 15-Scenes, Caltech101/256, and PASCAL VOC 2007 and 2011 datasets. Experimental results confirm the effectiveness of the relatively efficient LCMK method.
Classification of high resolution satellite images
Karlsson, Anders
2003-01-01
In this thesis the Support Vector Machine (SVM) is applied to the classification of high resolution satellite images. Several different measures for classification, including texture measures, first-order statistics, and simple contextual information, were evaluated. Additionally, the image was segmented using an enhanced watershed method in order to improve the classification accuracy.
A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark
Directory of Open Access Journals (Sweden)
Yong Wang
2016-02-01
Full Text Available Currently, with the rapid increase of data scales in network traffic classification, how to select traffic features efficiently is becoming a big challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, their execution time remains unsatisfactory because of the numerous iterative computations involved in the processing. To address this issue, an efficient feature selection method for network traffic based on a new parallel computing framework called Spark is proposed in this paper. In our approach, the complete feature set is first preprocessed based on the Fisher score, and a sequential forward search strategy is employed over the subsets. The optimal feature subset is then selected using the continuous iterations of the Spark computing framework. The implementation demonstrates that, while maintaining classification accuracy, our method reduces the time cost of modeling and classification and significantly improves the execution efficiency of feature selection.
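A minimal single-machine sketch of the selection logic (Fisher-score pre-ranking followed by greedy sequential forward search); the Spark parallelization is not reproduced, and GaussianNB is an illustrative choice of wrapped classifier:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB

    def fisher_score(X, y):
        # per-feature ratio of between-class to within-class variance
        classes = np.unique(y)
        mu = X.mean(axis=0)
        num = sum((y == c).sum() * (X[y == c].mean(axis=0) - mu) ** 2
                  for c in classes)
        den = sum((y == c).sum() * X[y == c].var(axis=0) for c in classes)
        return num / (den + 1e-12)

    def forward_search(X, y, candidates, k=5):
        # greedy SFS over the Fisher-ranked candidate features
        selected, best = [], -np.inf
        for _ in range(k):
            scores = {}
            for f in candidates:
                if f in selected:
                    continue
                scores[f] = cross_val_score(GaussianNB(),
                                            X[:, selected + [f]],
                                            y, cv=3).mean()
            f_best = max(scores, key=scores.get)
            if scores[f_best] <= best:
                break                  # no improvement, stop early
            selected.append(f_best)
            best = scores[f_best]
        return selected, best

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 300)
    X = rng.normal(size=(300, 20))
    X[:, 3] += y                       # two informative features
    X[:, 7] += 2 * y

    ranked = np.argsort(fisher_score(X, y))[::-1][:10]   # pre-filter
    print(forward_search(X, y, list(ranked)))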
A Novel Classification Method for Syndrome Differentiation of Patients with AIDS
Directory of Open Access Journals (Sweden)
Yufeng Zhao
2015-01-01
Full Text Available We consider the analysis of an AIDS dataset where each patient is characterized by a list of symptoms and is labeled with one or more TCM syndromes. The task is to build a classifier that maps symptoms to TCM syndromes. We use the minimum reference set-based multiple instance learning (MRS-MIL) method. The method identifies a list of representative symptoms for each syndrome and builds a Gaussian mixture model based on them. The models for all syndromes are then used for classification via Bayes' rule. By relying on a subset of key symptoms for classification, MRS-MIL can produce reliable, high-quality classification rules even on datasets with small sample sizes. On the AIDS dataset, it achieves an average precision and recall of 0.7736 and 0.7111, respectively, superior to the results achieved by alternative methods.
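A minimal sketch of the classification step described above: one Gaussian mixture per syndrome over its representative-symptom features, with Bayes' rule picking the winner. The data, priors and component counts are invented, and the MRS selection step itself is not shown:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # toy data: symptom feature vectors with one syndrome label each
    # (the real task is multi-label; one-vs-rest extends the same idea)
    X0 = rng.normal(0, 1, (80, 6))
    X1 = rng.normal(2, 1, (80, 6))

    gmms = {0: GaussianMixture(n_components=2, random_state=0).fit(X0),
            1: GaussianMixture(n_components=2, random_state=0).fit(X1)}
    priors = {0: 0.5, 1: 0.5}          # class priors (assumed equal)

    def classify(x):
        # Bayes rule: argmax_c log p(x|c) + log p(c)
        post = {c: g.score_samples(x.reshape(1, -1))[0] + np.log(priors[c])
                for c, g in gmms.items()}
        return max(post, key=post.get)

    print(classify(rng.normal(2, 1, 6)))   # most likely syndrome 1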
Cooper, Rachel
2015-06-01
The latest edition of the Diagnostic and Statistical Manual of Mental Disorders, the D.S.M.-5, was published in May 2013. In the lead up to publication, radical changes to the classification were anticipated; there was widespread dissatisfaction with the previous edition and it was accepted that a "paradigm shift" might be required. In the end, however, and despite huge efforts at revision, the published D.S.M.-5 differs far less than originally envisaged from its predecessor. This paper considers why it is that revising the D.S.M. has become so difficult. The D.S.M. is such an important classification that this question is worth asking in its own right. The case of the D.S.M. can also serve as a study for considering stasis in classification more broadly: why and how can classifications become resistant to change? I suggest that classifications like the D.S.M. can be thought of as forming part of the infrastructure of science, and have much in common with material infrastructure. In particular, as with material technologies, it is possible for "path dependent" development to cause a sub-optimal classification to become "locked in" and hard to replace. Copyright © 2015 Elsevier Ltd. All rights reserved.
Breast cancer tumor classification using LASSO method selection approach
International Nuclear Information System (INIS)
Celaya P, J. M.; Ortiz M, J. A.; Martinez B, M. R.; Solis S, L. O.; Castaneda M, R.; Garza V, I.; Martinez F, M.; Ortiz R, J. M.
2016-10-01
Breast cancer is one of the leading causes of deaths worldwide among women. Early tumor detection is key in reducing breast cancer deaths, and screening mammography is the most widely available method for early detection. Mammography is the most common and effective breast cancer screening test. However, the rate of positive findings is very low, making the radiologic interpretation monotonous and biased toward errors. In an attempt to alleviate radiological workload, this work presents a computer-aided diagnosis (CADx) method aimed at automatically classifying tumor lesions as malign or benign as a means to a second opinion. The CADx method extracts image features and classifies the screening mammogram abnormality into one of two categories: subject at risk of having a malignant tumor (malign) and healthy subject (benign). In this study, 143 abnormal segmentations (57 malign and 86 benign) from the Breast Cancer Digital Repository (BCDR) public database were used to train and evaluate the CADx system. Percentile rank (p-rank) was used to standardize the data. Using the LASSO feature selection methodology, the model achieved a leave-one-out cross-validation area under the receiver operating characteristic curve (AUC) of 0.950. The proposed method has the potential to rank abnormal lesions with a high probability of malignant findings, aiding in the detection of potential malign cases as a second opinion to the radiologist. (Author)
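A minimal sketch of the reported evaluation pipeline, assuming an L1-penalized (LASSO-type) logistic model and leave-one-out scoring; the p-rank standardization and the BCDR features are not reproduced here:

    # Hedged sketch: L1-penalized logistic model scored by leave-one-out AUC.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import LeaveOneOut

    def loocv_auc(X, y, C=1.0):
        scores = np.empty(len(y), dtype=float)
        for train, test in LeaveOneOut().split(X):
            clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
            clf.fit(X[train], y[train])
            scores[test] = clf.predict_proba(X[test])[:, 1]
        return roc_auc_score(y, scores)   # AUC over the pooled held-out scores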
Comparison of four statistical and machine learning methods for crash severity prediction.
Iranitalab, Amirfarrokh; Khattak, Aemal
2017-11-01
Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparison of the performance of four statistical and machine learning methods, including Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; development of a crash-costs-based approach for comparison of crash severity prediction methods; and investigation of the effects of data clustering methods, comprising K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States were obtained, and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated on the training/estimation dataset, and the correct prediction rates for each crash severity level, the overall correct prediction rate, and a proposed crash-costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed that NNC had the best prediction performance overall and for more severe crashes. RF and SVM showed the next-best performance, and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC improved MNL and RF but weakened the performance of NNC. The overall correct prediction rate gave almost exactly the opposite results compared to the proposed approach, showing that neglecting crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.
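The crash-costs-based accuracy measure is not spelled out in the abstract; one plausible construction, with placeholder unit costs rather than the paper's Nebraska figures, weights each error by the cost gap between the true and predicted severity levels:

    # Illustrative cost-weighted accuracy; UNIT_COSTS are placeholders only.
    import numpy as np

    UNIT_COSTS = {0: 4_000, 1: 30_000, 2: 120_000, 3: 1_500_000}  # per severity

    def cost_weighted_accuracy(y_true, y_pred):
        true_cost = np.array([UNIT_COSTS[s] for s in y_true])
        pred_cost = np.array([UNIT_COSTS[s] for s in y_pred])
        # Fraction of total crash cost correctly accounted for by predictions.
        return 1.0 - np.abs(true_cost - pred_cost).sum() / true_cost.sum()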
Statistical methods for data analysis in particle physics
Lista, Luca
2017-01-01
This concise set of course-based notes provides the reader with the main concepts and tools needed to perform statistical analyses of experimental data, in particular in the field of high-energy physics (HEP). First, the book provides an introduction to probability theory and basic statistics, mainly intended as a refresher from readers’ advanced undergraduate studies, but also to help them clearly distinguish between the Frequentist and Bayesian approaches and interpretations in subsequent applications. More advanced concepts and applications are gradually introduced, culminating in the chapter on both discoveries and upper limits, as many applications in HEP concern hypothesis testing, where the main goal is often to provide better and better limits so as to eventually be able to distinguish between competing hypotheses, or to rule out some of them altogether. Many worked-out examples will help newcomers to the field and graduate students alike understand the pitfalls involved in applying theoretical concepts to actual data.
Reactor noise analysis by statistical pattern recognition methods
International Nuclear Information System (INIS)
Howington, L.C.; Gonzalez, R.C.
1976-01-01
A multivariate statistical pattern recognition system for reactor noise analysis is presented. The basis of the system is a transformation for decoupling correlated variables and algorithms for inferring probability density functions. The system is adaptable to a variety of statistical properties of the data, and it has learning, tracking, updating, and data compacting capabilities. System design emphasizes control of the false-alarm rate. Its abilities to learn normal patterns, to recognize deviations from these patterns, and to reduce the dimensionality of data with minimum error were evaluated by experiments at the Oak Ridge National Laboratory (ORNL) High-Flux Isotope Reactor. Power perturbations of less than 0.1 percent of the mean value in selected frequency ranges were detected by the pattern recognition system.
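The "transformation for decoupling correlated variables" can be illustrated with a whitening step, assuming nothing more than second-order statistics of the monitored channels; the data and names here are illustrative:

    # Whitening sketch: rotate by covariance eigenvectors, scale by eigenvalues.
    import numpy as np

    def whiten(X):
        Xc = X - X.mean(axis=0)
        vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
        return (Xc @ vecs) / np.sqrt(vals)   # decorrelated, unit-variance channels

    rng = np.random.default_rng(0)
    mixing = np.array([[1.0, 0.6, 0.0], [0.0, 1.0, 0.4], [0.0, 0.0, 1.0]])
    Z = whiten(rng.normal(size=(1000, 3)) @ mixing)
    print(np.round(np.cov(Z, rowvar=False), 2))   # approximately the identity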
Statistical methods for data analysis in particle physics
AUTHOR|(CDS)2070643
2015-01-01
This concise set of course-based notes provides the reader with the main concepts and tools to perform statistical analysis of experimental data, in particular in the field of high-energy physics (HEP). First, an introduction to probability theory and basic statistics is given, mainly as a reminder from advanced undergraduate studies, yet also with a view to clearly distinguishing the Frequentist versus Bayesian approaches and interpretations in subsequent applications. More advanced concepts and applications are gradually introduced, culminating in the chapter on upper limits, as many applications in HEP concern hypothesis testing, where often the main goal is to provide better and better limits so as to be able to eventually distinguish between competing hypotheses or to rule out some of them altogether. Many worked examples will help newcomers to the field and graduate students understand the pitfalls in applying theoretical concepts to actual data.
A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem
Zekić-Sušac, Marijana; Pfeifer, Sanja; Šarlija, Nataša
2014-01-01
Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested: artificial neural networks, CART ...
New KF-PP-SVM classification method for EEG in brain-computer interfaces.
Yang, Banghua; Han, Zhijun; Zan, Peng; Wang, Qian
2014-01-01
Classification methods are a crucial direction in the current study of brain-computer interfaces (BCIs). To improve the classification accuracy for electroencephalogram (EEG) signals, a novel KF-PP-SVM (kernel Fisher, posterior probability, and support vector machine) classification method is developed. Its detailed process entails the use of common spatial patterns to obtain features, based on which the within-class scatter is calculated. The scatter is then added into the kernel function of a radial basis function to construct a new kernel function. This new kernel is integrated into the SVM to obtain a new classification model. Finally, the output of the SVM is calculated based on posterior probability and the final recognition result is obtained. To evaluate the effectiveness of the proposed KF-PP-SVM method, EEG data collected in the laboratory are processed with four different classification schemes (KF-PP-SVM, KF-SVM, PP-SVM, and SVM). The results showed that the overall average improvements arising from the use of the KF-PP-SVM scheme as opposed to the KF-SVM, PP-SVM and SVM schemes are 2.49%, 5.83% and 6.49%, respectively.
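A hedged sketch of the two ingredients named above follows: an RBF kernel whose width is informed by the within-class scatter (a stand-in for the kernel-Fisher construction, whose exact form is not reproduced here) and posterior probabilities from Platt scaling; all names are illustrative:

    # Scatter-informed RBF kernel + SVM with posterior probabilities (sketch).
    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.svm import SVC

    def within_class_scatter(X, y):
        s = 0.0
        for c in np.unique(y):
            Xc = X[y == c]
            s += ((Xc - Xc.mean(axis=0)) ** 2).sum()
        return s / len(X)

    def fit_scatter_svm(X, y):
        gamma = 1.0 / within_class_scatter(X, y)  # assumed heuristic, not the paper's
        K = rbf_kernel(X, X, gamma=gamma)
        clf = SVC(kernel="precomputed", probability=True).fit(K, y)
        return clf, gamma

    # For new data: K_new = rbf_kernel(X_new, X_train, gamma=gamma), then
    # posterior class probabilities via clf.predict_proba(K_new).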
Wu, Shang-Lin; Liao, Lun-De; Lu, Shao-Wei; Jiang, Wei-Ling; Chen, Shi-An; Lin, Chin-Teng
2013-08-01
Electrooculography (EOG) signals can be used to control human-computer interface (HCI) systems, if properly classified. The ability to measure and process these signals may help HCI users to overcome many of the physical limitations and inconveniences in daily life. However, there are currently no effective multidirectional classification methods for monitoring eye movements. Here, we describe a classification method used in a wireless EOG-based HCI device for detecting eye movements in eight directions. This device includes wireless EOG signal acquisition components, wet electrodes and an EOG signal classification algorithm. The EOG classification algorithm is based on extracting features from the electrical signals corresponding to eight directions of eye movement (up, down, left, right, up-left, down-left, up-right, and down-right) and blinking. The recognition and processing of these eight different features were achieved in real-life conditions, demonstrating that this device can reliably measure the features of EOG signals. This system and its classification procedure provide an effective method for identifying eye movements. Additionally, it may be applied to study eye functions in real-life conditions in the near future.
D'Isanto, A.; Cavuoti, S.; Brescia, M.; Donalek, C.; Longo, G.; Riccio, G.; Djorgovski, S. G.
2016-04-01
The exploitation of present and future synoptic (multiband and multi-epoch) surveys requires an extensive use of automatic methods for data processing and data interpretation. In this work, using data extracted from the Catalina Real Time Transient Survey (CRTS), we investigate the classification performance of some well-tested methods: Random Forest, MultiLayer Perceptron with Quasi Newton Algorithm and K-Nearest Neighbours, paying special attention to the feature selection phase. In order to do so, several classification experiments were performed, namely: identification of cataclysmic variables, separation between galactic and extragalactic objects, and identification of supernovae.
Identification and classification of spine vertebrae by automated methods
Long, L. Rodney; Thoma, George R.
2001-07-01
We are currently working toward developing computer-assisted methods for the indexing of a collection of 17,000 digitized x-ray images by biomedical content. These images were collected as part of a nationwide health survey and form a research resource for osteoarthritis and bone morphometry. This task requires the development of algorithms to robustly analyze the x-ray contents for key landmarks, to segment the vertebral bodies, to accurately measure geometric features of the individual vertebrae and inter-vertebral areas, and to classify the spine anatomy into normal or abnormal classes for conditions of interest, including anterior osteophytes and disc space narrowing. Subtasks of this work have been created and divided among collaborators. In this paper, we provide a technical description of the overall task, report on progress made by collaborators, and provide the most recent results of our own research into obtaining first-order location of the spine region of interest by automated methods. We are currently concentrating on images of the cervical spine, but will expand the work to include the lumbar spine as well. Development of successful image processing techniques for computer-assisted indexing of medical image collections is expected to have a significant impact within the medical research and patient care systems.
A minimum spanning forest based classification method for dedicated breast CT images
International Nuclear Information System (INIS)
Pike, Robert; Sechopoulos, Ioannis; Fei, Baowei
2015-01-01
Purpose: To develop and test an automated algorithm to classify different types of tissue in dedicated breast CT images. Methods: Images of a single breast of five different patients were acquired with a dedicated breast CT clinical prototype. The breast CT images were processed by a multiscale bilateral filter to reduce noise while keeping edge information and were corrected to overcome cupping artifacts. As skin and glandular tissue have similar CT values on breast CT images, morphologic processing is used to identify the skin based on its position information. A support vector machine (SVM) is trained and the resulting model used to create a pixelwise classification map of fat and glandular tissue. By combining the results of the skin mask with the SVM results, the breast tissue is classified as skin, fat, and glandular tissue. This map is then used to identify markers for a minimum spanning forest that is grown to segment the image using spatial and intensity information. To evaluate the authors’ classification method, they use DICE overlap ratios to compare the results of the automated classification to those obtained by manual segmentation on five patient images. Results: Comparison between the automatic and the manual segmentation shows that the minimum spanning forest based classification method was able to successfully classify dedicated breast CT images with average DICE ratios of 96.9%, 89.8%, and 89.5% for fat, glandular, and skin tissue, respectively. Conclusions: A 2D minimum spanning forest based classification method was proposed and evaluated for classifying the fat, skin, and glandular tissue in dedicated breast CT images. The classification method can be used for dense breast tissue quantification, radiation dose assessment, and other applications in breast imaging.
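The DICE overlap ratio used for evaluation is simple to compute from two binary masks of the same tissue label; a minimal version:

    # DICE overlap between an automatic and a manual binary segmentation mask.
    import numpy as np

    def dice(auto_mask, manual_mask):
        a, m = auto_mask.astype(bool), manual_mask.astype(bool)
        return 2.0 * np.logical_and(a, m).sum() / (a.sum() + m.sum())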
Haaf, Ezra; Barthel, Roland
2018-04-01
Classification and similarity based methods, which have recently received major attention in the field of surface water hydrology, namely through the PUB (prediction in ungauged basins) initiative, have not yet been applied to groundwater systems. However, it can be hypothesised that the principle of "similar systems responding similarly to similar forcing" applies in subsurface hydrology as well. One fundamental prerequisite for testing this hypothesis, and eventually applying the principle to make "predictions for ungauged groundwater systems", is the availability of efficient methods to quantify the similarity of groundwater system responses, i.e. groundwater hydrographs. In this study, a large, spatially extensive, as well as geologically and geomorphologically diverse dataset from Southern Germany and Western Austria was used to test and compare a set of 32 grouping methods, which have previously only been used individually in local-scale studies. The resulting groupings are compared to a heuristic visual classification, which serves as a baseline. A performance ranking of these classification methods is carried out, and differences in the homogeneity of grouping results are shown, whereby selected groups are related to hydrogeological indices and geological descriptors. This exploratory empirical study shows that the choice of grouping method has a large impact on the object distribution within groups, as well as on the homogeneity of patterns captured in groups. The study provides a comprehensive overview of a large number of grouping methods, which can guide researchers when attempting similarity-based groundwater hydrograph classification.
Improving Land Use/Land Cover Classification by Integrating Pixel Unmixing and Decision Tree Methods
Directory of Open Access Journals (Sweden)
Chao Yang
2017-11-01
Full Text Available Decision tree classification is one of the most efficient methods for obtaining land use/land cover (LULC) information from remotely sensed imagery. However, traditional decision tree classification methods cannot effectively eliminate the influence of mixed pixels. This study aimed to integrate pixel unmixing and decision tree methods to improve LULC classification by removing mixed-pixel influence. The abundance and minimum noise fraction (MNF) results obtained from mixed-pixel decomposition were added to the decision tree's multiple features, and a three-dimensional (3D) terrain model, created by fusing the image with a digital elevation model (DEM), was used to select training samples (ROIs) and improve ROI separability. A Landsat-8 OLI image of the Yunlong Reservoir Basin in Kunming was used to test the proposed method. The results showed that the Kappa coefficient and the overall accuracy of the integrated pixel unmixing and decision tree method increased by 0.093 and 10%, respectively, compared with the original decision tree method. The proposed method could effectively eliminate the influence of mixed pixels and improve the accuracy of complex LULC classifications.
Li, Yiqing; Wang, Yu; Zi, Yanyang; Zhang, Mingquan
2015-10-21
The various multi-sensor signal features from a diesel engine constitute a complex high-dimensional dataset. The non-linear dimensionality reduction method t-distributed stochastic neighbor embedding (t-SNE) provides an effective way to implement data visualization for complex high-dimensional data. However, irrelevant features can deteriorate the performance of data visualization and should therefore be eliminated a priori. This paper proposes a feature subset score based t-SNE (FSS-t-SNE) data visualization method to deal with the high-dimensional data collected from multi-sensor signals. In this method, the optimal feature subset is constructed by a feature subset score criterion, and the high-dimensional data are then visualized in two-dimensional space. Tests on UCI datasets show that FSS-t-SNE can effectively improve the classification accuracy. An experiment was performed with a large power marine diesel engine to validate the proposed method for diesel engine malfunction classification; multi-sensor signals were collected by a cylinder vibration sensor and a cylinder pressure sensor. Compared with other conventional data visualization methods, the proposed method shows good visualization performance and high classification accuracy in multi-malfunction classification of a diesel engine.
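The flow can be sketched as below, with a generic relevance score (mutual information) standing in for the paper's feature-subset-score criterion, which is not reproduced here; all names are illustrative:

    # Score features, keep the top subset, embed with t-SNE (hedged sketch).
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.manifold import TSNE

    def fss_tsne(X, y, keep=10, seed=0):
        relevance = mutual_info_classif(X, y, random_state=seed)
        subset = np.argsort(relevance)[::-1][:keep]   # top-scoring features only
        emb = TSNE(n_components=2, random_state=seed).fit_transform(X[:, subset])
        return emb, subset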
Wisaijohn, Thunthita; Pimkhaokham, Atiphan; Lapying, Phenkhae; Itthichaisri, Chumpot; Pannarunothai, Supasit; Igarashi, Isao; Kawabuchi, Koichi
2010-01-01
This study aimed to develop a new casemix classification system as an alternative method for the budget allocation of oral healthcare services (OHCS). Initially, the International Statistical Classification of Diseases and Related Health Problems, 10th revision, Thai Modification (ICD-10-TM) related to OHCS was used for developing the software "Grouper". This model was designed to allow the translation of dental procedures into eight-digit codes. Multiple regression analysis was used to analyze the relationship between the factors used for developing the model and the resource consumption. Furthermore, the coefficient of variance, reduction in variance, and relative weight (RW) were applied to test the validity. The results demonstrated that 1,624 OHCS classifications, according to the diagnoses and the procedures performed, showed high homogeneity within groups and heterogeneity between groups. Moreover, the RW of the OHCS could be used to predict and control the production costs. In conclusion, this new OHCS casemix classification has potential use in global decision making.
International Nuclear Information System (INIS)
Saedtler, E.
1981-01-01
The method for monitoring the vibration behaviour of primary circuit components, or for general systems control, combines methods from statistical systems theory, optimum filter theory, statistical decision theory and pattern recognition. It is appropriate for automatic control of complex systems and stochastic events. (DG) [de
Holzhütter, H G; Genschow, E; Diener, W; Schlede, E
2003-05-01
The acute toxic class (ATC) methods were developed for determining LD(50)/LC(50) estimates of chemical substances with significantly fewer animals than needed when applying conventional LD(50)/LC(50) tests. The ATC methods are sequential stepwise procedures with fixed starting doses/concentrations and a maximum of six animals used per dose/concentration. The numbers of dead/moribund animals determine whether further testing is necessary or whether the test is terminated. In recent years we have developed classification procedures for the oral, dermal and inhalation routes of administration by using biometric methods. The biometric approach assumes a probit model for the mortality probability of a single animal and assigns the chemical to the toxicity class for which the best concordance is achieved between the statistically expected and the observed numbers of dead/moribund animals at the various steps of the test procedure. In previous publications we have demonstrated the validity of the biometric ATC methods on the basis of data obtained for the oral ATC method in two animal ring studies with 15 participants from six countries. Although the test procedures and biometric evaluations for the dermal and inhalation ATC methods have already been published, there was a need for an adaptation of the classification schemes to the starting doses/concentrations of the Globally Harmonized Classification System (GHS) recently adopted by the Organization for Economic Co-operation and Development (OECD). Here we present the biometric evaluation of the dermal and inhalation ATC methods for the starting doses/concentrations of the GHS and of some other international classification systems still in use. We have developed new test procedures and decision rules for the dermal and inhalation ATC methods, which require significantly fewer animals to provide predictions of toxicity classes that are equally good or even better than those achieved by using the conventional LD(50)/LC(50) tests.
Spatial Analysis Along Networks Statistical and Computational Methods
Okabe, Atsuyuki
2012-01-01
In the real world, there are numerous and various events that occur on and alongside networks, including the occurrence of traffic accidents on highways, the location of stores alongside roads, the incidence of crime on streets and the contamination along rivers. In order to carry out analyses of those events, the researcher needs to be familiar with a range of specific techniques. Spatial Analysis Along Networks provides a practical guide to the necessary statistical techniques and their computational implementation. Each chapter illustrates a specific technique, from Stochastic Point Process
Method of statistical estimation of temperature minimums in binary systems
International Nuclear Information System (INIS)
Mireev, V.A.; Safonov, V.V.
1985-01-01
On the basis of statistical processing of literature data, a technique for evaluating temperature minima on liquidus curves in binary systems is developed, with common-ion chloride systems taken as an example. The systems are formed by 48 chlorides of 45 chemical elements, including alkali, alkaline earth, rare earth and transition metals, as well as Cd, In and Th. It is shown that the calculation error in determining minimum melting points depends on the topology of the phase diagram. A comparison of calculated and experimental data for several previously unstudied systems is given.
HEART ABNORMALITY CLASSIFICATIONS USING FOURIER TRANSFORMS METHOD AND NEURAL NETWORKS
Directory of Open Access Journals (Sweden)
Endah Purwanti
2014-05-01
Full Text Available Health problems involving cardiovascular system disorders are still ranked high globally. One way to detect abnormalities in the cardiovascular system, especially in the heart, is through electrocardiogram (ECG) reading. However, reading an ECG recording requires experience and expertise, so software based on neural networks has been designed to help identify abnormalities of the heart through digital electrocardiogram images. The image is processed using image processing methods to obtain an ordinate chart representing the heart's electrical potential. Features are extracted using Fourier transforms, divided into several numbers of coefficients. As the software input, the Fourier transform coefficients have been normalized. The output of this software is divided into three classes, namely heart with atrial fibrillation, coronary heart disease and normal. The maximum accuracy rate of this software is 95.45%, with a Fourier transform coefficient distribution of 1/8 and 5 nodes, while the minimum accuracy rate is 68.18%, with a coefficient distribution of 1/32 and 32 nodes. Overall, the accuracy rate of this software has an average of 86.05% and a standard deviation of 7.82.
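A compact sketch of this pipeline, assuming synthetic traces in place of the digitized ECG images; the 1/8 coefficient fraction and the three output classes follow the abstract, everything else is illustrative:

    # Normalized Fourier features + small neural network (synthetic data).
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def fourier_features(signal, fraction=1 / 8):
        coeffs = np.abs(np.fft.rfft(signal))
        coeffs = coeffs[: max(1, int(len(coeffs) * fraction))]  # low harmonics
        return coeffs / coeffs.max()                            # normalize

    rng = np.random.default_rng(0)
    X = np.array([fourier_features(rng.normal(size=256)) for _ in range(90)])
    y = np.repeat([0, 1, 2], 30)   # atrial fibrillation / coronary / normal
    clf = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000).fit(X, y)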
Han, Kyunghwa; Jung, Inkyung
2018-05-01
This review article presents an assessment of trends in statistical methods and an evaluation of their appropriateness in articles published in the Archives of Plastic Surgery (APS) from 2012 to 2017. We reviewed 388 original articles published in APS between 2012 and 2017. We categorized the articles that used statistical methods according to the type of statistical method, the number of statistical methods, and the type of statistical software used. We checked whether there were errors in the description of statistical methods and results. A total of 230 articles (59.3%) published in APS between 2012 and 2017 used one or more statistical methods. Within these articles, there were 261 applications of statistical methods with continuous or ordinal outcomes, and 139 applications of statistical methods with categorical outcomes. The Pearson chi-square test (17.4%) and the Mann-Whitney U test (14.4%) were the most frequently used methods. Errors in describing statistical methods and results were found in 133 of the 230 articles (57.8%). Inadequate description of P-values was the most common error (39.1%). Among the 230 articles that used statistical methods, 71.7% provided details about the statistical software programs used for the analyses. SPSS was predominantly used in the articles that presented statistical analyses. We found that the use of statistical methods in APS has increased over the last 6 years. It seems that researchers have been paying more attention to the proper use of statistics in recent years. It is expected that these positive trends will continue in APS.
Statistical method for quality control in presence of measurement errors
International Nuclear Information System (INIS)
Lauer-Peccoud, M.R.
1998-01-01
In a quality inspection of a set of items, where the measurements of the values of a quality characteristic of the items are contaminated by random errors, one can take wrong decisions which are damaging to quality. So it is important to control the risks in such a way that a final quality level is ensured. We consider that an item is defective or not according to whether the value G of its quality characteristic is larger or smaller than a given level g. We assume that, due to the lack of precision of the measurement instrument, the measurement M of this characteristic is expressed by f(G) + ξ, where f is an increasing function such that the value f(g₀) is known and ξ is a random error with mean zero and given variance. First we study the problem of determining a critical measure m such that a specified quality target is reached after the classification of a lot of items, where each item is accepted or rejected depending on whether its measurement is smaller or greater than m. Then we analyse the problem of testing the global quality of a lot from the measurements of a sample of items taken from the lot. For these two kinds of problems and for different quality targets, we propose solutions emphasizing the case where the function f is linear and the error ξ and the variable G are Gaussian. Simulation results allow one to appreciate the efficiency of the different control procedures considered and their robustness with respect to deviations from the assumptions used in the theoretical derivations. (author)
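The first problem reduces, in the Gaussian case with f the identity, to choosing the acceptance cutoff m so that the outgoing fraction defective among accepted items meets a target; a Monte Carlo sketch with purely illustrative numbers:

    # Find the critical measure m for a target outgoing defective rate (sketch).
    import numpy as np

    mu, sg, se, g, target = 0.0, 1.0, 0.3, 1.5, 0.01   # illustrative values
    rng = np.random.default_rng(0)
    G = rng.normal(mu, sg, 200_000)                    # true characteristic
    M = G + rng.normal(0.0, se, G.size)                # measurement, f = identity

    def outgoing_defective_rate(m):
        accepted = M <= m
        return np.mean(G[accepted] > g)                # defectives among accepted

    ms = np.linspace(mu, g + 3 * se, 400)
    m_star = max(m for m in ms if outgoing_defective_rate(m) <= target)
    print(f"critical measure m = {m_star:.3f}")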
Mapping US Urban Extents from MODIS Data Using One-Class Classification Method
Directory of Open Access Journals (Sweden)
Bo Wan
2015-08-01
Full Text Available Urban areas are one of the most important components of human society. Their extents have been continuously growing during the last few decades. Accurate and timely measurements of the extents of urban areas can help in analyzing population densities and urban sprawl and in studying environmental issues related to urbanization. Urban extents detected from remotely sensed data are usually a by-product of land use classification results, and their interpretation requires a full understanding of land cover types. In this study, for the first time, we mapped urban extents in the continental United States using a novel one-class classification method, i.e., positive and unlabeled learning (PUL), with multi-temporal Moderate Resolution Imaging Spectroradiometer (MODIS) data for the year 2010. The Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) night stable light data were used to calibrate the urban extents obtained from the one-class classification scheme. Our results demonstrate the effectiveness of the PUL algorithm in mapping large-scale urban areas from coarse remote-sensing images. The total accuracy of the mapped urban areas was 92.9% and the kappa coefficient was 0.85. The use of DMSP-OLS night stable light data can significantly reduce false detection rates for bare land and cropland far from cities. Compared with traditional supervised classification methods, the one-class classification scheme can greatly reduce the effort involved in collecting training datasets, without losing predictive accuracy.
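Positive and unlabeled learning can be sketched in the Elkan-Noto style, which may differ from the paper's exact PUL algorithm: train on labeled-versus-unlabeled and rescale the scores by the estimated labeling frequency. Inputs are assumed to be NumPy arrays, with s = 1 for known urban pixels and s = 0 for unlabeled pixels:

    # PU learning sketch (Elkan-Noto style); a stand-in, not the paper's method.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def pu_scores(X, s, X_new):
        X_tr, X_ho, s_tr, s_ho = train_test_split(X, s, test_size=0.2,
                                                  random_state=0)
        g = LogisticRegression(max_iter=1000).fit(X_tr, s_tr)
        c = g.predict_proba(X_ho[s_ho == 1])[:, 1].mean()  # P(labeled | positive)
        return np.clip(g.predict_proba(X_new)[:, 1] / c, 0.0, 1.0)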
Directory of Open Access Journals (Sweden)
Zaira M Alieva
2016-01-01
Full Text Available The article analyzes the application of mathematical and statistical methods in the analysis of socio-humanistic texts. It outlines the essence of mathematical and statistical methods and presents examples of their use in the study of humanities and social phenomena. It considers the key issues faced by the expert when applying mathematical-statistical methods in the socio-humanitarian sphere, including the persistent contrast between the socio-humanitarian sciences and mathematics, the difficulty of identifying the object that carries the problem, and the need to adopt a probabilistic approach. Conclusions are drawn from the results of the study.
Directory of Open Access Journals (Sweden)
S. H Sanaeinejad
2012-12-01
Full Text Available Soil salinity is an important factor that affects plant growth and reduces plant production at different growth stages. Remote sensing technology and GIS have great potential for monitoring dynamic soil processes such as salinity. In the present study the efficiency of remote sensing technology and its integration with GIS was examined to estimate soil salinity for the Neyshabour basin, and different classification methods for soil salinity were investigated. We used 6 bands of Landsat ETM+ for this study. Classification results obtained from applying mathematical models to the images were compared with results from different band combinations. The areas of saline and non-saline soil classes were identified in the study area based on both methods and also on the combination of the two methods. The results showed that the best approach was to use the two methods in a first stage to separate the saline and non-saline classes, and then to classify the non-saline soils in a second stage. As the variation in the numerical values of the image for different soil salinities in the study area was small, it was concluded that Landsat ETM+ images have limited potential for identifying and classifying soil salinity in such an area.
Vyas, Renu; Bapat, Sanket; Jain, Esha; Tambe, Sanjeev S; Karthikeyan, Muthukumarasamy; Kulkarni, Bhaskar D
2015-01-01
The ligand-based virtual screening of combinatorial libraries employs a number of statistical modeling and machine learning methods. A comprehensive analysis of the application of these methods to the diversity-oriented virtual screening of biological targets/drug classes is presented here. A number of classification models have been built using three types of inputs, namely structure-based descriptors, molecular fingerprints and therapeutic category, for performing virtual screening. The activity and affinity descriptors of a set of inhibitors of four target classes, DHFR, COX, LOX and NMDA, were utilized to train a total of six classifiers, viz. Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree (DT) and Random Forest (RF). Among these classifiers, the ANN was found to be the best classifier, with an AUC of 0.9 irrespective of the target. New molecular fingerprints based on pharmacophore, toxicophore and chemophore (PTC) were used to build the ANN models for each dataset. A good accuracy of 87.27% was obtained using 296 chemophoric binary fingerprints for the COX-LOX inhibitors, compared to pharmacophoric (67.82%) and toxicophoric (70.64%) fingerprints. The methodology was validated on the classical Ames mutagenicity dataset of 4337 molecules. To evaluate it further, the selectivity and promiscuity of molecules from five drug classes, viz. anti-anginal, anti-convulsant, anti-depressant, anti-arrhythmic and anti-diabetic, were studied. The PTC fingerprints computed for each category were able to capture the drug-class-specific features using the k-NN classifier. These models can be useful for selecting optimal molecules for drug design.
Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.
2014-01-01
Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592
Evaluation of local corrosion life by statistical method
International Nuclear Information System (INIS)
Kato, Shunji; Kurosawa, Tatsuo; Takaku, Hiroshi; Kusanagi, Hideo; Hirano, Hideo; Kimura, Hideo; Hide, Koichiro; Kawasaki, Masayuki
1987-01-01
In this paper, for the purpose of achieving life extension of light water reactors, we examined the evaluation of local corrosion by statistical methods and its application to nuclear power plant components. There are many examples of evaluating the maximum cracking depth of local corrosion by the doubly exponential distribution, and this evaluation method has been established. However, it has not yet been established how to evaluate the service lives of construction materials by statistical methods. In order to establish service-life evaluation by statistical methods, we must strive to collect local corrosion data and pursue their analysis. (author)
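The doubly exponential (Gumbel) evaluation of maximum pit depth mentioned above can be sketched as an extreme-value fit with extrapolation to a rare-depth level; the data here are synthetic placeholders:

    # Gumbel fit to per-coupon maximum pit depths, with a 99% depth level.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    max_depth_mm = rng.gumbel(loc=0.8, scale=0.15, size=40)  # one max per coupon

    loc, scale = stats.gumbel_r.fit(max_depth_mm)
    d99 = stats.gumbel_r.ppf(0.99, loc, scale)  # depth exceeded by 1% of coupons
    print(f"mode = {loc:.2f} mm, scale = {scale:.2f} mm, 99% depth = {d99:.2f} mm")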
Statistical methods for analysing responses of wildlife to human disturbance.
Haiganoush K. Preisler; Alan A. Ager; Michael J. Wisdom
2006-01-01
1. Off-road recreation is increasing rapidly in many areas of the world, and effects on wildlife can be highly detrimental. Consequently, we have developed methods for studying wildlife responses to off-road recreation with the use of new technologies that allow frequent and accurate monitoring of human-wildlife interactions. To illustrate these methods, we studied the...
Introducing Students to the Application of Statistics and Investigative Methods in Political Science
Wells, Dominic D.; Nemire, Nathan A.
2017-01-01
This exercise introduces students to the application of statistics and its investigative methods in political science. It helps students gain a better understanding and a greater appreciation of statistics through a real world application.
Directory of Open Access Journals (Sweden)
Záhorská Renáta
2016-12-01
Full Text Available This paper presents the results of waste management research in a selected engineering company, RIBE Slovakia, k. s., Nitra factory. Within its manufacturing programme, the factory uses a wide range of manufacturing technologies (cutting operations, metal cold-forming, thread rolling, metal surface finishing, automatic sorting, metrology, assembly) with the aim of producing the final products: connecting components (fasteners) delivered to many industrial fields (agricultural machinery manufacturers, the car industry, etc.). Data characterizing the production technologies and the range of manufactured products were obtained. Key attention is paid to the classification of waste produced by engineering production and to waste management within the company. Within the research, data characterizing the time course of production of various waste types were obtained and evaluated by means of statistical methods using STATGRAPHICS. Based on the application of SWOT analysis, the waste management in the company is objectively assessed in terms of strengths and weaknesses, as well as opportunities and potential threats. The results obtained by the SWOT analysis lead to the conclusion that the company RIBE Slovakia, k. s., Nitra factory has a well-organized waste management system. The fact that the waste management system is incorporated into the company management system can be considered an advantage.
Use of Mathematical Methods of Statistics for Analyzing Engine Characteristics
Directory of Open Access Journals (Sweden)
Aivaras Jasilionis
2012-11-01
Full Text Available For the development of new models, automobile manufacturers are trying to come up with optimal software for engine control in all movement modes. However, in this case, a vehicle cannot reach outstanding characteristics in any of them. This is the main reason why modifications of engine control software used for adapting the vehicle to the driver's needs are becoming more and more popular. The article presents a short analysis of development trends in engine control software. Also, mathematical-statistical models for engine power and torque growth are created. The introduced models give an opportunity to predict the probabilities of engine power or torque growth after individual reprogramming of the engine control software.
Statistical Methods and Tools for Hanford Staged Feed Tank Sampling
Energy Technology Data Exchange (ETDEWEB)
Fountain, Matthew S. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Brigantic, Robert T. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Peterson, Reid A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
2013-10-01
This report summarizes work conducted by Pacific Northwest National Laboratory to technically evaluate the current approach to staged feed sampling of high-level waste (HLW) sludge to meet waste acceptance criteria (WAC) for transfer from tank farms to the Hanford Waste Treatment and Immobilization Plant (WTP). The current sampling and analysis approach is detailed in the document titled Initial Data Quality Objectives for WTP Feed Acceptance Criteria, 24590-WTP-RPT-MGT-11-014, Revision 0 (Arakali et al. 2011). The goal of this current work is to evaluate and provide recommendations to support a defensible, technical and statistical basis for the staged feed sampling approach that meets WAC data quality objectives (DQOs).
A new quantum statistical evaluation method for time correlation functions
International Nuclear Information System (INIS)
Loss, D.; Schoeller, H.
1989-01-01
Considering a system of N identical interacting particles which obey Fermi-Dirac or Bose-Einstein statistics, the authors derive new formulas for correlation functions of the type C(t) = ⟨∑_{i=1}^{N} A_i(t) ∑_{j=1}^{N} B_j⟩ (where B_j is diagonal in the free-particle states) in the thermodynamic limit. Thereby they apply and extend a superoperator formalism recently developed for the derivation of long-time tails in semiclassical systems. As an illustrative application, the Boltzmann-equation value of the time-integrated correlation function C(t) is derived in a straightforward manner. Due to exchange effects, the obtained t-matrix and the resulting scattering cross section, which occurs in the Boltzmann collision operator, are now functionals of the Fermi-Dirac or Bose-Einstein distribution.
Using machine learning, neural networks and statistics to predict bankruptcy
Pompe, P.P.M.; Feelders, A.J.; Feelders, A.J.
1997-01-01
Recent literature strongly suggests that machine learning approaches to classification outperform "classical" statistical methods. We make a comparison between the performance of linear discriminant analysis, classification trees, and neural networks in predicting corporate bankruptcy. Linear
Statistical Bayesian method for reliability evaluation based on ADT data
Lu, Dawei; Wang, Lizhi; Sun, Yusheng; Wang, Xiaohong
2018-05-01
Accelerated degradation testing (ADT) is frequently conducted in the laboratory to predict products' reliability under normal operating conditions. Two kinds of methods, degradation path models and stochastic process models, are utilized to analyze degradation data, and the latter is the more popular. However, some limitations, such as an imprecise solution process and imprecise estimation of the degradation ratio, still exist and may affect the accuracy of the acceleration model and the extrapolated values. Moreover, the usual solution to this problem, the Bayesian method, loses key information when unifying the degradation data. In this paper, a new data-processing and parameter-inference method based on the Bayesian method is proposed to handle degradation data and solve the problems above. First, a Wiener process and an acceleration model are chosen. Second, the initial values of the degradation model and the parameters of the prior and posterior distributions under each level are calculated with updating and iteration of the estimates. Third, the lifetime and reliability values are estimated on the basis of the estimated parameters. Finally, a case study is provided to demonstrate the validity of the proposed method. The results illustrate that the proposed method is effective and accurate in estimating the lifetime and reliability of a product.
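A minimal sketch of the Wiener-process ingredient named above, assuming a linear drift and a known failure threshold; the Bayesian updating across stress levels is not reproduced, and all parameters are illustrative:

    # Wiener-process degradation path and first-passage lifetime (sketch).
    import numpy as np

    rng = np.random.default_rng(1)
    d, s, dt, threshold = 0.05, 0.02, 1.0, 5.0   # drift, diffusion, step, limit

    increments = rng.normal(d * dt, s * np.sqrt(dt), size=500)
    path = np.cumsum(increments)
    life = np.argmax(path >= threshold) * dt     # first crossing time

    d_hat = increments.mean() / dt               # drift estimate from increments
    # Mean first-passage time of a drifted Wiener process is threshold / drift.
    print(f"observed life = {life:.0f}, predicted mean life = {threshold / d_hat:.0f}")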
Two Classification Methods for Grouping Common Environmental Sounds in Terms of Perceived Pleasantness
Dickerson, Kelly; Perelman, Brandon S; Sherry, Laura; Gaston, Jeremy R
2016-02-01
US Army Research Laboratory, ARL-TR-7960. Approved for public release.
LENUS (Irish Health Repository)
Rossen, Janne
2017-01-01
Internationally, the 10-Group Classification System (TGCS) has been used to report caesarean section rates, but analysis of other outcomes is also recommended. We now aim to present the TGCS as a method to assess outcomes of labour and delivery using routine collection of perinatal information.
Energy Technology Data Exchange (ETDEWEB)
Waldt, Simone; Eiber, Matthias; Woertler, Klaus [Klinikum rechts der Isar der Technischen Univ. Muenchen (TUM), Muenchen (Germany). Inst. fuer Radiologie
2011-07-01
The book on measuring methods and classification in musculoskeletal radiology covers the following topics: legs; hip joint; knee joint; foot; shoulder joint; elbow joint; wrist joint; spinal column; craniocervical transition region and cervical spine; musculoskeletal carcinomas; osteoporosis; arthrosis; articular cartilage; hemophilia; rheumatoid arthritis; muscular injuries; skeletal age.
Application of the Covalent Bond Classification Method for the Teaching of Inorganic Chemistry
Green, Malcolm L. H.; Parkin, Gerard
2014-01-01
The Covalent Bond Classification (CBC) method provides a means to classify covalent molecules according to the number and types of bonds that surround an atom of interest. This approach is based on an elementary molecular orbital analysis of the bonding involving the central atom (M), with the various interactions being classified according to the…
Liu, Boquan; Polce, Evan; Sprott, Julien C.; Jiang, Jack J.
2018-01-01
Purpose: The purpose of this study is to introduce a chaos level test to evaluate linear and nonlinear voice type classification method performances under varying signal chaos conditions without subjective impression. Study Design: Voice signals were constructed with differing degrees of noise to model signal chaos. Within each noise power, 100…
A simple statistical method for catch comparison studies
DEFF Research Database (Denmark)
Holst, René; Revill, Andrew
2009-01-01
For analysing catch comparison data, we propose a simple method based on Generalised Linear Mixed Models (GLMM) and use polynomial approximations to fit the proportions caught in the test codend. The method provides comparisons of fish catch at length by the two gears through a continuous curve with a realistic confidence band. We demonstrate the versatility of this method on field data obtained from the first known testing in European waters of the Rhode Island (USA) 'Eliminator' trawl. These data are interesting as they include a range of species with different selective patterns.
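The fixed-effects core of such a model (a polynomial logit for the proportion caught in the test codend, by length) can be sketched with a binomial GLM; the haul-level random effects of the full GLMM are omitted, and the counts below are synthetic:

    # Binomial GLM for test-codend proportions by length class (sketch).
    import numpy as np
    import statsmodels.api as sm

    length = np.arange(20, 61, dtype=float)          # length classes, cm
    rng = np.random.default_rng(0)
    test = rng.integers(0, 30, len(length))          # counts in test codend
    control = rng.integers(1, 30, len(length))       # counts in control codend

    X = sm.add_constant(np.column_stack([length, length ** 2]))  # quadratic logit
    fit = sm.GLM(np.column_stack([test, control]), X,
                 family=sm.families.Binomial()).fit()
    prop = fit.predict(X)                            # fitted proportion-at-length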
A statistical comparison of accelerated concrete testing methods
Directory of Open Access Journals (Sweden)
Denny Meyer
1997-01-01
Full Text Available Accelerated curing results, obtained after only 24 hours, are used to predict the 28 day strength of concrete. Various accelerated curing methods are available. Two of these methods are compared in relation to the accuracy of their predictions and the stability of the relationship between their 24 hour and 28 day concrete strength. The results suggest that Warm Water accelerated curing is preferable to Hot Water accelerated curing of concrete. In addition, some other methods for improving the accuracy of predictions of 28 day strengths are suggested. In particular the frequency at which it is necessary to recalibrate the prediction equation is considered.
Yuldashev, M. N.; Vlasov, A. I.; Novikov, A. N.
2018-05-01
This paper focuses on the development of an energy-efficient algorithm for classification of states of a wireless sensor network using machine learning methods. The proposed algorithm reduces energy consumption by: 1) elimination of monitoring of parameters that do not affect the state of the sensor network, 2) reduction of communication sessions over the network (the data are transmitted only if their values can affect the state of the sensor network). The studies of the proposed algorithm have shown that at classification accuracy close to 100%, the number of communication sessions can be reduced by 80%.
Karpenka, N. V.; Feroz, F.; Hobson, M. P.
2013-02-01
A method is presented for automated photometric classification of supernovae (SNe) as Type Ia or non-Ia. A two-step approach is adopted in which (i) the SN light curve flux measurements in each observing filter are fitted separately to an analytical parametrized function that is sufficiently flexible to accommodate virtually all types of SNe and (ii) the fitted function parameters and their associated uncertainties, along with the number of flux measurements, the maximum-likelihood value of the fit and the Bayesian evidence for the model, are used as the input feature vector to a classification neural network that outputs the probability that the SN under consideration is of Type Ia. The method is trained and tested using data released following the Supernova Photometric Classification Challenge (SNPCC), consisting of light curves for 20 895 SNe in total. We consider several random divisions of the data into training and testing sets: for instance, for our sample D_1 (D_4), a total of 10 (40) per cent of the data are involved in training the algorithm and the remainder used for blind testing of the resulting classifier; we make no selection cuts. Assigning a canonical threshold probability of p_th = 0.5 on the network output to class an SN as Type Ia, for the sample D_1 (D_4) we obtain a completeness of 0.78 (0.82), purity of 0.77 (0.82) and SNPCC figure of merit of 0.41 (0.50). Including the SN host-galaxy redshift and its uncertainty as additional inputs to the classification network results in a modest 5-10 per cent increase in these values. We find that the quality of the classification does not vary significantly with SN redshift. Moreover, our probabilistic classification method allows one to calculate the expected completeness, purity and figure of merit (or other measures of classification quality) as a function of the threshold probability p_th, without knowing the true classes of the SNe in the testing sample, as is the case in the classification of real SNe.
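Step (i) amounts to a least-squares fit of a flexible rise-and-decline function to each filter's fluxes; a sketch with one such parametrization (assumed here for illustration, not necessarily the paper's exact form) and synthetic data:

    # Parametrized light-curve fit whose parameters feed the classifier (sketch).
    import numpy as np
    from scipy.optimize import curve_fit

    def lc_model(t, A, B, t0, t1, t_rise, t_fall):
        return (A * (1 + B * (t - t1) ** 2)
                * np.exp(-(t - t0) / t_fall) / (1 + np.exp(-(t - t0) / t_rise)))

    t = np.linspace(-20, 80, 40)
    rng = np.random.default_rng(0)
    flux = lc_model(t, 10.0, 1e-3, 0.0, 15.0, 3.0, 25.0) + rng.normal(0, 0.3, t.size)

    t_pk = t[np.argmax(flux)]
    p0 = [flux.max(), 0.0, t_pk, t_pk, 5.0, 20.0]     # rough starting values
    params, cov = curve_fit(lc_model, t, flux, p0=p0, maxfev=20000)
    # params and np.sqrt(np.diag(cov)) form the feature vector for the network.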
Rationalizing method of replacement intervals by using Bayesian statistics
International Nuclear Information System (INIS)
Kasai, Masao; Notoya, Junichi; Kusakari, Yoshiyuki
2007-01-01
This study presents formulations for rationalizing the replacement intervals of equipment and/or parts taking into account the probability density functions (PDFs) of the parameters of failure distribution functions (FDFs), and compares the intervals optimized by our formulations with those from conventional formulations, which use only representative values of the FDF parameters instead of their PDFs. The failure data are generated by Monte Carlo simulations, since real failure data were not available to us. The PDFs of the FDF parameters are obtained by the Bayesian method, and the representative values are obtained by likelihood estimation and the Bayesian method. We found that the method using PDFs obtained by the Bayesian method yields longer replacement intervals than the one using representative values of the parameters. (author)
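The idea can be illustrated with a grid posterior for a Weibull scale parameter (shape assumed known) and an age-replacement cost rate averaged over that posterior rather than over a point estimate; all costs and data below are illustrative:

    # Replacement interval from a posterior-averaged cost rate (hedged sketch).
    import numpy as np

    lives = np.array([1200.0, 1500.0, 900.0, 2000.0])  # observed lives, h
    k = 2.0                                            # Weibull shape, assumed known
    theta = np.linspace(300.0, 4000.0, 200)            # candidate scale values
    log_post = (-lives.size * k * np.log(theta)
                - ((lives[:, None] / theta) ** k).sum(axis=0))
    post = np.exp(log_post - log_post.max()); post /= post.sum()

    c_plan, c_fail = 1.0, 10.0                         # planned vs failure cost
    t = np.linspace(0.0, 6000.0, 2001)

    def cost_rate(T):
        # Expected cost per hour of age replacement at interval T, averaged
        # over the posterior of theta instead of a single point estimate.
        total = 0.0
        for th, w in zip(theta, post):
            S = np.exp(-(t[t <= T] / th) ** k)         # survival up to T
            mean_cycle = np.trapz(S, t[t <= T])        # E[min(life, T)]
            total += w * (c_plan * S[-1] + c_fail * (1 - S[-1])) / mean_cycle
        return total

    T_grid = np.linspace(200.0, 4000.0, 60)
    T_star = T_grid[np.argmin([cost_rate(T) for T in T_grid])]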
Comparative Analysis of Kernel Methods for Statistical Shape Learning
National Research Council Canada - National Science Library
Rathi, Yogesh; Dambreville, Samuel; Tannenbaum, Allen
2006-01-01
.... In this work, we perform a comparative analysis of shape learning techniques such as linear PCA, kernel PCA, locally linear embedding and propose a new method, kernelized locally linear embedding...